Interactions in Logistic Regression

Similar documents
Logistic Regressions. Stat 430

On the Inference of the Logistic Regression Model

Log-linear Models for Contingency Tables

R Hints for Chapter 10

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Poisson Regression. The Training Data

Let s see if we can predict whether a student returns or does not return to St. Ambrose for their second year.

Logistic Regression - problem 6.14

Duration of Unemployment - Analysis of Deviance Table for Nested Models

Exercise 5.4 Solution

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model

Logistic Regression 21/05

Random Independent Variables

STA102 Class Notes Chapter Logistic Regression

Week 7 Multiple factors. Ch , Some miscellaneous parts

cor(dataset$measurement1, dataset$measurement2, method= pearson ) cor.test(datavector1, datavector2, method= pearson )

Matched Pair Data. Stat 557 Heike Hofmann

Modeling Overdispersion

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

9 Generalized Linear Models

Regression Methods for Survey Data

R Output for Linear Models using functions lm(), gls() & glm()

ssh tap sas913, sas

Regression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102

Generalised linear models. Response variable can take a number of different formats

Linear Regression Models P8111

Reaction Days

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

Unit 5 Logistic Regression Practice Problems

Booklet of Code and Output for STAD29/STA 1007 Midterm Exam

BMI 541/699 Lecture 22

Sample solutions. Stat 8051 Homework 8

STA 450/4000 S: January

Introduction to the Analysis of Tabular Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Generalized linear models

UNIVERSITY OF TORONTO Faculty of Arts and Science

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

Analysis of binary repeated measures data with R

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46

Consider fitting a model using ordinary least squares (OLS) regression:

12 Modelling Binomial Response Data

STAC51: Categorical data Analysis

Exam Applied Statistical Regression. Good Luck!

P( y = 1) = 1+ e b 0+b 1 x 1 1 ( ) Activity #13: Generalized Linear Models: Logistic, Poisson Regression

Booklet of Code and Output for STAD29/STA 1007 Midterm Exam

Using R in 200D Luke Sonnet

Introduction to the Generalized Linear Model: Logistic regression and Poisson regression

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Binary Regression. GH Chapter 5, ISL Chapter 4. January 31, 2017

Regression with Qualitative Information. Part VI. Regression with Qualitative Information

Age 55 (x = 1) Age < 55 (x = 0)

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Various Issues in Fitting Contingency Tables

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours

Regression and Generalized Linear Models. Dr. Wolfgang Rolke Expo in Statistics C3TEC, Caguas, October 9, 2015

Extending the Discrete-Time Model

Introducing Generalized Linear Models: Logistic Regression

How to work correctly statistically about sex ratio

Proportional Odds Logistic Regression. stat 557 Heike Hofmann

Generalized Linear Models. stat 557 Heike Hofmann

Leftovers. Morris. University Farm. University Farm. Morris. yield

Non-Gaussian Response Variables

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Simple logistic regression

( ) ( ) 2. ( ) = e b 0 +b 1 x. logistic function: P( y = 1) = eb 0 +b 1 x. 1+ e b 0 +b 1 x. Linear model: cancer = b 0. + b 1 ( cigarettes) b 0 +b 1 x

Poisson Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/

Statistics 203 Introduction to Regression Models and ANOVA Practice Exam

PAPER 206 APPLIED STATISTICS

Lecture 12: Effect modification, and confounding in logistic regression

Introduction to General and Generalized Linear Models

Explanatory variables are: weight, width of shell, color (medium light, medium, medium dark, dark), and condition of spine.

1. Logistic Regression, One Predictor 2. Inference: Estimating the Parameters 3. Multiple Logistic Regression 4. AIC and BIC in Logistic Regression

Generalized linear models

Checking the Poisson assumption in the Poisson generalized linear model

MODULE 6 LOGISTIC REGRESSION. Module Objectives:

R code and output of examples in text. Contents. De Jong and Heller GLMs for Insurance Data R code and output. 1 Poisson regression 2

Model Estimation Example

Generalized Linear Models

Classification. Chapter Introduction. 6.2 The Bayes classifier

Ch 6: Multicategory Logit Models

Introduction to Statistics and R

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

STATS216v Introduction to Statistical Learning Stanford University, Summer Midterm Exam (Solutions) Duration: 1 hours

Stat 8053, Fall 2013: Poisson Regression (Faraway, 2.1 & 3)

Experimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla.

Stat 401B Final Exam Fall 2016

STAT 526 Spring Midterm 1. Wednesday February 2, 2011

Cherry.R. > cherry d h v <portion omitted> > # Step 1.

Transcription:

Interactions in Logistic Regression > # UCBAdmissions is a 3-D table: Gender by Dept by Admit > # Same data in another format: > # One col for Yes counts, another for No counts. > Berkeley = read.table("http://www.utstat.toronto.edu/~brunner/312f12/ code_n_data/berkeley2.data") > Berkeley Gender Dept Yes No 1 Male A 512 313 2 Female A 89 19 3 Male B 353 207 4 Female B 17 8 5 Male C 120 205 6 Female C 202 391 7 Male D 138 279 8 Female D 131 244 9 Male E 53 138 10 Female E 94 299 11 Male F 22 351 12 Female F 24 317 > # Resp var is 2 cols. Second col is Y=1 > full = glm(cbind(no,yes) ~ Dept*Gender,family=binomial,data=Berkeley) > anova(full,test='chisq') Analysis of Deviance Table Model: binomial, link: logit Response: cbind(no, Yes) Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev Pr(>Chi) NULL 11 877.06 Dept 5 855.32 6 21.74 < 2.2e-16 *** Gender 1 1.53 5 20.20 0.215928 Dept:Gender 5 20.20 0 0.00 0.001144 ** --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05. 0.1 1 Page 1 of 10

> # Let's see what it means. Repeating some material from an earlier analysis... > noquote(gradschool) Dept %MaleAcc %FemAcc Chisq p-value [1,] A 62.1 82.4 17.25 3e-05 [2,] B 63 68 0.25 0.61447 [3,] C 36.9 34.1 0.75 0.38536 [4,] D 33.1 34.9 0.3 0.58515 [5,] E 27.7 23.9 1 0.31705 [6,] F 5.9 7 0.38 0.53542 > Male = as.numeric(gradschool[,2]) > Female = as.numeric(gradschool[,3]) > cbind(male,female) Male Female [1,] 62.1 82.4 [2,] 63.0 68.0 [3,] 36.9 34.1 [4,] 33.1 34.9 [5,] 27.7 23.9 [6,] 5.9 7.0 > # On the log scale, differences are logs of odds ratios. > # Non-parallel means the odds ratio DEPENDS > logmale = log(male); logfemale = log(female) > plot(rep(1:6,2),c(logmale,logfemale), pch=" ", axes=f, + xlab="department",ylab="log Percent Acceptance") > axis(1,1:6,letters[1:6]) # X axis > axis(2) # Y axis > lines(1:6,logfemale,lty=1); lines(1:6,logmale,lty=2) > points(1:6,logmale); points(1:6,logfemale,pch=19) > legend(2,2.5,legend="female Applicants",lty=1,bty="n",pch=19) > legend(2,2.3,legend="male Applicants",lty=2,bty="n",pch=1) > title("berkeley Graduate Admissions by Department") Page 2 of 10

Page 3 of 10

> summary(full) Call: glm(formula = cbind(no, Yes) ~ Dept * Gender, family = binomial, data = Berkeley) Deviance Residuals: [1] 0 0 0 0 0 0 0 0 0 0 0 0 Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) -1.5442 0.2527-6.110 9.94e-10 *** DeptB 0.7904 0.4977 1.588 0.11224 DeptC 2.2046 0.2672 8.252 < 2e-16 *** DeptD 2.1662 0.2750 7.878 3.32e-15 *** DeptE 2.7013 0.2790 9.682 < 2e-16 *** DeptF 4.1250 0.3297 12.512 < 2e-16 *** GenderMale 1.0521 0.2627 4.005 6.21e-05 *** DeptB:GenderMale -0.8321 0.5104-1.630 0.10306 DeptC:GenderMale -1.1770 0.2996-3.929 8.53e-05 *** DeptD:GenderMale -0.9701 0.3026-3.206 0.00135 ** DeptE:GenderMale -1.2523 0.3303-3.791 0.00015 *** DeptF:GenderMale -0.8632 0.4027-2.144 0.03206 * --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05. 0.1 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 8.7706e+02 on 11 degrees of freedom Residual deviance: -8.8818e-15 on 0 degrees of freedom AIC: 92.94 Number of Fisher Scoring iterations: 3 Page 4 of 10

Categorical by Quantitative Interactions Parallel regression lines on the log scale mean that Log differences between groups are the same for each level of x. Odds ratios are the same for each level of x. Odds are in the same proportion at each level of x. Called a proportional odds model. Product terms represent departure from parallel lines. Translates to departure from proportional odds. To test proportional odds assumption, test regression coefficients of the product terms. Odds ratios depend on the value of x. Page 5 of 10

> math = read.table("http://www.utstat.toronto.edu/~brunner/312f12 /code_n_data/mathcat.data") > math[1:5,] hsgpa hsengl hscalc course passed outcome 1 78.0 80 Yes Mainstrm No Failed 2 66.0 75 Yes Mainstrm Yes Passed 3 80.2 70 Yes Mainstrm Yes Passed 4 81.7 67 Yes Mainstrm Yes Passed 5 86.8 80 Yes Mainstrm Yes Passed > attach(math) # Variable names are now available > > # Make dummy vars for course to be sure what's going on > n=length(hsgpa) > c1 = c2 = numeric(n) > c1[course=='catch-up'] = 1 > c2[course=='elite'] = 1 > # table(c1,course); table(c2,course) > c1gpa = c1*hsgpa; c2gpa = c2*hsgpa > > # Reduced model will have no interactions > redmod = glm(passed ~ hsgpa+c1+c2, family=binomial) > fullmod = glm(passed ~ hsgpa+c1+c2+c1gpa+c2gpa, family=binomial) > anova(redmod,fullmod,test='chisq') Analysis of Deviance Table Model 1: passed ~ hsgpa + c1 + c2 Model 2: passed ~ hsgpa + c1 + c2 + c1gpa + c2gpa Resid. Df Resid. Dev Df Deviance Pr(>Chi) 1 390 428.90 2 388 428.45 2 0.44679 0.7998 > > # Can do it with factors > contrasts(course) = contr.treatment(3,base=3) > red = glm(passed ~ hsgpa+course, family=binomial) > full = glm(passed ~ hsgpa+course+hsgpa:course, family=binomial) Page 6 of 10

> anova(red,full,test='chisq') Analysis of Deviance Table Model 1: passed ~ hsgpa + course Model 2: passed ~ hsgpa + course + hsgpa:course Resid. Df Resid. Dev Df Deviance Pr(>Chi) 1 390 428.90 2 388 428.45 2 0.44679 0.7998 > anova(redmod,fullmod,test='chisq') # For comparison Analysis of Deviance Table Model 1: passed ~ hsgpa + c1 + c2 Model 2: passed ~ hsgpa + c1 + c2 + c1gpa + c2gpa Resid. Df Resid. Dev Df Deviance Pr(>Chi) 1 390 428.90 2 388 428.45 2 0.44679 0.7998 Consistent with proportional odds. Page 7 of 10

> summary(fullmod) Call: glm(formula = passed ~ hsgpa + c1 + c2 + c1gpa + c2gpa, family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -2.4720-0.9662 0.4454 0.8957 2.1617 Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) -14.28923 2.15367-6.635 3.25e-11 *** hsgpa 0.18658 0.02737 6.817 9.30e-12 *** c1-4.08308 9.15612-0.446 0.656 c2-4.94207 10.31611-0.479 0.632 c1gpa 0.03600 0.11773 0.306 0.760 c2gpa 0.07668 0.13492 0.568 0.570 --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05. 0.1 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 530.66 on 393 degrees of freedom Residual deviance: 428.45 on 388 degrees of freedom AIC: 440.45 Number of Fisher Scoring iterations: 5 > summary(full) Call: glm(formula = passed ~ hsgpa + course + hsgpa:course, family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -2.4720-0.9662 0.4454 0.8957 2.1617 Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) -14.28923 2.15367-6.635 3.25e-11 *** hsgpa 0.18658 0.02737 6.817 9.30e-12 *** course1-4.08308 9.15612-0.446 0.656 course2-4.94207 10.31611-0.479 0.632 hsgpa:course1 0.03600 0.11773 0.306 0.760 hsgpa:course2 0.07668 0.13492 0.568 0.570 --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05. 0.1 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 530.66 on 393 degrees of freedom Residual deviance: 428.45 on 388 degrees of freedom AIC: 440.45 Number of Fisher Scoring iterations: 5 Page 8 of 10

> betahat = redmod$coefficients; betahat (Intercept) hsgpa c1 c2-14.7375649 0.1922924-1.2848883 0.9338170 > > gpa = 50:100 > catchup = betahat[1]+betahat[3] + betahat[2]*gpa > elite = betahat[1]+betahat[4] + betahat[2]*gpa > mainstream = betahat[1] + betahat[2]*gpa > > GPA = rep(gpa,3); Pass = c(catchup,elite,mainstream) > plot(gpa,pass,pch=' ') > lines(gpa,catchup,lty=1) > lines(gpa,elite,lty=2) > lines(gpa,mainstream,lty=3) > title("parallel Estimated Log Odds") Page 9 of 10

> oddscu = exp(catchup); oddsel = exp(elite) > oddsmain = exp(mainstream) > Odds = c(oddscu,oddsel,oddsmain) > plot(gpa,odds,pch=' ') > lines(gpa,oddscu,lty=1) > lines(gpa,oddsel,lty=2) > lines(gpa,oddsmain,lty=3) > title("proportional Estimated Odds") Page 10 of 10