Generalized logit models for nominal multinomial responses. Local odds ratios

Similar documents
8 Nominal and Ordinal Logistic Regression

Introducing Generalized Linear Models: Logistic Regression

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models

Discrete Multivariate Statistics

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

Single-level Models for Binary Responses

Multinomial Logistic Regression Models

High-Throughput Sequencing Course

Categorical data analysis Chapter 5

STA102 Class Notes Chapter Logistic Regression

STAT 526 Spring Final Exam. Thursday May 5, 2011

Linear Regression With Special Variables

Classification. Chapter Introduction. 6.2 The Bayes classifier

Introduction To Logistic Regression

Outline. The binary choice model. The multinomial choice model. Extensions of the basic choice model

Logistic Regression. Advanced Methods for Data Analysis (36-402/36-608) Spring 2014

Confidence Intervals for the Odds Ratio in Logistic Regression with One Binary X

Consider Table 1 (Note connection to start-stop process).

Statistics 3858 : Contingency Tables

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression

Survival Analysis Math 434 Fall 2011

Lecture 2: Poisson and logistic regression

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Ch 6: Multicategory Logit Models

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Generalized Linear Models and Exponential Families

STAT 7030: Categorical Data Analysis

Logistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours

Lecture 5: Poisson and logistic regression

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

Multiple Regression: Chapter 13. July 24, 2015

Statistics in medicine

Generalized Linear Models (GLZ)

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Simple logistic regression

Rapid Introduction to Machine Learning/ Deep Learning

Linear Regression Models P8111

Log-linear Models for Contingency Tables

NELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation

STAT5044: Regression and Anova

Mass Fare Adjustment Applied Big Data. Mark Langmead. Director Compass Operations, TransLink Vancouver, British Columbia

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Logistic Regression. Introduction to Data Science Algorithms Jordan Boyd-Graber and Michael Paul SLIDES ADAPTED FROM HINRICH SCHÜTZE

DEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE

Proportional Odds Logistic Regression. stat 557 Heike Hofmann

Generalized linear models with a coarsened covariate

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

Generalized Linear Models 1

Investigating Models with Two or Three Categories

Calculating Odds Ratios from Probabillities

Testing Independence

Semiparametric Generalized Linear Models

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

Logistic Regression. Fitting the Logistic Regression Model BAL040-A.A.-10-MAJ

Chapter 20: Logistic regression for binary response variables

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

Confidence Intervals for the Odds Ratio in Logistic Regression with Two Binary X s

Interpretation of the Fitted Logistic Regression Model

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models for Non-Normal Data

Logistic regression: Miscellaneous topics

Generalized Linear Models

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Logistic Regression for Ordinal Responses

Review of the General Linear Model

9 Generalized Linear Models

Generalized Linear Models

11. Generalized Linear Models: An Introduction

Relative Risk Calculations in R

Neural networks (not in book)

Cox s proportional hazards/regression model - model assessment

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

Logistic Regression. Seungjin Choi

Stat 642, Lecture notes for 04/12/05 96

Introduction to mtm: An R Package for Marginalized Transition Models

STA216: Generalized Linear Models. Lecture 1. Review and Introduction

Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102

MATH 567: Mathematical Techniques in Data Science Logistic regression and Discriminant Analysis

Measures of Association and Variance Estimation

Cohen s s Kappa and Log-linear Models

Part 8: GLMs and Hierarchical LMs and GLMs

Data Mining 2018 Logistic Regression Text Classification

Multivariate spatial modeling

The Logit Model: Estimation, Testing and Interpretation

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

4 Multicategory Logistic Regression

Models for Ordinal Response Data

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books

Generalized Linear Models Introduction

Introduction. Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University

Transcription:

Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π 33 π 34 π 3+ Odds of Y = 4versusY = 2 when X = 1is (π 14 /π 1+ ) / (π 12 /π 1+ )=π 14 /π 12 Odds of Y = 4versusY = 2 when X = 3is (π 34 /π 3+ ) / (π 32 /π 3+ )=π 34 /π 32 Local odds ratio = (π 14 /π 12 ) / (π 34 /π 32 )=(π 14 π 34 ) / (π 12 π 32 ) Interpretation: If local OR = 2, There is a two-fold increase in the odds of a response, Y, in class 4 versus class 2 when comparing X = 1toX = 3. 2/17

Multinomial regression models for nominal response Let Y be a categorical response variable with J categories (J > 2) We desire a model for multinomial responses similar to a logistic regression model Y could be the location of a colorectal tumor (proximal, distal or rectal) X could be the covariate classes defined by a subject s race (AA or non-aa) and gender (male or female) Let π j (x) =P(Y = j x) for some fixed setting of the x explanatory variables, with j π j(x) =1 At this fixed setting of x we treat the counts at the J categories of Y as multinomial with probabilities {π 1 (x),...,π J (x)}. 3/17 Baseline-category logits We select one of the J categories of Y as the baseline (or reference) category Without loss of generality, order the categories of Y so the Jth level coincides with this baseline category Define the generalized logit (relative to the baseline category) as [ ] g j (x) =log = α j + β jx, j = 1,...,J 1 π J (x) This model defines J 1 sets of model parameters, one for each of the J 1 generalized logits. Therefore, for each logit we have A separate intercept (α j ) A separate set of regression parameters (β j ) 4/17

Multinomial likelihood Consider subject i s contribution to the log-likelihood J L i = log π z ij ij π ij = P(Y i = j) z ij = 1ifY i = j and z ij = 0ifY i j z i =(z i1,...,z ij ) is a vector of a single 1 and the rest 0 L i = = = J z ij log π ij = z ij log π ij + z ij log π ij z ij log π ij + 1 z ij log π ij z ij log π ij π ij + log π ij 5/17 Multinomial likelihood (cont.) Conclusions: 1. The multinomial distribution is a member of the multivariate exponential dispersion family 2. The baseline-category logits are the natural parameters for the multinomial distribution 3. The baseline-category logit functions are the canonical link functions for the multinomial GLM 6/17

Inverting generalized logits to obtain probabilities Recall that for covariate pattern x, we define the generalized logit as [ ] g j (x) =log = α j + β jx, j = 1,...,J 1. π J (x) These can be solved for the individual π j (x) yielding (i) π j (x) = exp{g j (x)} P(Y = j x) = 1 + exp{g j(x)}, j = 1,...,J 1 (ii) π J (x) = 1 P(Y = J x) = 1 + exp{g j(x)} 7/17 Example Let Y be a three-level categorical variable with the third level identified as the reference category. g 1 (x) =α 1 + β 1 x g 2 (x) =α 2 + β 2 x There is no g 3 (x) - the third level of Y is the reference category. π 1 (x) = P(Y = 1 x) = exp{α 1 + β 1 x} 1 + exp{α 1 + β 1 x} + exp{α 2 + β 2 x} π 2 (x) = exp{α P(Y = 2 x) = 2 + β 2 x} 1 + exp{α 1 + β 1 x} + exp{α 2 + β 2 x} π 3 (x) = 1 P(Y = 3 x) = 1 + exp{α 1 + β 1 x} + exp{α 2 + β 2 x} 8/17

Consider Deriving probabilities log ( ) = α j + β j π J (x) x Exponentiating both sides, we get π j (x) ( ) π J (x) = exp α j + β j x which is the (local) odds for category j versus category J. Multiplying both sides by π J (x), we obtain ) π j (x) =π J (x) exp (α j + β j x (1), Now, sum both sides over j = 1,..., J 1, ) π j (x) =π J (x) exp (α j + β j x (2), 9/17 Note that Deriving probabilities (cont.) π J (x)+ π j (x) = J π j (x) =1 It follows that π j (x) =1 π J (x) (3) Based on (3), we can substitute 1 π J (x) for π j(x) in (2) on Slide 9, resulting in ) 1 π J (x) =π J (x) exp (α j + β j x (4) 10 / 17

Deriving probabilities (cont.) Rearranging terms in (4) on Slide 10, we have [ 1 = π J (x) 1 + ( )] exp α j + β j x Solving for π J (x), wehave π J (x) = Recalling (1) from Slide 9 1 1 + ( ) exp α j + β j x π j (x) =π J (x) exp (α j + β j x ), we substitute our expression for π J (x) into (1) to obtain ) exp (α j + β j x π j (x) = 1 + ( ) exp α j + β j x 11 / 17 Obtaining other odds ratios Note that the J 1 baseline-category logits uniquely determine all remaining logits comparing any two response levels Consider the logit for category j versus j where j j and j J log ( ) π j (x) ( ) = log log π J (x) ( ) πj (x) π J (x) = g j (x) g j (x) = (α j + β j x) (α j + β j x) 12 / 17

CRC tumor location example Y i = tumor location for ith colorectal cancer (CRC) patient (1 = proximal, 2 = distal, 3 = rectal) X 1i = race for ith CRC patient (1 = AA or 0 = non-aa) X 2i = gender for ith CRC patient (1 = male or 0 = female) Using rectal tumor location as the reference category, we fit the following generalized logits: g 1 (x i ) = α 1 + β 11 x 1i + β 12 x 2i g 2 (x i ) = α 2 + β 21 x 1i + β 22 x 2i For the jth logit (j = 1, 2) α j is the intercept β j1 is the effect of subject s race β j2 is the effect of subject s gender 13 / 17 CRC example: log odds proximal Covariate classes distal proximal rectal rectal distal AA Male α 1 + β 11 + α 2 + β 21 + (α 1 + β 11 + β 12 ) β 12 β 22 (α 2 + β 21 + β 22 ) Female α 1 + β 11 α 2 + β 21 (α 1 + β 11 ) (α 2 + β 21 ) non-aa Male α 1 + β 12 α 2 + β 22 (α 1 + β 12 ) (α 2 + β 22 ) Female α 1 α 2 α 1 α 2 14 / 17

CRC example: log odds ratios Calculate the log odds ratio for proximal versus rectal CRC comparing AAs to non-aas, controlling for subject s gender. log odds of proximal versus rectal CRC for AA males is α 1 + β 11 + β 12 log odds of proximal versus rectal CRC for non-aa males is α 1 + β 12 log odds ratio of proximal versus rectal CRC for AA males compared to non-aa males is β 11 Therefore, odds ratio of proximal versus rectal CRC for AA males compared to non-aa males is exp{β 11 } 15 / 17 CRC example: log odds ratios (cont.) Calculate the log odds ratio for proximal versus rectal CRC comparing AAs to non-aas, controlling for subject s gender. log odds of proximal versus rectal CRC for AA females is α 1 + β 11 log odds of proximal versus rectal CRC for non-aa females is α 1 log odds ratio of proximal versus rectal CRC for AA females compared to non-aa females is β 11 Therefore, odds ratio of proximal versus rectal CRC for AA females compared to non-aa females is exp{β 11 } 16 / 17

CRC example: log odds ratios (cont.) Not surprisingly, the odds ratios for proximal versus rectal CRC comparing AAs to non-aas was the same for males and females This is because there was no interaction in the model That is to say, we assume homogeneity of the local odds ratios Other ORs of interest can be calculated by exponentiating differences of appropriately selected log odds 17 / 17