Lecture 13: More on Binary Data


Link functions for Binomial models

Link                     $\eta = g(\pi)$                  $\pi = g^{-1}(\eta)$
identity                 $\pi$                            $\eta$
logarithmic              $\log \pi$                       $e^{\eta}$
logistic                 $\log\frac{\pi}{1-\pi}$          $\frac{e^{\eta}}{1+e^{\eta}}$
probit                   $\Phi^{-1}(\pi)$                 $\Phi(\eta)$
log-log                  $-\log(-\log \pi)$               $\exp(-e^{-\eta})$
complementary log-log    $\log(-\log(1-\pi))$             $1-\exp(-e^{\eta})$

[Figure: Comparison of Link Functions. $g(p)$ plotted against $p$ for the logit, probit, and complementary log-log (CLL) links.]

When $g$ is the identity or logarithmic function, $\hat\pi = g^{-1}(\hat\eta)$ may lie outside the interval $[0, 1]$. Therefore, these link functions may not be the best choices.

The logistic and probit are by far the most commonly used.
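As a rough illustration of the link table, the following Python sketch codes each link with its inverse and shows how the identity and logarithmic links can return fitted values outside $[0, 1]$. The names `LINKS` and `fitted_probability` are hypothetical, and the log-log link is assumed to be $-\log(-\log \pi)$.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit link

# Each entry maps a link name to (g, g_inverse),
# with eta = g(pi) and pi = g_inverse(eta).
LINKS = {
    "identity": (lambda p: p, lambda eta: eta),
    "log": (lambda p: math.log(p), lambda eta: math.exp(eta)),
    "logit": (lambda p: math.log(p / (1 - p)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    # Assumed convention for the log-log link: eta = -log(-log(pi)).
    "loglog": (lambda p: -math.log(-math.log(p)),
               lambda eta: math.exp(-math.exp(-eta))),
    "cloglog": (lambda p: math.log(-math.log(1 - p)),
                lambda eta: 1 - math.exp(-math.exp(eta))),
}


def fitted_probability(link, eta):
    """Map a linear predictor value eta back to the probability scale."""
    return LINKS[link][1](eta)


if __name__ == "__main__":
    # With eta = 1.5 the identity and log links give values above 1;
    # the other four links always stay inside (0, 1).
    for name in LINKS:
        print(name, fitted_probability(name, 1.5))
```

Only the last four links are guaranteed to return a valid probability for every real value of the linear predictor.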

The probit model requires numerical integration in the computation of the MLEs (since $\Phi$ does not have a closed form).

Both the logistic and probit are symmetric about $\pi = 1/2$, and produce very similar results in most analyses (e.g., the beetle example) unless there are many small or large probabilities. A very large amount of data would be required to show that one was better than the other.

The logistic, probit, and complementary log-log links are similar for small $\pi$.

The complementary log-log link is skewed (asymmetric).

Advantages of the logistic model

- The logistic model provides a direct interpretation as the log-odds of a success.
- This interpretation is particularly nice in the context of case-control studies. Here, the exposure level (e.g., of a toxin) is compared between individuals who have a particular disease or condition (the "cases") and those who do not have the disease (the "controls").
- The logit link is the canonical link for the binomial distribution, hence is mathematically convenient.

Interpretation of the link functions: bioassays and dose-response models

A biological assay is a method for estimating the potency of a material by means of the reaction which follows its application to living matter.

An m-point assay is an assay where a lethal drug is administered in $m$ doses $d_1, \ldots, d_m$, where $d_i = \log(\text{concentration})$ (usually).

In the experiment, $n_i$ subjects get dose $d_i$, and $y_i$ of these subjects die. We model $Y_i \sim \text{Binomial}(n_i, \pi_i)$, where $\pi_i = P(\text{death given dose } i)$ is a function of $d_i$.
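To make the binomial dose-response model concrete, here is a minimal Python sketch of the log-likelihood for an m-point assay with a linear predictor $\beta_0 + \beta_1 d_i$ and, by default, a logistic inverse link. The function names and the three-dose data set are hypothetical and serve only as an illustration.

```python
import math


def logistic(eta):
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def binomial_loglik(beta0, beta1, doses, n, y, inv_link=logistic):
    """Log-likelihood of Y_i ~ Binomial(n_i, pi_i), pi_i = inv_link(beta0 + beta1 * d_i)."""
    total = 0.0
    for d_i, n_i, y_i in zip(doses, n, y):
        pi_i = inv_link(beta0 + beta1 * d_i)
        total += (math.log(math.comb(n_i, y_i))
                  + y_i * math.log(pi_i)
                  + (n_i - y_i) * math.log(1.0 - pi_i))
    return total


# Hypothetical 3-point assay (made-up numbers, not data from the lecture).
doses = [0.0, 1.0, 2.0]   # d_i, e.g. log concentration
n = [10, 10, 10]          # subjects per dose
y = [2, 5, 9]             # deaths per dose

if __name__ == "__main__":
    # A dose-dependent model should fit these data better than a flat pi = 0.5.
    print(binomial_loglik(0.0, 0.0, doses, n, y))
    print(binomial_loglik(-1.5, 1.7, doses, n, y))
```

Passing a different `inv_link` (for example a normal cdf) would give the corresponding probit likelihood under the same assumptions.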

Tolerance Distribution

We think of different subjects as having different tolerances to a drug. Define $D$ to be the minimum dose required to produce a response in a subject. Then $D$ is a random variable. Its distribution is called the tolerance distribution, and is denoted by $F_D$.

An animal is killed by dose $d_i$ if and only if its tolerance is less than or equal to $d_i$, i.e., $D \le d_i$. Thus, the probability $\pi_i$ of death at dose $d_i$ is given by

$$\pi_i = P(D \le d_i) = F_D(d_i).$$

Probit Analysis

Based on the assumption of a normal tolerance distribution, i.e., $D \sim N(\mu_d, \sigma_d^2)$, we have

$$\pi_i = P(D \le d_i) = \Phi\left(\frac{d_i - \mu_d}{\sigma_d}\right),$$

where $\Phi$ is the $N(0, 1)$ cdf. Thus,

$$\Phi^{-1}(\pi_i) = \frac{d_i - \mu_d}{\sigma_d} = \beta_0 + \beta_1 d_i,$$

where $\beta_0 = -\mu_d/\sigma_d$ and $\beta_1 = 1/\sigma_d$. Hence, the mean and standard deviation of the tolerance distribution can be estimated via the regression parameters in this model.

Logistic analysis

Assume now that the tolerance distribution is the logistic distribution, i.e.,

$$f_D(d) = \frac{\exp\{(d - \mu_d)/\tau\}}{\tau\left[1 + \exp\{(d - \mu_d)/\tau\}\right]^2},$$

where $-\infty < d < \infty$, $-\infty < \mu_d < \infty$, and $\tau > 0$. Then $E[D] = \mu_d$ and $\mathrm{Var}[D] = \pi^2\tau^2/3$ (where $\pi = 3.1415\ldots$). Under this assumption,

$$\pi_i = F_D(d_i) = \frac{\exp\{(d_i - \mu_d)/\tau\}}{1 + \exp\{(d_i - \mu_d)/\tau\}}.$$
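The probit relations $\beta_0 = -\mu_d/\sigma_d$ and $\beta_1 = 1/\sigma_d$ can be inverted to give $\mu_d = -\beta_0/\beta_1$ and $\sigma_d = 1/\beta_1$. The Python sketch below does this inversion and checks it against the normal cdf. The function names and the coefficient values are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def tolerance_from_probit(beta0, beta1):
    """Return (mu_d, sigma_d) implied by probit coefficients.

    Uses beta0 = -mu_d / sigma_d and beta1 = 1 / sigma_d, so beta1 must be positive.
    """
    if beta1 <= 0:
        raise ValueError("beta1 must be positive for a valid tolerance distribution")
    sigma_d = 1.0 / beta1
    mu_d = -beta0 / beta1
    return mu_d, sigma_d


def death_probability(dose, mu_d, sigma_d):
    """pi = P(D <= dose) under a normal tolerance distribution."""
    return NormalDist(mu_d, sigma_d).cdf(dose)


if __name__ == "__main__":
    # Hypothetical probit coefficients (not estimates from the lecture).
    mu_d, sigma_d = tolerance_from_probit(beta0=-6.0, beta1=0.1)
    print(mu_d, sigma_d)
    # The median lethal dose equals mu_d, where the death probability is 1/2.
    print(death_probability(mu_d, mu_d, sigma_d))
```

The same inversion applies to a logistic tolerance distribution with $\tau$ in place of $\sigma_d$, since the logit of $F_D(d_i)$ is $(d_i - \mu_d)/\tau$.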

Taking the logit of $\pi_i = F_D(d_i)$ under the logistic tolerance distribution gives

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \frac{d_i - \mu_d}{\tau} = \beta_0 + \beta_1 d_i,$$

where $\beta_0 = -\mu_d/\tau$ and $\beta_1 = 1/\tau$.

Complementary log-log analysis

Similarly, the assumption of the extreme-value tolerance distribution

$$F_D(d_i) = 1 - \exp\left\{-e^{(a - d_i)/b}\right\} \quad (\text{where } b < 0)$$

leads to the complementary log-log model, since $\log\{-\log(1 - \pi_i)\} = (a - d_i)/b$ is linear in $d_i$, with $\beta_0 = a/b$ and $\beta_1 = -1/b$.

Example: Beetle data, cont.

Goal: Find the tolerance distribution.

We chose a linear complementary log-log model for the probability $\pi$ of death at a given dose $x$. This implies that the tolerance distribution is the extreme value distribution. Therefore,

$$F_D(x) = \pi = 1 - \exp\left[-\exp(\beta_0 + \beta_1 x)\right].$$

The density function of $D$ is given by

$$f(x) = \beta_1 \exp\left[(\beta_0 + \beta_1 x) - \exp(\beta_0 + \beta_1 x)\right].$$

An estimate of this density can be obtained by substituting the estimated values $\hat\beta_0$ and $\hat\beta_1$:

$$\hat f(x) = 0.1525 \exp\left[(-9.60 + 0.1525x) - \exp(-9.60 + 0.1525x)\right].$$

Probit-logistic relationship

The normal and logistic distributions are often quite similar (for certain values of their parameters). Consider the case where the true tolerance distribution is similar to these distributions. In this case, we would expect the GLMs with probit and logistic links to fit almost equally well.

More specifically, if we construct logistic and normal tolerance distributions with the same mean and variance (so that these distributions are very similar), we obtain an approximate relationship between the probit and logistic models.
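Returning to the beetle example, the estimated extreme-value density can be evaluated numerically. The Python sketch below assumes the coefficient values $-9.60$ and $0.1525$ as transcribed in the text, and uses hypothetical function names. It locates the mode at $x = -\hat\beta_0/\hat\beta_1$ (where the linear predictor is zero) and checks that the density integrates to roughly one.

```python
import math

# Coefficients as transcribed in the text (assumed to be on the raw dose scale).
BETA0_HAT = -9.60
BETA1_HAT = 0.1525


def tolerance_cdf(x, b0=BETA0_HAT, b1=BETA1_HAT):
    """F_D(x) = 1 - exp(-exp(b0 + b1 * x))."""
    return 1.0 - math.exp(-math.exp(b0 + b1 * x))


def tolerance_density(x, b0=BETA0_HAT, b1=BETA1_HAT):
    """f(x) = b1 * exp((b0 + b1 * x) - exp(b0 + b1 * x))."""
    eta = b0 + b1 * x
    return b1 * math.exp(eta - math.exp(eta))


def trapezoid(f, lo, hi, steps=20000):
    """Plain trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    inner = sum(f(lo + k * h) for k in range(1, steps))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))


if __name__ == "__main__":
    mode = -BETA0_HAT / BETA1_HAT        # eta = 0 maximises eta - exp(eta)
    print(mode, tolerance_density(mode))  # peak height equals b1 / e
    print(trapezoid(tolerance_density, 0.0, 120.0))
```

The left skew of this density is what distinguishes the complementary log-log model from the symmetric logit and probit models.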

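[Figure: Estimated Tolerance Distribution (Extreme Value Density) for the beetle data. $f(x)$ plotted against $x$ for $x$ from 40 to 80.]

Equating the variances of the normal and logistic tolerance distributions gives

$$\sigma^2 = \frac{\pi^2\tau^2}{3} \quad\Rightarrow\quad \sigma = \frac{\pi\tau}{\sqrt{3}}.$$

Thus, writing $\pi$ for the probability inside $\text{probit}(\cdot)$ and $\text{logit}(\cdot)$ and for the constant $3.1415\ldots$ in the scale factor,

$$\text{probit}(\pi) = \frac{d - \mu_d}{\sigma} = \frac{\sqrt{3}}{\pi}\left(\frac{d - \mu_d}{\tau}\right) = \frac{\sqrt{3}}{\pi}\,\text{logit}(\pi) \approx 0.55\,\text{logit}(\pi).$$

If we denote the linear predictor under the probit model by $\eta_i = \sum_{j=1}^{p} x_{ij}\beta_j$ and the linear predictor under the logit model by $\eta_i^* = \sum_{j=1}^{p} x_{ij}\beta_j^*$, then

$$\eta_i \approx 0.55\,\eta_i^*$$
$$\Rightarrow\quad \sum_{j=1}^{p} x_{ij}\beta_j \approx 0.55 \sum_{j=1}^{p} x_{ij}\beta_j^* \quad (\text{for all } x_{ij})$$
$$\Rightarrow\quad \beta_j \approx 0.55\,\beta_j^* \quad (\text{for all } j).$$

In this case, we see that the regression coefficients estimated under the probit model are approximately 0.55 times those estimated under the logit model.

Thus, typically we choose between the logit and probit model based on considerations such as interpretation rather than goodness-of-fit.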
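The variance-matching factor can be checked numerically. The Python sketch below computes $\sqrt{3}/\pi$ and measures how far $\Phi^{-1}(p)$ is from $(\sqrt{3}/\pi)\,\text{logit}(p)$ over a grid of probabilities. The grid and the function names are hypothetical choices, and the agreement should be read as rough rather than exact.

```python
import math
from statistics import NormalDist

# Scale factor obtained by equating the normal and logistic variances.
SCALE = math.sqrt(3) / math.pi


def logit(p):
    """Log-odds of p."""
    return math.log(p / (1.0 - p))


def probit(p):
    """Standard normal quantile of p."""
    return NormalDist().inv_cdf(p)


def max_abs_gap(probs):
    """Largest |probit(p) - SCALE * logit(p)| over the given probabilities."""
    return max(abs(probit(p) - SCALE * logit(p)) for p in probs)


# Hypothetical grid that avoids the extreme tails, where the two links diverge.
GRID = [i / 100 for i in range(5, 96)]

if __name__ == "__main__":
    print(SCALE)
    print(max_abs_gap(GRID))
```

Because the approximation degrades in the tails, the gap grows if the grid is extended toward 0 or 1, which is consistent with the earlier remark that the two models differ mainly when there are many small or large probabilities.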