Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Similar documents
Ch16.xls. Wrap-up. Many of the analyses undertaken in biology are concerned with counts.

Data files & analysis PrsnLee.out Ch9.xls

Wrap-up. The General Linear Model is a special case of the Generalized Linear Model. Consequently, we can carry out any GLM as a GzLM.

Sleep data, two drugs Ch13.xls

SRBx14_9.xls Ch14.xls

Data files and analysis Lungfish.xls Ch9.xls

You can specify the response in the form of a single variable or in the form of a ratio of two variables denoted events/trials.

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Where have all the puffins gone 1. An Analysis of Atlantic Puffin (Fratercula arctica) Banding Returns in Newfoundland

BMI 541/699 Lecture 22

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction

Logistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S

Generalized linear models

Investigating Models with Two or Three Categories

Simple logistic regression

Models for Binary Outcomes

Lecture 12: Effect modification, and confounding in logistic regression

COMPLEMENTARY LOG-LOG MODEL

1 Introduction to Minitab

Q30b Moyale Observed counts. The FREQ Procedure. Table 1 of type by response. Controlling for site=moyale. Improved (1+2) Same (3) Group only

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

Chapter 5: Logistic Regression-I

Multinomial Logistic Regression Models

Stat 642, Lecture notes for 04/12/05 96

Today: Cumulative distributions to compute confidence limits on estimates

STAT 7030: Categorical Data Analysis

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Generalized linear models

Cohen s s Kappa and Log-linear Models

Chapter 1. Modeling Basics

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Six Sigma Black Belt Study Guides

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Sections 4.1, 4.2, 4.3

Multiple Regression Examples

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46

The Flight of the Space Shuttle Challenger

Introduction to General and Generalized Linear Models

Poisson Data. Handout #4

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Basic Medical Statistics Course

8 Nominal and Ordinal Logistic Regression

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:

6.1 Frequency Distributions from Data. Discrete Distributions Example, Four Forms, Four Uses Continuous Distributions. Example, Four Forms, Four Uses

Categorical data analysis Chapter 5

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

assumes a linear relationship between mean of Y and the X s with additive normal errors the errors are assumed to be a sample from N(0, σ 2 )

Generalized Linear Models

GLMM workshop 7 July 2016 Instructors: David Schneider, with Louis Charron, Devin Flawd, Kyle Millar, Anne St. Pierre Provencher, Sam Trueman

Case-control studies C&H 16

Linear Regression Models P8111

Review of One-way Tables and SAS

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours

Age 55 (x = 1) Age < 55 (x = 0)

(c) Interpret the estimated effect of temperature on the odds of thermal distress.

12 Modelling Binomial Response Data

Testing Independence

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Longitudinal Modeling with Logistic Regression

Introduction to Statistical Analysis

Model Based Statistics in Biology. Part III. The General Linear Model. Chapter 8 Statistical Inference with the General Linear Model

Chapter 1 Statistical Inference

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Inference for Binomial Parameters

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

9 Generalized Linear Models

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

Multiple Logistic Regression for Dichotomous Response Variables

The simple linear regression model discussed in Chapter 13 was written as

LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION

Exam details. Final Review Session. Things to Review

Overdispersion Workshop in generalized linear models Uppsala, June 11-12, Outline. Overdispersion

Logistic Regression Models to Integrate Actuarial and Psychological Risk Factors For predicting 5- and 10-Year Sexual and Violent Recidivism Rates

TAMS38 Experimental Design and Biostatistics, 4 p / 6 hp Examination on 19 April 2017, 8 12

Generalized Linear Models

Introduction to the Generalized Linear Model: Logistic regression and Poisson regression

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence

STAT 525 Fall Final exam. Tuesday December 14, 2010

Multiple Regression an Introduction. Stat 511 Chap 9

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Count data page 1. Count data. 1. Estimating, testing proportions

STA6938-Logistic Regression Model

Statistics in medicine

STAT5044: Regression and Anova

Chapter 14 Logistic and Poisson Regressions

ESTIMATE PROP. IMPAIRED PRE- AND POST-INTERVENTION FOR THIN LIQUID SWALLOW TASKS. The SURVEYFREQ Procedure

Poisson regression: Further topics

Survival Regression Models

Transcription:

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV (Ch13, 14) 18 Binomial Response Variables 18.1 18.2 Logistic Regression (-Response) Single Factor. Prospective Analysis 18.3 18.4 Single Factor. Retrospective Analysis Single Random Factor. 18.5 Single Explanatory Variable. Ordinal Scale. 18.6 18.7 Two Categorical Explanatory Variables Logistic ANCOVA on chalk board Ch18.xls ReCap Part I (Chapters 1,2,3,4) Quantitative reasoning ReCap Part II (Chapters 5,6,7) Hypothesis testing and estimation ReCap (Ch 9, 10,11) The General Linear Model with a single explanatory variable. ReCap (Ch 12,13,14,15) GLM with more than one explanatory variable ReCap (Ch 16,17) ReCap (Ch 18) Binomial data is analyzed within the framework of the generalized linear model. The response variable is the odds, compute from the proportion of cases p. Today: Binomial response variables. Logistic regression (dose-response analysis). Wrap-up. We analyze dose-response data with logistic regression, in which the response variable is the odds, and relation of odds to dose is exponential. 1

Introduction. Logistic regression is a special case of the generalized linear model in which the response variable is binomial, and there is a single ratio scale explanatory variable. The log link is used. An important application is the analysis of response to drug. Does the risk of developing a tumor increase with dosage of aflatoxin B_1? 18 0 0 22 2 1 22 1 5 21 4 15 25 20 50 28 28 100 N Ntmr N = number of experimental animals fed aflatoxin B_1, a suspected carcinogen. Ntmr = number developing liver tumors = amount fed to animals (ppb) Data from D.W. Gaylor (1987) Linear-nonparametric upper limits for low dose extrapolation. American Statistical Association: Proceedings of the Biopharmaceutical Section 63-66. Preliminary computations: proportions, Odds, and binomial variances. We begin by computing the proportion of animals that develop tumors, and the odds of having a tumor at each dosage. Cases Cases Proportion Odds w/ tumors w/tumors ppb N Ntumor p=ntumor/n p/(1-p) 0 18 0 0.0000 1 22 2 0.0909 0.1000 5 22 1 0.0455 0.0476 15 21 4 0.1905 0.2353 50 25 20 0.8000 4.0000 100 28 28 1.0000 The odds of having a tumor increase with dosage. The increase is not a linear function of the dose. For dose-response analysis, we assume that relation is exponential. 2

1. Construct Model Verbal model. N is number of experimental animals at each dosage. Ntumor is number of animals that develop tumors. p is the proportion of animals having tumors at each dosage. The odds of developing tumors are p/(1-p). The odds of developing tumors increase with dose. Graphical model Plot of odds versus dosage shows a curvilinear relation. Plot of log odds versus dosage shows a linear relation. Conclude from this that dose has a multiplicative rather than additive effect on odds of developing tumors. Odds Odds 5 4 3 2 1 0 4.0000 2.0000 1.0000 0.5000 0.2500 0.1250 0.0625 0.0313 Response variable: odds of developing tumors Explanatory variable: dose of aflatoxin B_1 Explanatory variable has multiplicative effect on odds of developing tumors. Write formal model as a multiplicative effect p Odds= 1 p ( η Odds = e ) + Binomial Error η = β + β o Definition of odds The link between the Odds and the srtructural model 0 is: Odds 0 10 20 30 40 50 60 0 10 20 30 40 50 60 = e η This is called a logit link. Why do we use a logit link? < Odds are a natural way to interpret changes in risk. < Odds ratios have the convenient property that we can invert them. If the odds of developing a tumor double with each increment in dose, then we can say that the odds are halved with each decrement in dose. This property does not apply to percentages. < Graphical analysis (above) supports the idea of multiplicative effect on dose on odds of developing tumors. < The logit link puts multiplicative effects on an additive scale, which is convenient for statistical analysis. < Using a logit link avoids the inconvenience of log transforming the response variable. We use a binomial error because each unit is scored in two categories, tumor or no tumor. 3

2. Execute analysis. Place data in model format: Binomial response variable in two columns, success and trials Column labelled Ncases, with response variable # of animals Column labelled Ntumor, with response variable # of animals with tumors Column labelled, with explanatory variable (in ppm) Data Gaylor; Input Ncases Ntumor ; Cards; 18 0 0 22 2 1 22 1 5 21 4 15 25 20 50 28 28 100 ; The data are in model format. MTB > print c1-c3 Row Ncase Ntumor 1 18 0 0 2 22 2 1 3 22 1 5 4 21 4 15 5 25 20 50 6 28 28 100 Proc Genmod; Model Ntumor/Ncase = / Link=logit dist=binomial type1 type3; SAS command file In a package with spreadsheet format, there will be three columns (variables) and six rows. Code the GzLM model statement in statistical package ( η Odds = e ) + error η = β + β o Minitab output SAS command file MTB > BLogistic 'Ntumor' 'Ncase' = ; as of 2002 SUBC> ST; SUBC> Logit; SUBC> Brief 2. Minitab command lines Click Stat Click Regression Click Binary Logistic Regression Click Success, place column of Ntumor, Click trials, place column of Ncase Click Model, place column with Click Storage (optional) Click Pearson residuals, Event probability, ok Minitab sequence to produce line commands 4

GaylorGLM <- glm(ntumor/n ~, family = binomial(link = logit), weights = N, data = Gaylor) summary(gaylorglm) plot(gaylorglm) R command lines 3. Use residuals to evaluate model. a. Straight line assumption acceptable, no bowl or arch in residual plot. b. Too few residuals above fitted value of 0.2 to evaluate homogeneity assumption. c. Use Q-Q plot to evaluate normality assumption. Deviance Residuals -1.0-0.5 0.0 0.5 2 1 3 0.2 0.4 0.6 0.8 1.0 Fitted : 4. Population. All possible outcomes if the experiment were repeated on similar animals and the same suspected carcinogen, dosages, and experimental protocol. 5. Decide on mode of inference. Is hypothesis testing appropriate? There is little doubt that the odds of developing a tumor increase with dose. It is of interest to estimate the rate at which odds change in relation to dose. We skip to step 10. 5

10. Analysis of parameters of biological interest. Logistic regression routines estimate the parameters. Logistic Regression Table Odds 95% CI Predictor Coef SE Coef Z P Ratio Lower Upper Constant -3.0360 0.4823-6.30 0.000 0.09009 0.01456 6.19 0.000 1.09 1.06 1.13 β e o = e 30360. = 0. 048 β e = e 0. 09009 = 1094. Model odds for zero dose Minitab output Change in odds for each dose increment The confidence limits for the odds ratio of 1.094 are narrow: 1.06 to 1.13 The confidence limit excludes the null hypothesis (OR = 1.00) that the odds do not change with dose. With these estimates we can compute the expected odds at any dose. βo + β 50 3. 036+ 0. 09009 50 e = e = e 1469. = 4. 34 The odds of developing a tumour at a dose of 50 ppb is 4.34 to 1. We can compare the dose response relation (OR = 1.094) with other dose-response relations because we have a standard error on the log odds ratio ln OR = 0.09009 + 0.01456 Note that if we use the log link (response variable is the proportion p), rather than logit link (response variable is Odds), we obtain different parameter estimates and we would conclude that there is no significant relation of response to dose. ( η p = e ) + Poisson Error Response variable is proportion p (log link) η = β + β o Parameter DF Estimate Standard Wald 95% Chi- Error Confidence Limits Square Pr > ChiSq Intercept 1-1.4209 0.1893-1.7918-1.0500 56.37 <.0001 1 0.0142 0.0079-0.0013 0.0297 3.23 0.0722 Scale 0 1.0000 0.0000 1.0000 1.0000 SAS output 6