Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV (Ch13, 14) 18 Binomial Response Variables 18.1 18.2 Logistic Regression (-Response) Single Factor. Prospective Analysis 18.3 18.4 Single Factor. Retrospective Analysis Single Random Factor. 18.5 Single Explanatory Variable. Ordinal Scale. 18.6 18.7 Two Categorical Explanatory Variables Logistic ANCOVA on chalk board Ch18.xls ReCap Part I (Chapters 1,2,3,4) Quantitative reasoning ReCap Part II (Chapters 5,6,7) Hypothesis testing and estimation ReCap (Ch 9, 10,11) The General Linear Model with a single explanatory variable. ReCap (Ch 12,13,14,15) GLM with more than one explanatory variable ReCap (Ch 16,17) ReCap (Ch 18) Binomial data is analyzed within the framework of the generalized linear model. The response variable is the odds, compute from the proportion of cases p. Today: Binomial response variables. Logistic regression (dose-response analysis). Wrap-up. We analyze dose-response data with logistic regression, in which the response variable is the odds, and relation of odds to dose is exponential. 1
Introduction. Logistic regression is a special case of the generalized linear model in which the response variable is binomial, and there is a single ratio scale explanatory variable. The log link is used. An important application is the analysis of response to drug. Does the risk of developing a tumor increase with dosage of aflatoxin B_1? 18 0 0 22 2 1 22 1 5 21 4 15 25 20 50 28 28 100 N Ntmr N = number of experimental animals fed aflatoxin B_1, a suspected carcinogen. Ntmr = number developing liver tumors = amount fed to animals (ppb) Data from D.W. Gaylor (1987) Linear-nonparametric upper limits for low dose extrapolation. American Statistical Association: Proceedings of the Biopharmaceutical Section 63-66. Preliminary computations: proportions, Odds, and binomial variances. We begin by computing the proportion of animals that develop tumors, and the odds of having a tumor at each dosage. Cases Cases Proportion Odds w/ tumors w/tumors ppb N Ntumor p=ntumor/n p/(1-p) 0 18 0 0.0000 1 22 2 0.0909 0.1000 5 22 1 0.0455 0.0476 15 21 4 0.1905 0.2353 50 25 20 0.8000 4.0000 100 28 28 1.0000 The odds of having a tumor increase with dosage. The increase is not a linear function of the dose. For dose-response analysis, we assume that relation is exponential. 2
1. Construct Model Verbal model. N is number of experimental animals at each dosage. Ntumor is number of animals that develop tumors. p is the proportion of animals having tumors at each dosage. The odds of developing tumors are p/(1-p). The odds of developing tumors increase with dose. Graphical model Plot of odds versus dosage shows a curvilinear relation. Plot of log odds versus dosage shows a linear relation. Conclude from this that dose has a multiplicative rather than additive effect on odds of developing tumors. Odds Odds 5 4 3 2 1 0 4.0000 2.0000 1.0000 0.5000 0.2500 0.1250 0.0625 0.0313 Response variable: odds of developing tumors Explanatory variable: dose of aflatoxin B_1 Explanatory variable has multiplicative effect on odds of developing tumors. Write formal model as a multiplicative effect p Odds= 1 p ( η Odds = e ) + Binomial Error η = β + β o Definition of odds The link between the Odds and the srtructural model 0 is: Odds 0 10 20 30 40 50 60 0 10 20 30 40 50 60 = e η This is called a logit link. Why do we use a logit link? < Odds are a natural way to interpret changes in risk. < Odds ratios have the convenient property that we can invert them. If the odds of developing a tumor double with each increment in dose, then we can say that the odds are halved with each decrement in dose. This property does not apply to percentages. < Graphical analysis (above) supports the idea of multiplicative effect on dose on odds of developing tumors. < The logit link puts multiplicative effects on an additive scale, which is convenient for statistical analysis. < Using a logit link avoids the inconvenience of log transforming the response variable. We use a binomial error because each unit is scored in two categories, tumor or no tumor. 3
2. Execute analysis. Place data in model format: Binomial response variable in two columns, success and trials Column labelled Ncases, with response variable # of animals Column labelled Ntumor, with response variable # of animals with tumors Column labelled, with explanatory variable (in ppm) Data Gaylor; Input Ncases Ntumor ; Cards; 18 0 0 22 2 1 22 1 5 21 4 15 25 20 50 28 28 100 ; The data are in model format. MTB > print c1-c3 Row Ncase Ntumor 1 18 0 0 2 22 2 1 3 22 1 5 4 21 4 15 5 25 20 50 6 28 28 100 Proc Genmod; Model Ntumor/Ncase = / Link=logit dist=binomial type1 type3; SAS command file In a package with spreadsheet format, there will be three columns (variables) and six rows. Code the GzLM model statement in statistical package ( η Odds = e ) + error η = β + β o Minitab output SAS command file MTB > BLogistic 'Ntumor' 'Ncase' = ; as of 2002 SUBC> ST; SUBC> Logit; SUBC> Brief 2. Minitab command lines Click Stat Click Regression Click Binary Logistic Regression Click Success, place column of Ntumor, Click trials, place column of Ncase Click Model, place column with Click Storage (optional) Click Pearson residuals, Event probability, ok Minitab sequence to produce line commands 4
GaylorGLM <- glm(ntumor/n ~, family = binomial(link = logit), weights = N, data = Gaylor) summary(gaylorglm) plot(gaylorglm) R command lines 3. Use residuals to evaluate model. a. Straight line assumption acceptable, no bowl or arch in residual plot. b. Too few residuals above fitted value of 0.2 to evaluate homogeneity assumption. c. Use Q-Q plot to evaluate normality assumption. Deviance Residuals -1.0-0.5 0.0 0.5 2 1 3 0.2 0.4 0.6 0.8 1.0 Fitted : 4. Population. All possible outcomes if the experiment were repeated on similar animals and the same suspected carcinogen, dosages, and experimental protocol. 5. Decide on mode of inference. Is hypothesis testing appropriate? There is little doubt that the odds of developing a tumor increase with dose. It is of interest to estimate the rate at which odds change in relation to dose. We skip to step 10. 5
10. Analysis of parameters of biological interest. Logistic regression routines estimate the parameters. Logistic Regression Table Odds 95% CI Predictor Coef SE Coef Z P Ratio Lower Upper Constant -3.0360 0.4823-6.30 0.000 0.09009 0.01456 6.19 0.000 1.09 1.06 1.13 β e o = e 30360. = 0. 048 β e = e 0. 09009 = 1094. Model odds for zero dose Minitab output Change in odds for each dose increment The confidence limits for the odds ratio of 1.094 are narrow: 1.06 to 1.13 The confidence limit excludes the null hypothesis (OR = 1.00) that the odds do not change with dose. With these estimates we can compute the expected odds at any dose. βo + β 50 3. 036+ 0. 09009 50 e = e = e 1469. = 4. 34 The odds of developing a tumour at a dose of 50 ppb is 4.34 to 1. We can compare the dose response relation (OR = 1.094) with other dose-response relations because we have a standard error on the log odds ratio ln OR = 0.09009 + 0.01456 Note that if we use the log link (response variable is the proportion p), rather than logit link (response variable is Odds), we obtain different parameter estimates and we would conclude that there is no significant relation of response to dose. ( η p = e ) + Poisson Error Response variable is proportion p (log link) η = β + β o Parameter DF Estimate Standard Wald 95% Chi- Error Confidence Limits Square Pr > ChiSq Intercept 1-1.4209 0.1893-1.7918-1.0500 56.37 <.0001 1 0.0142 0.0079-0.0013 0.0297 3.23 0.0722 Scale 0 1.0000 0.0000 1.0000 1.0000 SAS output 6