Practice of SAS Logistic Regression on Binary Pharmacodynamic Data Problems and Solutions. Alan J Xiao, Cognigen Corporation, Buffalo NY

ABSTRACT

Logistic regression has been widely applied to population pharmacodynamic analyses of dose-response relationships (binary efficacy or safety endpoints). Because the model structure is restricted to p(y) = exp(y)/(1+exp(y)), where p(y) is the response probability and y = logit(x) is a function of the explanatory variable vector x (usually drug exposure and other covariates), applying this procedure directly to some special data might yield misleading results. Although logarithm (and other) transformations of explanatory variables can extend logistic regression to those types of data, variables with zero values, such as dose or drug exposure in placebo subjects, cannot be transformed. This can make the analysis even more difficult, especially when all dosages lie near or at the plateau of the response. An alternative solution may be to use PROC NLIN to model, as a continuous function, the outcome probabilities at each level of the explanatory variable. A dose-response case study with a limited number of treatment groups, including placebo, is presented, in which alternative modeling methods were better implemented in PROC NLIN. Using SAS PROC LOGISTIC or the alternative approaches requires a thorough understanding of these procedures, the underlying methodology, the data features, and the physiological meaning of the variables.

Keywords: Logistic regression, nonlinear, log transformation

INTRODUCTION

The exposure-response relationship is very important in evaluating the efficacy and safety of a drug. The response, as a pharmacodynamic (PD) endpoint in both efficacy and safety studies for a drug, is frequently recorded as binary data. The exposure may refer to dose, drug or metabolite concentrations, or AUC (area under the concentration-time profile) values.
The exposure-response relationship is frequently modeled using logistic regression, for example with SAS PROC LOGISTIC [1]. In logistic regression, the basic model structure is given by Equation 1:

    p(y) = exp(y)/(1+exp(y))    (1)

It assumes that the response probability (p) of a patient, as a function of y (itself a function of exposure, such as dose), always lies between 0 and 1 and follows an S-shape pattern, as illustrated in Figure 1. This paper discusses different approaches to applying the basic model defined by Equation 1 to pharmacodynamic data.

To simplify the description, a simulated case study is introduced. In this study, 400 patients were evenly divided into groups taking a daily dose of 0 (placebo), 2, 4 and 8 mg of a hypothetical drug. At the end of treatment, 12, 41, 41 and 54 patients (out of 100) in the respective groups were observed to respond (yes) on a PD endpoint. That is to say, the population response probability was 0.12, 0.41, 0.41 and 0.54, respectively.

[Figure 1. The shape of the curve of p(y) versus y in Equation 1; response probability p(y) from 0 to 1 plotted against y from -15 to 15.]

ORDINARY LOGISTIC REGRESSION

When Equation 1 is applied to pharmacodynamic endpoints of a drug, y is usually a function of exposure variables such as dose, concentrations, or AUC values. In ordinary logistic regression, y is assumed to be a linear function of exposure (x), as expressed in Equation 2:

    y = intercept + coeff*x    (2)
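To make Equations 1 and 2 concrete for the case study, the following minimal Python sketch computes the observed response proportions and their empirical logits. It is an illustration only, not part of the original SAS analysis; the function names `logistic` and `logit` are chosen here for convenience.

```python
import math

def logistic(y):
    """Equation 1: response probability for a given value of y."""
    return math.exp(y) / (1.0 + math.exp(y))

def logit(p):
    """Inverse of Equation 1: the log odds that correspond to probability p."""
    return math.log(p / (1.0 - p))

# Case-study data from the text: responders out of 100 patients per dose group.
doses = [0, 2, 4, 8]              # daily dose in mg, 0 = placebo
responders = [12, 41, 41, 54]
group_size = 100

observed = [r / group_size for r in responders]
empirical_logits = [logit(p) for p in observed]

for d, p, y in zip(doses, observed, empirical_logits):
    print(f"dose={d} mg  p={p:.2f}  logit={y:+.3f}")
```

If Equation 2 held exactly, the empirical logits would lie on a straight line in dose. Here the logits at 2 mg and 4 mg are identical while the placebo logit is far lower, which already hints that a model linear in dose may struggle with these data.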

Substitution of y in Equation 1 with Equation 2 leads to:

    p(x) = B*exp(coeff*x)/(1+B*exp(coeff*x))    (3)

where B = exp(intercept). Ordinary logistic regression therefore obtains the estimates of intercept and coeff in Equation 2 (expressed as the logit function in SAS) by fitting Equation 3 to the exposure-response data. Equation 1 is the special case of Equation 3 with intercept = 0 (thus B = 1) and coeff = 1. When coeff = 1 and intercept ≠ 0 (so B ≠ 1), the curve of p(x) vs. x is the curve of p(y) vs. y in Figure 1 shifted by a distance of |intercept| to the right (if intercept < 0) or to the left (if intercept > 0). In this case the two curves therefore have exactly the same S-shape in a regular coordinate system. In general, the curve of p(x) vs. x differs from the curve of p(y) vs. y in steepness (determined by coeff) and in x50 (the x value at which the response probability is 0.5, determined by B and coeff), even though both have the same S-shape. Note that, in practice where ordinary logistic regression applies, the S-shape does not necessarily appear in the exposure-response graph if the measured exposure (x) range lies beyond the inflection point (x_i = -intercept/coeff), i.e., x ≥ x_i, where the response probability p(x_i) = 0.5.

When ordinary logistic regression is applied to the case study, the parameters (as in Equation 2) are estimated as (mean±standard deviation): intercept = -1.271±0.178 and coeff = 0.200±0.037. The model predictions and measurements are illustrated in Figure 2. As shown, the model predictions of the response probability at placebo and at the 2 mg dose are about 0.1 or more away from the measurements (relative deviations of up to 50% and 25%, respectively). In addition, the predicted S-shape dose-response relationship does not agree well with the measured C-shape relationship, in which the measurements at all dose levels seem to be near or at the plateau of the curve. Since only four dose levels including placebo are available to develop a dose-response relationship, and all three active dose levels appear to be near or at the plateau of the response curve, it is worthwhile to explore other methods of developing the dose-response relationship.

Figure 2. Response probability versus dose for the case study. Pred_logo in the legend represents model predictions from ordinary logistic regression.

LOGISTIC REGRESSION WITH LOGARITHM TRANSFORMATION

Log transformation expands logistic regression analysis from S-shape curves to C-shape curves, and the interpretation of the parameter estimates is different [2-3]. Characteristics of the log transformation and the interpretation of odds ratios from logistic regression with log transformation were investigated by Keene [2] and Elswick Jr. et al. [3], respectively. Assuming:

    y = a + b*log(x)    (4)

where a, b and x are the intercept, coefficient and exposure (e.g., dose), respectively, the probability function (Equation 1) becomes:

    p(x) = A*x^b/(1+A*x^b)    (5)

where A = exp(a). Note that when x = C, b = γ and A = 1/C50^γ, this equation becomes the same as the concentration-response probability relationship used by Bailey and Gregg [4] for investigating inter-patient variability (Probit regression).

In a regular coordinate system, the curve of p(x) vs. x expressed by Equation 5 is a C-shape curve when b ≤ 1 or an S-shape curve when b > 1. The maximum probability is 1 (when x → infinity). Also note that the S-shape of the curve may not appear in a graph of response probability vs. exposure if the exposure range (x) lies beyond the inflection point (x_i = [exp(-a)(b-1)/(b+1)]^(1/b)), i.e., x ≥ x_i, where the response probability p(x) ≥ (b-1)/(2b).
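The formulas above can be checked numerically. The sketch below, written in Python rather than SAS and intended only as an illustration, evaluates Equation 3 with the estimates reported for the case study and verifies the inflection-point expressions for Equations 3 and 5. The function names and the demonstration values `a_demo` and `b_demo` are hypothetical choices made here, not values from the paper.

```python
import math

def p_ordinary(x, intercept, coeff):
    """Equation 3: ordinary logistic regression with y = intercept + coeff*x."""
    y = intercept + coeff * x
    return math.exp(y) / (1.0 + math.exp(y))

def p_log_model(x, a, b):
    """Equation 5: p(x) = A*x^b / (1 + A*x^b), with A = exp(a)."""
    u = math.exp(a) * x ** b
    return u / (1.0 + u)

def inflection_log_model(a, b):
    """Inflection point of Equation 5; it exists only for b > 1 (S-shape)."""
    if b <= 1:
        return None  # C-shape curve: no inflection point for x > 0
    return (math.exp(-a) * (b - 1) / (b + 1)) ** (1.0 / b)

# Estimates reported in the text for ordinary logistic regression.
intercept, coeff = -1.271, 0.200
doses = [0, 2, 4, 8]
observed = [0.12, 0.41, 0.41, 0.54]

for d, obs in zip(doses, observed):
    pred = p_ordinary(d, intercept, coeff)
    print(f"dose={d} mg  observed={obs:.2f}  predicted={pred:.3f}")

# Inflection point of the ordinary model, where the predicted probability is 0.5.
x_i = -intercept / coeff

# Hypothetical S-shape parameters, used only to check the Equation 5 formulas.
a_demo, b_demo = -2.0, 3.0
x_i_demo = inflection_log_model(a_demo, b_demo)
print(f"ordinary model: x_i = {x_i:.3f} mg")
print(f"Equation 5 demo: p(x_i) = {p_log_model(x_i_demo, a_demo, b_demo):.4f}")
```

With the reported estimates, the inflection point of the ordinary model falls inside the studied dose range, whereas the observed proportions already look plateau-like from 2 mg onward. This is one way to see the mismatch described for Figure 2.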

For the case study, since only four dose levels including placebo were used in the clinical trial, direct log transformation discards the placebo data point, leaving only three dose levels for regression. Model inference based on these three dose levels would be misleading: the predicted response probability is constant across the dose range (not shown in Figure 3), as indicated by the parameter estimates in the last row of Table 1. At the least, the model prediction of constant probability (0.45) should not be extrapolated to placebo.

Therefore, when all response measurements are near or at the plateau of the exposure-response curve, the data point that represents placebo is critical for the analysis and should be included. To do this, a small perturbation of the doses before log transformation might be helpful. For example, 0.000001 can be added to all dose levels, including placebo. The log-transformed value for placebo is then -13.8 (natural logarithm), while the log-transformed values of doses 2, 4, and 8 are practically unchanged. With this strategy, the model predictions are reasonably consistent with the measurements. Figure 3 demonstrates the fit of the model (Equation 5) by logistic regression after log transformation (Equation 4) with different levels of perturbation on the doses. Stepwise selection of covariates was used with an entry criterion of p=0.05 and an elimination criterion of p=0.01. Generally, the smaller the perturbation, the closer the model-predicted probability at placebo is to the measurement. When the perturbation is less than 10^-6, the model-predicted dose-response curves are visually indistinguishable, although their parameter estimates in the logistic model are different. Table 1 lists the parameter estimates of the logistic regression model after log transformation with different perturbation levels applied to the dose values.

Table 1. Parameter Estimates for the Logistic Model after Log Transformation (refer to Equation 4)

    Perturbation level to doses    a                  b
    0.01                           -0.610 (0.115)     0.306 (0.053)
    0.0001                         -0.424             0.175 (0.031)
    0.000001                       -0.351             0.122 (0.022)
    Placebo excluded               -0.187 (0.116)     0

Figure 3. Dose-response relationship fit by logistic regression with log transformation of dose. Pred_log2, Pred_log4 and Pred_log6 are model predictions under 3 different perturbation levels: dose = dose+0.01, dose+0.0001, or dose+0.000001, respectively, in order to implement the log transformation on placebo.

Visually, the fit by logistic regression with log transformation and perturbation is slightly better than that by ordinary logistic regression. However, the approximation for placebo using the perturbation approach is tentative and might not be readily accepted. In fact, this approximation can be avoided if Equation 5 is fitted to the data directly, instead of using log transformation in logistic regression (Equation 1 plus Equation 4). This nonlinear modeling can be implemented with SAS PROC NLIN, which is extensively applied in engineering.

PROC NLIN PROCEDURE

When the regression model or the shape of the exposure-response curve is known, SAS PROC NLIN is a good option for nonlinear models, including those for PD data. The dose-response curve shape for the case study can be described with the following general model structure:

    p(x) = α + β*x^γ/(δ + x^γ)    (6)

Equation 5 is just the special case of Equation 6 with α = 0, β = 1, γ = b and δ = 1/A. Similar to Equation 5, Equation 6 represents 2 different shapes of curves: C-shape when γ ≤ 1 and S-shape otherwise, with the inflection point at x_i = [δ(γ-1)/(1+γ)]^(1/γ). Therefore, exposure-response models that can be developed through PROC LOGISTIC with log transformation can potentially be developed through PROC NLIN without any transformation. In addition, PROC NLIN can work on more types of datasets. One advantage of using PROC NLIN is that placebo data can be included in the analysis directly, without any approximation.

For the case study, a reduced form of Equation 6 with γ = 1 was tried. Unlike logistic regression, which directly uses raw binary data (0 or 1) for the dichotomous response variable, the response variable in nonlinear models is continuous; its values are the subpopulation probabilities (from 0 to 1) calculated at each exposure level and/or covariate group. Since no covariate was identified as significant for this particular case study during logistic regression, the response probability is simply calculated from the subpopulation at each dose level. The fits by the model via PROC NLIN under 3 conditions are illustrated in Figure 4, and their parameter estimates are listed in Table 2. As expected, when all parameters are obtained from regression, the fit is the best (refer to Pred_nlin1 in Figure 4), since more parameters are included in the model. However, this model implies that the maximum probability is 0.6. Theoretically, this does not make sense, because for any drug the response probability should approach 1 (for either efficacy or safety) as the dose goes to infinity. In practice, it could be true that the response probability stays below a certain value within a certain range of dosages. Whether this is really the case can be confirmed from measurements at additional, expanded dose levels or from prior information from fundamental studies. When the maximum probability is constrained to 1, the fit is slightly worse, but it is at least comparable to, if not better than, that obtained through logistic regression with log transformation.

As a comparison, Equation 5 was fitted to the data directly using PROC NLIN, and the model prediction is shown in Figure 4 under the legend Pred_nlin4. The parameter estimates are (mean±standard deviation): A = 0.486±0.162 and b = 0.384±0.220 (refer to Equation 5). Obviously, this fit is much better than that using logistic regression with log transformation, although the models are equivalent.

Table 2. Parameter Estimates of the Model Expressed by Equation 6 via SAS PROC NLIN Procedure

    Modeling    α                 β                δ
    Nlin1       0.122 (0.046)     0.475            1.644 (1.272)
    Nlin2       0.149 (0.070)     1 - α (fixed)    7.95 (3.54)
    Nlin3       0.12 (fixed)      1 - α (fixed)    9.02 (1.75)

Figure 4. Model predictions of the exposure-response relationship (Equation 6) with γ=1 under different conditions. Pred_nlin1: all three parameters α, β and δ are estimated from fitting. Pred_nlin2: α and δ are estimated from fitting while β is fixed as 1 - α (maximum probability of 1, as in Table 2). Pred_nlin3: only δ is estimated from fitting while α is fixed as 0.12, the measured response probability for placebo, and β is fixed as 1 - α. Pred_nlin4: model (Equation 5) fitting using PROC NLIN.

MODEL/METHOD SELECTION

As discussed previously, for the case study in this paper there are at least three potential approaches to modeling the limited data: ordinary logistic regression, logistic regression with log transformation, and nonlinear model fitting. Each has advantages and disadvantages. The quality of fit also differs, as demonstrated in Figure 5. If the measurements are representative and reliable, i.e. close to the true values, the model with the best prediction is the best. For this particular case study, the general nonlinear model through PROC NLIN is superior to the logistic regressions, since this model (refer to Pred_nlin1 in Figure 5) fits the data best.
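The comparison among approaches can be reproduced approximately from the reported parameter estimates. The Python sketch below is an illustration under stated assumptions, not the author's SAS code: it plugs the estimates from the text and Tables 1 and 2 into Equations 3 to 6 and ranks the models by an unweighted sum of squared errors over the four group proportions. That criterion, the function names, and the reading of β as 1 - α for the constrained fits are assumptions made here.

```python
import math

doses = [0, 2, 4, 8]                      # daily dose in mg, 0 = placebo
observed = [0.12, 0.41, 0.41, 0.54]       # response proportions per group

def p_ordinary(x, intercept=-1.271, coeff=0.200):
    """Equation 3 with the reported ordinary logistic estimates."""
    return 1.0 / (1.0 + math.exp(-(intercept + coeff * x)))

def p_log_perturbed(x, a, b, eps):
    """Equations 1 and 4 with dose replaced by dose + eps so log() is defined."""
    y = a + b * math.log(x + eps)
    return 1.0 / (1.0 + math.exp(-y))

def p_eq5(x, A, b):
    """Equation 5 evaluated directly; no perturbation is needed at x = 0."""
    u = A * x ** b
    return u / (1.0 + u)

def p_eq6(x, alpha, beta, delta, gamma=1.0):
    """Equation 6; gamma = 1 is the reduced form used in the case study."""
    xg = x ** gamma
    return alpha + beta * xg / (delta + xg)

def sse(model):
    """Unweighted sum of squared errors over the four dose groups (assumed criterion)."""
    return sum((model(d) - obs) ** 2 for d, obs in zip(doses, observed))

# Parameter values taken from the text and from Tables 1 and 2.
fits = {
    "Pred_logo": lambda x: p_ordinary(x),
    "Pred_log6": lambda x: p_log_perturbed(x, -0.351, 0.122, 1e-6),
    "Pred_nlin1": lambda x: p_eq6(x, 0.122, 0.475, 1.644),
    "Pred_nlin2": lambda x: p_eq6(x, 0.149, 1 - 0.149, 7.95),
    "Pred_nlin3": lambda x: p_eq6(x, 0.12, 1 - 0.12, 9.02),
    "Pred_nlin4": lambda x: p_eq5(x, 0.486, 0.384),
}

errors = {name: sse(model) for name, model in fits.items()}
best = min(errors, key=errors.get)

for name, err in sorted(errors.items(), key=lambda item: item[1]):
    print(f"{name:<11} SSE={err:.4f}")
print("smallest SSE:", best)
```

Because the criterion here is a simple unweighted sum over four points, the ranking of the remaining models may differ from a visual impression of the figures; it is meant only as a quick check that the unconstrained Equation 6 fit tracks the observed proportions most closely.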

Figure 5. Comparison of the quality of fit for the different approaches. Pred_nlin1: totally free nonlinear model (Equation 6 with γ=1) fitted using PROC NLIN. Pred_nlin4: nonlinear model (Equation 5) fitted via PROC NLIN. Pred_logo: ordinary logistic regression via PROC LOGISTIC. Pred_log6: logistic regression after log transformation of doses, with placebo approximated by a dosage of 0.000001.

SUMMARY

Logistic regression is a powerful tool widely used for PD data analysis. However, its applicability is limited by the strict assumptions inherent in the model structure. Although log transformation can expand the application of logistic regression, the transformation itself might restrict this expansion when placebo data have to be included in the analysis. Nonlinear modeling through PROC NLIN is generally a more flexible and powerful approach. However, prior information about the model structure is required, and whether PROC NLIN succeeds sometimes depends on the model structure and on the appropriateness of the initial guesses for the parameter estimates. Exploratory graphs of exposure versus response probability should help in selecting the primary methods and models, and logistic regression can be a convenient primary option. However, when logistic regression does not work well, alternative methods should be explored. The selection of the final model should be based on the combined information about the characteristics of the data, the quality of fit and the physiological rationale.

REFERENCE

1. SAS Institute Inc. SAS/STAT User's Guide, Volume 2. Version 6, 4th edition. Cary, NC: SAS Institute Inc., 1994.
2. O N Keene. The Log Transformation Is Special. Statistics in Medicine, Vol. 14, 811-819 (1995).
3. R K Elswick, Jr, P F Schwartz and J A Welsh. Interpretation of the Odds Ratio from Logistic Regression after A Transformation of the Covariate Vector. Statistics in Medicine, Vol. 16, 1695-1703 (1997).
4. J M Bailey and K M Gregg. A Technique for Population Pharmacodynamic Analysis of Concentration-Binary Response Data. Anesthesiology 1997; 86:825-35.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

DISCLAIMER

This presentation only reflects my current personal thinking on potential alternative approaches to PD data analysis when logistic regression as a primary option cannot work well, based on my experience in the engineering area. So far, none of these approaches has been applied to any real projects. No material in this presentation is from Cognigen Corporation. Under no circumstances should this presentation be taken to represent the position that Cognigen Corporation takes on PD data analysis.

CONTACT INFORMATION

The author can be contacted at:

Alan J Xiao, Ph.D.
Population PK/PD Scientist
Cognigen Corporation
395 Youngs Road
Buffalo, NY 14221-5831
Phone: 716-633-3463 ext. 265
Fax: 716-633-7404
Email: alan.xiao@cognigencorp.com
Web: www.cognigencorp.com