Practice of SAS Logistic Regression on Binary Pharmacodynamic Data Problems and Solutions. Alan J Xiao, Cognigen Corporation, Buffalo NY

ABSTRACT
Logistic regression has been widely applied to population pharmacodynamic analyses of dose-response relationships (binary efficacy or safety endpoints). Limited by the model structure $p(y) = \exp(y)/(1+\exp(y))$, where p(y) is the response probability and y (the logit) is a function of the explanatory variable vector x (usually drug exposure and other covariates), direct use of this procedure on some special data might yield misleading results. Although logarithm (and other) transformations of explanatory variables can extend logistic regression to those types of data, the explanatory variables eligible for log transformation cannot include those with zero values, such as dose or drug exposure in placebo subjects. This can make the analysis even more difficult, especially when all dosages lie near or at the plateau of the response. An alternative solution may be to use PROC NLIN to model, as a continuous function, the outcome probability at each level of the explanatory variable. A dose-response case study with a limited number of treatment groups, including placebo, is illustrated, in which alternative modeling methods were better implemented in PROC NLIN. Using SAS PROC LOGISTIC or alternative approaches requires a thorough understanding of these procedures, the underlying methodology, the data features, and the physiological meaning of the variables.

Keywords: logistic regression, nonlinear, log transformation

INTRODUCTION
The exposure-response relationship is very important in evaluating the efficacy and safety of a drug. The response, as a pharmacodynamic (PD) endpoint in both efficacy and safety studies, is frequently recorded as binary data. The exposure may refer to dose, drug or metabolite concentrations, or AUC (area under the concentration-time curve) values.
The exposure-response relationship is frequently modeled by logistic regression, for example with SAS PROC LOGISTIC [1]. In logistic regression, the basic model structure is given by Equation 1:

$$p(y) = \frac{\exp(y)}{1 + \exp(y)} \tag{1}$$

It assumes that the response probability p of a patient, as a function of y (itself a function of exposure, such as dose), always lies between 0 and 1 and follows an S-shaped pattern, as illustrated in Figure 1. This paper discusses different approaches to applying the basic model defined by Equation 1 to pharmacodynamic data. To simplify the description, a simulated case study is used. In this study, 400 patients were evenly divided into groups taking a daily dose of 0 (placebo), 2, 4 or 8 mg of a hypothetical drug. At the end of treatment, 12, 41, 41 and 54 patients (out of 100) in the respective groups were observed to respond (yes) to a PD endpoint; that is, the population response probabilities were 0.12, 0.41, 0.41 and 0.54, respectively.

Figure 1. The shape of the curve of p(y) versus y in Equation 1 (response probability p(y), from 0 to 1, plotted against y, from -15 to 15).

ORDINARY LOGISTIC REGRESSION
When Equation 1 is applied to the pharmacodynamic endpoints of a drug, y is usually a function of exposure variables such as dose, concentrations or AUC values. In ordinary logistic regression, y is assumed to be a linear function of exposure (x), as expressed in Equation 2:

$$y = \text{intercept} + \text{coeff}\cdot x \tag{2}$$
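As a quick check of the linearity assumed in Equation 2, the short Python sketch below (Python is used only for illustration; the paper itself works in SAS) converts the case-study counts into group response probabilities and their empirical logits, i.e., Equation 1 inverted. If Equation 2 held, the logits would rise by the same amount per mg between every pair of adjacent dose levels. The helper names are hypothetical.

```python
import math

# Case-study data from the text: 100 patients per group, daily dose in mg.
DOSES = [0.0, 2.0, 4.0, 8.0]
RESPONDERS = [12, 41, 41, 54]
GROUP_SIZE = 100


def logistic(y):
    """Equation 1: p(y) = exp(y) / (1 + exp(y))."""
    return math.exp(y) / (1.0 + math.exp(y))


def logit(p):
    """Inverse of Equation 1: y = log(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1.0 - p))


probs = [r / GROUP_SIZE for r in RESPONDERS]
emp_logits = [logit(p) for p in probs]

# Under Equation 2 every slope below would equal the same constant, coeff.
slopes = [
    (emp_logits[i + 1] - emp_logits[i]) / (DOSES[i + 1] - DOSES[i])
    for i in range(len(DOSES) - 1)
]

for d, p, y in zip(DOSES, probs, emp_logits):
    print(f"dose {d:3.0f} mg  p = {p:.2f}  empirical logit = {y:+.3f}")
print("logit slopes per mg between adjacent doses:", [round(s, 3) for s in slopes])
```

The slopes come out at roughly 0.81, 0 and 0.13 per mg: a steep rise from placebo to 2 mg followed by a near plateau, which a single straight line in dose cannot reproduce.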

Substituting Equation 2 into Equation 1 gives:

$$p(x) = \frac{B\exp(\text{coeff}\cdot x)}{1 + B\exp(\text{coeff}\cdot x)} \tag{3}$$

where B = exp(intercept). Ordinary logistic regression therefore estimates intercept and coeff in Equation 2 (the logit function in SAS) by fitting Equation 3 to the exposure-response data. Equation 1 is the special case of Equation 3 with intercept = 0 (thus B = 1) and coeff = 1. When coeff = 1 but intercept ≠ 0 (so B ≠ 1), the curve of p(x) vs. x is the curve of p(y) vs. y in Figure 1 shifted by a distance |intercept|, to the right if intercept < 0 or to the left if intercept > 0. In this case the two curves have exactly the same S-shape in a regular coordinate system. In general, the curve of p(x) vs. x has the same S-shape as the curve of p(y) vs. y, but a different steepness (determined by coeff) and a different $x_{50}$, defined as the x value at which the response probability is 0.5 ($x_{50} = -\text{intercept}/\text{coeff} = -\ln B/\text{coeff}$, determined by B and coeff). Note that, in practice, the S-shape does not necessarily appear in the exposure-response graph if the measured exposure range lies beyond the inflection point $x_i = -\text{intercept}/\text{coeff}$, i.e., $x \ge x_i$, where (for coeff > 0) the response probability $p(x) \ge p(x_i) = 0.5$.

When ordinary logistic regression is applied to the case study, the parameters of Equation 2 are estimated as (mean ± standard deviation) intercept = -1.271 ± 0.178 and coeff = 0.200 ± 0.037. The model predictions and measurements are shown in Figure 2. The model-predicted response probabilities at placebo and at 2 mg are about 0.1 or more off from the measurements (a relative standard deviation of up to 50% and 25%, respectively). In addition, the predicted S-shaped dose-response relationship does not agree well with the measured C-shaped relationship, in which the measurements at all active dose levels appear to be near or at the plateau of the curve. Since only four dose levels (including placebo) are available to develop a dose-response relationship, and all three active dose levels appear to be near or at the plateau of the response curve, it is worthwhile to explore other methods of developing the dose-response relationship.

Figure 2. Response probability versus dose for the case study. Pred_logo in the legend represents model predictions from ordinary logistic regression.

LOGISTIC REGRESSION WITH LOGARITHM TRANSFORMATION
Log transformation expands logistic regression analysis from S-shaped curves to C-shaped curves, and the interpretation of the parameter estimates is different [2, 3]. The characteristics of log transformation and the interpretation of odds ratios from logistic regression with log transformation were investigated by Keene [2] and Elswick Jr. et al. [3], respectively. Assume:

$$y = a + b\log(x) \tag{4}$$

where a, b and x are the intercept, the coefficient and the exposure (e.g., dose), respectively. The probability function (Equation 1) becomes:

$$p(x) = \frac{A x^{b}}{1 + A x^{b}} \tag{5}$$

where A = exp(a). Note that when x = C (concentration), b = γ and $A = 1/C_{50}^{\gamma}$, this equation becomes the same as the concentration-response probability relationship used by Bailey and Gregg [4] to investigate inter-patient variability (probit regression).

In a regular coordinate system, the curve of p(x) vs. x given by Equation 5 is C-shaped when b ≤ 1 and S-shaped when b > 1. The maximum probability is 1 (as x → ∞). Also note that the S-shape may not appear in a graph of response probability vs. exposure if the exposure range lies beyond the inflection point $x_i = \left[e^{-a}(b-1)/(b+1)\right]^{1/b}$, i.e., $x \ge x_i$, where the response probability $p(x) \ge (b-1)/(2b)$.
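The closed-form inflection points quoted above can be checked numerically. The Python sketch below (an illustration, not SAS code) evaluates Equation 3 at the case-study estimates (intercept = -1.271, coeff = 0.200) and Equation 5 at hypothetical values a = -2 and b = 3, chosen only so that b > 1 and an inflection exists. It confirms that a finite-difference second derivative vanishes at $x_i$ and changes sign across it, and that $p(x_i) = (b-1)/(2b)$. All function names are hypothetical.

```python
import math


def p_eq3(x, intercept, coeff):
    """Equation 3: B*exp(coeff*x) / (1 + B*exp(coeff*x)), with B = exp(intercept)."""
    z = math.exp(intercept) * math.exp(coeff * x)
    return z / (1.0 + z)


def p_eq5(x, a, b):
    """Equation 5: A*x**b / (1 + A*x**b), with A = exp(a) and x >= 0."""
    u = math.exp(a) * x ** b
    return u / (1.0 + u)


def inflection_eq3(intercept, coeff):
    """Inflection point of Equation 3, where p = 0.5."""
    return -intercept / coeff


def inflection_eq5(a, b):
    """Inflection point of Equation 5; it exists only for b > 1 (S-shape)."""
    if b <= 1.0:
        raise ValueError("Equation 5 is C-shaped (no inflection) when b <= 1")
    return (math.exp(-a) * (b - 1.0) / (b + 1.0)) ** (1.0 / b)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


# Ordinary logistic regression estimates reported for the case study.
INTERCEPT, COEFF = -1.271, 0.200
for dose, observed in zip([0, 2, 4, 8], [0.12, 0.41, 0.41, 0.54]):
    pred = p_eq3(dose, INTERCEPT, COEFF)
    print(f"dose {dose} mg: Eq. 3 prediction {pred:.3f}, observed {observed:.2f}")
x_i3 = inflection_eq3(INTERCEPT, COEFF)

# Hypothetical Equation 5 parameters (not from the paper), chosen so that b > 1.
A_LOG, B_EXP = -2.0, 3.0
x_i5 = inflection_eq5(A_LOG, B_EXP)


def curve(x):
    """Equation 5 at the hypothetical parameters above."""
    return p_eq5(x, A_LOG, B_EXP)


print(f"Eq. 3 inflection x_i = {x_i3:.3f} mg")
print(f"Eq. 5 inflection x_i = {x_i5:.4f}, p(x_i) = {curve(x_i5):.4f}")
print("f'' below / at / above x_i:",
      [round(second_derivative(curve, x), 4) for x in (x_i5 - 0.2, x_i5, x_i5 + 0.2)])
```

With the case-study estimates, Equation 3 predicts about 0.22 at placebo against the observed 0.12, and its inflection point (about 6.4 mg) lies inside the studied dose range, which is why the fitted curve keeps its S-shape even though the data look C-shaped.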

For the case study, since only four dose levels (including placebo) are used in the clinical trial, direct log transformation results in discarding the placebo data point, so only three dose levels are available for regression. A model inferred from these three dose levels would be misleading: the predicted response probability is constant across the dose range (not shown in Figure 3), as indicated by the parameter estimates in the last row of Table 1. At the very least, the model-predicted constant probability (0.45) should not be extrapolated to placebo.

Therefore, when all response measurements are near or at the plateau of the exposure-response curve, the data point representing placebo is critical for the analysis and should be included. To do this, a small perturbation of the doses before log transformation might help. For example, 0.000001 can be added to all dose levels, including placebo. The log-transformed value for placebo is then -13.8 (natural logarithm), while the log-transformed values of doses 2, 4 and 8 are essentially unchanged. With this strategy, the model predictions are reasonably consistent with the measurements. Figure 3 shows the fit of the model (Equation 5) by logistic regression after log transformation (Equation 4) at different perturbation levels of the doses. Stepwise selection of covariates was used with an entry criterion of p = 0.05 and an elimination criterion of p = 0.01. Generally, the smaller the perturbation, the closer the model-predicted probability at placebo is to the measurement. When the perturbation is less than $10^{-6}$, the model-predicted dose-response curves are visually indistinguishable, although their parameter estimates in the logistic model differ. Table 1 lists the parameter estimates of the logistic regression model after log transformation at different perturbation levels of the doses.

Visually, the fit by logistic regression with log transformation and perturbation is slightly better than that by ordinary logistic regression. However, approximating placebo through perturbation is tentative and might not be readily accepted. In fact, this approximation can be avoided altogether if Equation 5 is fitted directly to the data, instead of using log transformation in logistic regression (Equation 1 plus Equation 4). This nonlinear modeling can be implemented with SAS PROC NLIN, which is extensively applied in engineering.

Table 1. The Parameter Estimates for a Logistic Model after Log Transformation (refer to Equation 4); values in parentheses are standard deviations
Perturbation level to doses | a | b
0.01 | -0.610 (0.115) | 0.306 (0.053)
0.0001 | -0.424 | 0.175 (0.031)
0.000001 | -0.351 | 0.122 (0.022)
Placebo excluded | -0.187 (0.116) | 0

Figure 3. Dose-response relationship fitted by logistic regression with log transformation of dose. Pred_log2, Pred_log4 and Pred_log6 are model predictions under three different perturbation levels (dose + 0.01, dose + 0.0001 and dose + 0.000001, respectively), used to implement the log transformation for placebo.

PROC NLIN
When the regression model or the shape of the exposure-response curve is known, SAS PROC NLIN is a good option for nonlinear models, including those for PD data. The dose-response curve of the case study can be described by the following general model structure:

$$p(x) = \alpha + \frac{\beta x^{\gamma}}{\delta + x^{\gamma}} \tag{6}$$

Equation 5 is a special case of Equation 6 with α = 0, β = 1, γ = b and $\delta = A^{-1} = e^{-a}$. Like Equation 5, Equation 6 represents two different curve shapes: C-shaped when γ ≤ 1 and S-shaped otherwise, with the inflection point at $x_i = \left[\delta(\gamma-1)/(\gamma+1)\right]^{1/\gamma}$. Therefore, exposure-response models that can be developed through PROC LOGISTIC with log transformation can potentially be developed through PROC NLIN without any transformation. In addition, PROC NLIN can work on more types of datasets. One advantage of PROC NLIN is that placebo data can be included in the analysis directly, without any approximation.

For the case study, a reduced form of Equation 6 with γ = 1 was tried. Unlike logistic regression, which uses the raw binary data (0 or 1) directly as the (dichotomous) response variable, the response variable in the nonlinear models is continuous: its values are the calculated subpopulation probabilities (from 0 to 1) at each exposure level and/or covariate group. Since no covariate was identified as significant for this case study during logistic regression, the response probability is simply calculated from the subpopulation at each dose level. The fits of the model via PROC NLIN under three conditions are shown in Figure 4, and their parameter estimates are listed in Table 2. As expected, when all parameters are estimated by regression, the fit is the best (Pred_nlin1 in Figure 4), since more parameters are included in the model. However, this model implies that the maximum probability is about 0.6 (α + β ≈ 0.122 + 0.475). Theoretically, this does not make sense, because for any drug, as the dose goes to infinity, the response probability should approach 1 (for either efficacy or safety). In practice, it could be true that within a certain range of dosages the response probability stays below a certain value. Whether this is true can be confirmed by measurements at additional, expanded dose levels or by prior information from fundamental studies. When the maximum probability is constrained to 1, the fit is slightly worse, but it is at least comparable to, if not better than, the fit through logistic regression with log transformation.

For comparison, Equation 5 was also fitted directly to the data using PROC NLIN; the model prediction is shown in Figure 4 as Pred_nlin4. The parameter estimates are (mean ± standard deviation) A = 0.486 ± 0.162 and b = 0.384 ± 0.220 (refer to Equation 5). This fit is much better than that obtained with logistic regression with log transformation, although the two models are equivalent.

Table 2. Parameter Estimates of the Model Expressed by Equation 6 (γ = 1) via SAS PROC NLIN; values in parentheses are standard deviations
Modeling | α | β | δ
Nlin1 | 0.122 (0.046) | 0.475 | 1.644 (1.272)
Nlin2 | 0.149 (0.070) | 1 - α (fixed) | 7.95 (3.54)
Nlin3 | 0.12 (fixed) | 1 - α (fixed) | 9.02 (1.75)

Figure 4. Model predictions of the exposure-response relationship (Equation 6) with γ = 1 under different conditions: Pred_nlin1, all three parameters α, β and δ are estimated from fitting; Pred_nlin2, α and δ are estimated from fitting while β is fixed as 1; Pred_nlin3, only δ is estimated from fitting while α is fixed at 0.12 (the measured response probability for placebo) and β is fixed as 1; Pred_nlin4, Equation 5 fitted using PROC NLIN.

MODEL/METHOD SELECTION
As discussed previously, there are at least three potential approaches to modeling the limited data in this case study: ordinary logistic regression, logistic regression with log transformation, and nonlinear model fitting. Each has advantages and disadvantages, and the quality of fit also differs, as demonstrated in Figure 5. If the measurements are representative and reliable, i.e., close to the true values, the model with the best prediction is the best model. For this particular case study, the general nonlinear model fitted through PROC NLIN is superior to the logistic regressions, since that model (Pred_nlin1 in Figure 5) fits the data best.
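The comparison in Figure 5 can be sketched outside SAS. The Python code below is a minimal sketch under stated assumptions, not a reproduction of PROC NLIN: it fits Equation 6 with γ = 1 to the four group probabilities by unweighted least squares, solving α and β in closed form for each trial δ (variable projection) and searching δ with a log-spaced grid followed by golden-section refinement. The search strategy, the δ range [0.01, 100], and the use of the residual sum of squares (SSE) on group probabilities as the yardstick are assumptions of this sketch; the paper compares fits graphically. It then computes the same SSE for Pred_logo and Pred_log6 from the parameter estimates reported above. Function names are hypothetical.

```python
import math

# Group response probabilities from the case study (dose in mg).
DOSES = [0.0, 2.0, 4.0, 8.0]
OBSERVED = [0.12, 0.41, 0.41, 0.54]


def ab_given_delta(delta):
    """For fixed delta, Equation 6 with gamma = 1 is linear in alpha and beta:
    p = alpha + beta * f with f = x / (delta + x). Solve by least squares."""
    f = [x / (delta + x) for x in DOSES]
    n = len(f)
    f_bar, p_bar = sum(f) / n, sum(OBSERVED) / n
    sxx = sum((fi - f_bar) ** 2 for fi in f)
    sxy = sum((fi - f_bar) * (pi - p_bar) for fi, pi in zip(f, OBSERVED))
    beta = sxy / sxx
    alpha = p_bar - beta * f_bar
    rss = sum((alpha + beta * fi - pi) ** 2 for fi, pi in zip(f, OBSERVED))
    return alpha, beta, rss


def fit_eq6_gamma1(lo=0.01, hi=100.0, grid=400, iters=100):
    """Profile the SSE over delta: coarse log-spaced grid, then golden-section search."""
    deltas = [lo * (hi / lo) ** (k / (grid - 1)) for k in range(grid)]
    k = min(range(grid), key=lambda j: ab_given_delta(deltas[j])[2])
    a, b = deltas[max(k - 1, 0)], deltas[min(k + 1, grid - 1)]
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if ab_given_delta(c)[2] < ab_given_delta(d)[2]:
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    delta = (a + b) / 2.0
    alpha, beta, rss = ab_given_delta(delta)
    return alpha, beta, delta, rss


def logistic(y):
    return 1.0 / (1.0 + math.exp(-y))


def pred_logo(x):
    """Ordinary logistic regression with the reported intercept and coeff."""
    return logistic(-1.271 + 0.200 * x)


def pred_log6(x):
    """Logistic regression on log(dose + 0.000001), Table 1 estimates."""
    return logistic(-0.351 + 0.122 * math.log(x + 1e-6))


def sse(pred):
    """Unweighted residual sum of squares against the group probabilities."""
    return sum((pred(x) - p) ** 2 for x, p in zip(DOSES, OBSERVED))


alpha, beta, delta, sse_nlin1 = fit_eq6_gamma1()
print(f"Eq. 6 (gamma = 1): alpha = {alpha:.3f}, beta = {beta:.3f}, delta = {delta:.3f}")
print(f"implied maximum probability alpha + beta = {alpha + beta:.3f}")
print(f"SSE: Pred_nlin1 = {sse_nlin1:.5f}, Pred_log6 = {sse(pred_log6):.5f}, "
      f"Pred_logo = {sse(pred_logo):.5f}")
```

Run as a script, the fitted α and β land close to the Nlin1 row of Table 2 (about 0.12 and 0.48, with δ near 1.65), α + β comes out near the 0.6 plateau discussed above, and the SSE ordering is Pred_nlin1 < Pred_log6 < Pred_logo. Solving the linear parameters exactly for each δ sidesteps the sensitivity to initial guesses for α and β; only δ needs a search range.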

Figure 5. Comparison of the quality of fit for three different approaches: Pred_nlin1, totally free nonlinear model (Equation 6 with γ = 1) fitted using PROC NLIN; Pred_nlin4, nonlinear model (Equation 5) fitted via PROC NLIN; Pred_logo, ordinary logistic regression via PROC LOGISTIC; and Pred_log6, logistic regression after log transformation of doses, approximating placebo with a dosage of 0.000001.

SUMMARY
Logistic regression is a powerful tool widely used for PD data analysis. However, its applicability is limited by the strict assumptions inherent in its model structure. Although log transformation can expand the application of logistic regression, the transformation itself might restrict this expansion when placebo data must be included in the analysis. Nonlinear modeling through PROC NLIN is generally a more flexible and powerful approach. However, prior information about the model structure is required, and whether PROC NLIN succeeds sometimes depends on the model structure and on the appropriateness of the initial guesses for the parameter estimates. Exploratory graphs of exposure-response probability should help in selecting the primary methods and models, and logistic regression could be a convenient primary option. However, when logistic regression does not work well, alternative methods should be explored. The selection of the final model should be based on the combined information about the characteristics of the data, the quality of fit and the physiological rationale.

REFERENCES
1. SAS Institute Inc. SAS/STAT User's Guide, Volume 2, Version 6, 4th Edition. Cary, NC: SAS Institute Inc., 1994.
2. N. Keene. The Log Transformation Is Special. Statistics in Medicine, 14, 811-819 (1995).
3. R. K. Elswick Jr., P. F. Schwartz and J. A. Welsh. Interpretation of the Odds Ratio from Logistic Regression after a Transformation of the Covariate Vector. Statistics in Medicine, 16, 1695-1703 (1997).
4. J. M. Bailey and K. M. Gregg. A Technique for Population Pharmacodynamic Analysis of Concentration-Binary Response Data. Anesthesiology, 86, 825-835 (1997).

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

DISCLAIMER
This presentation reflects only my current personal thinking on potential alternative approaches to PD data analysis when logistic regression, as a primary option, does not work well, based on my experience in engineering. So far, none of these approaches has been applied to any real project. No material in this presentation is from Cognigen Corporation. Under no circumstances should this presentation be related to the position that Cognigen Corporation takes on PD data analysis.

CONTACT INFORMATION
The author can be contacted at:
Alan J Xiao, Ph.D.
Population PK/PD Scientist
Cognigen Corporation
395 Youngs Road
Buffalo, NY 14221-5831
Phone: 716-633-3463 ext. 265
Fax: 716-633-7404
Email: alan.xiao@cognigencorp.com
Web: www.cognigencorp.com