Dose-response modeling with bivariate binary data under model uncertainty

Similar documents
Mixtures of multiple testing procedures for gatekeeping applications in clinical trials

A new strategy for meta-analysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Introduction to Statistical Analysis

Generalizing the MCPMod methodology beyond normal, independent data

On Generalized Fixed Sequence Procedures for Controlling the FWER

Multivariate Extensions of McNemar s Test

Pubh 8482: Sequential Analysis

Optimal rejection regions for testing multiple binary endpoints in small samples

Generalizing the MCPMod methodology beyond normal, independent data

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

A Clinical Trial Simulation System, Its Applications, and Future Challenges. Peter Westfall, Texas Tech University Kuenhi Tsai, Merck Research Lab

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

Testing Statistical Hypotheses

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

Case Study in the Use of Bayesian Hierarchical Modeling and Simulation for Design and Analysis of a Clinical Trial

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology

Stat 5101 Lecture Notes

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

BMA-Mod : A Bayesian Model Averaging Strategy for Determining Dose-Response Relationships in the Presence of Model Uncertainty

Adaptive Designs: Why, How and When?

Chapte The McGraw-Hill Companies, Inc. All rights reserved.

Towards stratified medicine instead of dichotomization, estimate a treatment effect function for a continuous covariate

The Design of Group Sequential Clinical Trials that Test Multiple Endpoints

Testing Independence

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Multivariate Survival Analysis

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Categorical data analysis Chapter 5

Sample Size Determination

Type I error rate control in adaptive designs for confirmatory clinical trials with treatment selection at interim

Testing Statistical Hypotheses

Two-stage k-sample designs for the ordered alternative problem

hypothesis a claim about the value of some parameter (like p)

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Chapter 1 Statistical Inference

Bayesian concept for combined Phase 2a/b trials

Tutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis. Acknowledgements:

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

6.867 Machine Learning

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)

Sigmaplot di Systat Software

CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity

Optimal exact tests for multiple binary endpoints

Statistics 3858 : Contingency Tables

Analysis of variance, multivariate (MANOVA)

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC

BIOS 312: Precision of Statistical Inference

Reports of the Institute of Biostatistics

Sample Size Estimation for Studies of High-Dimensional Data

Machine Learning 2017

Control of Directional Errors in Fixed Sequence Multiple Testing

Stepwise Gatekeeping Procedures in Clinical Trial Applications

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

Fractional Polynomials and Model Averaging

Summary and discussion of: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing

Review of Statistics 101

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection

Sociology 6Z03 Review II

An Introduction to Multivariate Statistical Analysis

PSI Journal Club March 10 th, 2016

Monitoring clinical trial outcomes with delayed response: incorporating pipeline data in group sequential designs. Christopher Jennison

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Subject CS1 Actuarial Statistics 1 Core Principles

Logistic Regression: Regression with a Binary Dependent Variable

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal

COMPARING SEVERAL MEANS: ANOVA

Chapter 9 Inferences from Two Samples

QUESTION ONE Let 7C = Total Cost MC = Marginal Cost AC = Average Cost

Confidence Intervals, Testing and ANOVA Summary

Machine Learning Linear Classification. Prof. Matteo Matteucci

Causal Model Selection Hypothesis Tests in Systems Genetics

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Group Sequential Designs: Theory, Computation and Optimisation

Tests for the Odds Ratio of Two Proportions in a 2x2 Cross-Over Design

Inferential statistics

A Mixture Gatekeeping Procedure Based on the Hommel Test for Clinical Trial Applications

Uniformly Most Powerful Bayesian Tests and Standards for Statistical Evidence

A simulation study for comparing testing statistics in response-adaptive randomization

STAT 525 Fall Final exam. Tuesday December 14, 2010

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials

Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA

HANDBOOK OF APPLICABLE MATHEMATICS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

Model Estimation Example

MULTISTAGE AND MIXTURE PARALLEL GATEKEEPING PROCEDURES IN CLINICAL TRIALS

Multimodel Inference: Understanding AIC relative variable importance values. Kenneth P. Burnham Colorado State University Fort Collins, Colorado 80523

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

Unit 9: Inferences for Proportions and Count Data

Tutorial 4: Power and Sample Size for the Two-sample t-test with Unequal Variances

QED. Queen s Economics Department Working Paper No Hypothesis Testing for Arbitrary Bounds. Jeffrey Penney Queen s University

Downloaded from:

Chapter 3 ANALYSIS OF RESPONSE PROFILES

Tutorial 3: Power and Sample Size for the Two-sample t-test with Equal Variances. Acknowledgements:

Transcription:

Dose-response modeling with bivariate binary data under model uncertainty Bernhard Klingenberg 1 1 Department of Mathematics and Statistics, Williams College, Williamstown, MA, 01267 and Institute of Statistics, Graz University of Technology, Steyrergasse 17/IV, 8010 Graz, Austria. E-mail: bklingen@williams.edu Abstract: When modeling a dose-response for a drug based on bivariate binary data such as two co-primary efficacy endpoints in early stages of development, there is usually uncertainty about the from of the true underlying dose-response shape. Often, investigators fit several different models that are deemed plausible, but later fail to acknowledge this uncertainty in inference that is based on a single model selected via e.g. the minimum AIC criterion. This leads to an inflation in the error of the proof of activity decision and may also result in poor estimation of a target dose that is used in future trials. In this article we acknowledge model uncertainty by fitting several candidate models for a bivariate binary response and develop a principled approach to establish proof of activity and a target dose. Keywords: Dose estimation; Multiplicity adjustment; Proof of Concept. 1 Introduction Dose-response studies are important tools for investigating the existence, nature and extent of a dose effect on efficacy or safety outcomes in drug development, toxicology and related areas. The following four questions, usually in this order, are of prime interest: i.) Is there any evidence of a dose effect (i.e., Proof of Activity), ii.) Which doses exhibit a response different from the control response, iii.) What is the nature of the doseresponse relationship and iv.) What dose should be selected for further studies/marketing (i.e., target dose estimation)? Here we suggests to answer questions i) to iv) in a unified framework using statistical modelling. To meet the criticism that results based on modelling depend too much on assumptions on the underlying dose-response shape, we incorporate model uncertainty into our methods of establishing Proof of Activity (PoA) and estimating a target dose. We do this by considering several candidate models for the true dose-response which we simultaneously use for inference. Recently, Klingenberg (2008) has shown via simulation that for univariate binary dose-response data from simple parallel group designs, common methods of establishing PoA with a model based approach (i.e., basing the PoA decision on the P-value of a test using the model with the smallest AIC

2 Dose-response for bivariate binary data under model uncertainty among a set of candidate models) lead to inflated type I errors. This means that the probability of carrying forward an ineffective drug into Phase III is not well controlled. Motivated by the example below, in this article we consider bivariate binary response data (Y 1j, Y 2j ), where Y ij is the binary indicator of efficacy (or safety) for endpoint i (i=1,2) for subjects randomized to one out of k dose levels d j, j = 1 (Placebo),... k. We develop a principled approach to both, establishing PoA and estimating a target dose (such as the minimum effective dose, MED) under model uncertainty, controlling the type I error of an incorrect PoA decision. 2 Proof of Activity under model uncertainty We assume independent multinomial distributions Mult[n j ; π 00 (d j ), π 01 (d j ), π 10 (d j ), π 11 (d j )] for the counts in each of the j = 1,..., k 2 2 tables that crossclassify the bivariate response at dose level d j. Here, n j is the number of subjects randomized to dose level d j and π ab (d j ) = P dj (Y 1j = a, Y 2j = b), a, b 0, 1 is the probability of response (a, b) at dose level d j. From now on, we assume that Y 1j and Y 2j are both measuring efficacy, for instance when there are two co-primary endpoints in a clinical trial (see example below). One simple approach for establishing PoA and estimating the MED would be to collapse the two efficacy endpoints into a single one, recording an overall success if both endpoint show efficacy, and failure otherwise. However, one drawback of this method is that it declares outcomes of type (0, 1) and (1, 0) as failures, which might lead to a loss of power in establishing PoA and too high an estimate for the MED. In a multivariate approach, we model separately the marginal success probabilities π 1+ (d j ) and π +1 (d j ) for each endpoint in terms of dose, taking into account the dependence between them by modeling the log-odds (Palmgren, 1989). Let M s denote a specific constellation of two marginal models and one model for the association in (Y 1j, Y 2j ). To accommodate model uncertainty in the PoA decision and subsequent target dose estimation, we start by considering several plausible dose-response models M s, s = 1,..., m that differ in the way they model the two margins and/or the association. Let M = {M 1,..., M m } be a candidate set spanned by m such models. Since target doses such as the MED depend on the assumed shapes for each margin, considering several shapes a priori makes the procedure more robust to model misspecification. To decide which of the candidate models, if any, significantly pick up the dose-response signal, we compare each one to the no-effect model via a signed and penalized likelihood ratio statistic T s = ± { 2 [l 0 l s ]} 2df s, where l s is the maximized multinomial log-likelihood under candidate model M s, and M 0 and l 0, respectively correspond to the no dose effect model

B. Klingenberg 3 π 1+ (d j ) = α 1, π +1 (d j ) = α 2, OR j = π00(dj)π11(dj) π 01 (d j )π 10 (d j ) = α 3 that assumes constant margins and odds ratios across all dose levels. Naturally, we are only interested in models that show a positive dose effect. Although straightforward for monotone marginal profiles, in general we define a dose effect as positive if ˆπ(d max ) > ˆπ(d 1 ), where d max = argmax d ˆπ(d) ˆπ(d 1 ) is the dose at which the maximum absolute effect relative to placebo occurs. This condition must be met for both, the first (ˆπ ˆπ 1+ ) and second (ˆπ ˆπ +1 ) margin, otherwise we declare the dose effect as negative. The purpose of the ± sign in T s is then to give models with an estimated (but potentially significant) negative dose effect a small value of the test statistic, moving it to the lower tail. Finally, we penalize fitting more complex models by subtracting two times the difference in the number of parameters between M s and M 0. Up to the ± sign, T s is equivalent to the differences in the AICs of M 0 and M s, a statistic favored for model selection by Burnham and Anderson (2002). Under the null hypothesis of no dose effect, bivariate responses (Y 1j, Y 2j ) are exchangeable among the k dose levels. To evaluate the significance of T s under simultaneous inference with all m candidate models, we build the permutation distribution of its maximum, max s T s, by fitting each M s to a random sample of all possible assignments of the (Y 1j, Y 2j ) responses to dose levels. (Although the asymptotic distribution of T s is proportional to a Chi-square, the distribution of the maximum is not straightforward to derive due to correlation between the test statistics.) The permutation distribution yields raw and, using the full closed-testing methodology or the step-down approach in Westfall and Young (1993), multiplicity adjusted P- values for the PoA test with each candidate model. These adjusted P-values now appropriately account for the uncertainty in the dose-response shapes and control the familywise error rate of a wrong PoA decision under the family of candidate dose-response models. This control is in the strong sense, that is, under any combination of true and false null hypotheses, where the s-th null hypothesis concerns the testing of no dose effect under model M s. 3 MED estimation After establishing PoA with at least one candidate model (smallest multiplicity adjusted P-value less than the chosen overall type I error rate), the procedure moves to estimating the MED. For a given model, the MED is the smallest dose that shows a clinical relevant and statistical significant improvement over placebo, i.e., MED = argmin d (d1,d k ]{ˆπ(d) > ˆπ(d 1 ) +, ˆπ L (d) > ˆπ(d 1 )}, (1) where is the clinically relevant effect (may differ by margins) and ˆπ L (d) is the lower limit of a 100(1 γ/2) confidence interval for π(d). We say

4 Dose-response for bivariate binary data under model uncertainty L1 L2 L3 L4 L5 L6 efficacy 0.5 0.4 0.3 0.2 dose FIGURE 1. Some plausible dose-response models for margins of the IBS compound that the MED does not exist if there is no d (d 1, d k ] for which the conditions are satisfied. There are several possibilities of defining the MED with bivariate data. Here, we estimate the MED from each of the two margins (i.e., with ˆπ ˆπ 1+ and ˆπ ˆπ +1 ) and take the maximum over the two MED estimates as the MED estimate for the model, so that clinical relevance and statistical significance is guaranteed for both margins. A different approach would derive the MED by setting ˆπ ˆπ 11. We can take the MED that corresponds to the model with the smallest adjusted P-value as the overall estimate for the target dose. However, a MED estimator that incorporates model uncertainty is based on a weighted average of the MEDs from each candidate model (so they exist), wmed = s w MED / s s s w s, with weights w s = exp(t s /2). In this way, the ratio of weights w s and w s attached to the MED from two candidate models M s and M s reflects their (penalized) likelihood distance. 4 Example: PoA and MED for a diarrhea compound Guidelines from the European Agency for the Evaluation of Medicinal Products for proving efficacy for compounds against Irritable Bowl Syndrome (IBS) demand that two endpoints, relief of abdominal pain and relief from overall GI-tract symptoms be considered jointly. Figure 1 and Table 1 show several shapes for the marginal probability of each of this endpoints (plotted using information such as the expected placebo and maximum dose effect from prior studies) that are deemed plausible by the clinical team developing a compound against IBS at dose levels d = (0, 1, 4, 12, 24)mg. The different shapes of these models are generated with linear predictors of fractional polynomial form (Roystone and Altman, 1994) that allow for a broad dose-response space. For instance, the clinical team was uncertain about the rate of increase in both margins at low doses and about the mono-

B. Klingenberg 5 TABLE 1. Various shapes for the marginal efficacy of the IBS compound Shape Linear Predictor Shape Linear Predictor L 0 : α L 4 : α + β exp(exp(d j /max(d j ))) L 1 : α + βd j L 5 : α + βd j + γd 2 j L 2 : α + β log(d j + 1) L 6 : α + β log(d j + 1) + γ/(d j + 1) L 3 : α + β/(d j + 1) TABLE 2. Candidate dose-response models for the efficacy of the IBS compound. The triplets in the first column refer to which linear predictor is used to model the two margins logit[π 1+ (d j )] and logit[π +1 (d j )] and the log-odds ratio log(or j ). candidate perm. adj. MED weight model AIC T P-value P-value (mg) (%) M 0 : (L 0, L 0, L 0 ) 900.1 M 1 : (L 1, L 2, L 0 ) 898.2 1.92 0.4196 0.5875 15.9 4.3 M 2 : (L 2, L 2, L 0 ) 894.3 5.80 0.0031 0.0101 6.0 29.8 M 3 : (L 3, L 2, L 0 ) 895.6 4.55 0.0060 0.0218 9.6 15.9 M 4 : (L 4, L 2, L 0 ) 900.7-0.55 0.6408 0.7593 NA 1.2 M 5 : (L 1, L 6, L 0 ) 900.0 0.08 0.6212 0.7593 15.4 1.7 M 6 : (L 2, L 6, L 0 ) 896.1 4.03 0.2682 0.4780 4.0 12.3 M 7 : (L 3, L 6, L 0 ) 894.0 6.08 0.0030 0.0101 1.0 34.3 M 8 : (L 4, L 6, L 0 ) 902.6-2.45 0.7631 0.7631 NA 0.5 tonicity of the dose-response curve for the second margin. Hence, the candidate set should include models that incorporate various scenarios for the slope or non-monotonicity. A potential candidate set for the IBS compound is shown in Table 2. For instance, consider model M 6 which postulates logit[π 1+ (d j )] = α 1 + β 1 log(d j + 1), logit[π +1 (d j )] = α 2 + β 2 log(d j + 1) + γ 2 /(d j + 1), log(or j ) = log (π 00 (d j )π 11 (d j )/π 01 (d j )π 10 (d j )) = α 3. When fitting M 6 to (slightly modified) clinical trials data on the efficacy of the IBS compound, a test of no dose effect compares this model to the noeffect model with β 1 = β 2 = γ 2 = 0. Its penalized likelihood ratio statistic T 6 = + { 2 [ 447.06 ( 442.04)]} 2 3 = 4.03. However, the maximum value of 6.08 for the test statistic over all candidate models occurs under model M 7 that allows for a steeper rate of increase in the marginal odd of abdominal pain. This model would also be considered for inference (testing PoA and estimating the MED) when the decision is based on the minimum AIC criterion. The likelihood ratio PoA test with this model has an asymptotic P-value of 0.0035, very similar to the raw permutation P-value displayed in Table 2 that shows the proportion of all 10,000 permutations that yielded a T 7 value as large or larger than the observe one. However, this

6 Dose-response for bivariate binary data under model uncertainty statement of significance is conditional on the selected model and ignores the uncertainty the clinical team had at the start of the trial, as expressed in the candidate set. Under simultaneous inference with all candidate models, i.e., testing PoA under each model, the multiplicity adjusted P-value for the PoA test in model M 7 equals 0.0101, which is about three times larger but still significant at a conventional overall type I error rate of 2.5%, say. This multiplicity adjusted P-value is derived from the closed testing framework using the permutation distribution of max s T s. The MED derived from M 7 (with a clinical relevant effect of = 10% for both margins and γ = 5%) equals 1.0mg. Incorporating model uncertainty, the weighted estimate wmed = 5.2mg uses the MED estimates from all candidate models with weights displayed in Table 2 and may be more appropriate. 5 Discussion The use of multiplicity adjusted P-values guarantees that the type I error rate for one or more incorrect PoC decision (based on various candidate models) when the drug is actually ineffective is controlled at the desired level (e.g., 2.5%). This is in contrast to the common habit of model fishing or gambling on an a priori specified shape in the study protocol. A disadvantage of our method is that non-linear (on the link scale) dose response models are harder to incorporate because they would not provide convergent fits for many permutations. Here we treated the case where both variables describe efficacy. Equally interesting is the case where Y 1j is a binary primary efficacy variable and Y 2j a binary variable describing safety at dose level j. References Burnham, K., and Anderson, D.R. (2002) Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. New York: Springer. Klingenberg, B. (2008). Proof of Concept and dose estimation with binary responses. Under second review by Statistics in Medicine. Palmgren, J. (1989). Regression models for bivariate binary responses. Tech. Rep. 101, Department of Biostatistics, University of Washington, Seattle. Roystone, P., and Altman, D.G. (1994). Regression using fractional polynomials of continues covariates: parsimonious parametric modelling. Applied Statistics, 43, 429 467. Westfall, P., and Young, S. (1993). Resampling-Based Multiple Testing: Examples and Methods for p-value Adjustment. New York: John Wiley & Sons.