Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

1 Introduction to Dalla Lana School of Public Health University of Toronto September 18,

2 : a review

3 Evidence
Ideal: to advance the knowledge base of clinical medicine, clinical research produces evidence. The totality of the available evidence eventually leads to knowledge through induction (including statistical inference). In particular, based on induction, nothing is ever concluded, let alone based on a single study, which can only contribute to the totality of evidence (Miettinen 2011, p. 36). The knowledge in turn serves as an input to decision making (Miettinen & Karp 2012, p. 131).
Reality: sometimes decisions have to be made based on limited statistical evidence.

4 Object of inference
Evidence can be reported, for instance, in the form of a p-value or a confidence interval. But let's not get ahead of ourselves. Evidence about what? Evidence about the object of inference. Terms that we will be using interchangeably with object are parameter or estimand, in the meaning (Miettinen 2011, p. 60):
Parameter - A constant (of unknown magnitude) in a (statistical) model.
Such parameters are not only unobserved, but unobservable, since statistical models are themselves theoretical constructs. However, based on observed data, we can arrive at an estimate of the value/magnitude of the unknown parameter.

5
Rather than making conclusions, by using inferential statistics, we aim to quantify the uncertainty about the objects of inference. By statistics, we don't mean here the science of statistics, but the plural of statistic, that is, (Miettinen 2011, p. 63):
Statistic - A number derived from a sample.
Statistics can be descriptive or inferential, the latter being (Miettinen 2011, p. 57):
Inferential statistic - A statistic derived under a statistical model.
Examples: the estimator $\hat\lambda = D/y$ for the rate parameter $\lambda$, or the estimator $\hat\pi = D/n$ for the risk parameter $\pi$. These are inferential since they are derived as maximum likelihood estimators under Poisson and binomial models, respectively.

6 Measures of imprecision
We already noted that a point estimator such as $\hat\lambda$ is not in itself sufficient for reporting statistical evidence; we need a measure of imprecision for the point estimate. Unlike a point estimate, a measure of imprecision does not directly characterize the object of study; instead, it characterizes the study itself, for instance, the size of the study (Miettinen 2011, p. 43). The measure of imprecision that we will first be concerned with is the standard error. To understand what a standard error is, we have to begin with the sampling distribution of a statistic.

7 Sampling distribution of the empirical risk
In the absence of any other individual-level information, the statistical model for the number of incident events $D$ in a population of size $n$ within a specified time interval is $D \sim \mathrm{Binomial}(n, \pi)$, where $\pi$ is estimated by $\hat\pi = D/n$. Imagine now that a study involving the recruitment of $n$ individuals and the observation of their event outcomes is carried out many times and $\hat\pi$ is calculated from each study. The distribution of the resulting estimates is the sampling distribution of $\hat\pi$. Let $\pi = 0.09$ and see how this looks with different $n$.
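The thought experiment of repeating the study many times can be sketched as a small simulation. This is an illustration only: the number of repeated studies (10000) and the seed are arbitrary choices, and `simulate_pi_hat` is a hypothetical helper name, not something from the slides.

```r
# Simulate the sampling distribution of the empirical risk pi_hat = D / n.
set.seed(1)           # arbitrary seed, for reproducibility only
risk_true <- 0.09     # true risk pi, as on the slide
n_studies <- 10000    # hypothetical number of repeated studies

# Hypothetical helper: run `reps` studies of size n and return the empirical risks
simulate_pi_hat <- function(n, risk, reps) {
  d <- rbinom(reps, size = n, prob = risk)  # event counts D ~ Binomial(n, pi)
  d / n
}

sizes <- c(25, 100, 1000)
estimates <- lapply(sizes, simulate_pi_hat, risk = risk_true, reps = n_studies)
names(estimates) <- paste0("n = ", sizes)

# Mean and standard deviation of the simulated estimates for each n
summary_table <- sapply(estimates, function(x) c(mean = mean(x), sd = sd(x)))
print(round(summary_table, 4))

# To draw the histograms, uncomment:
# par(mfrow = c(1, 3))
# for (k in names(estimates)) hist(estimates[[k]], main = k, xlab = "pi_hat")
```

The means stay close to 0.09 for every $n$, while the spread of the estimates shrinks as $n$ grows.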

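8 [Figure: histogram of the sampling distribution of $\hat\pi$ for n = 25; horizontal axis $\hat\pi$, vertical axis frequency]

9 [Figure: histogram of the sampling distribution of $\hat\pi$ for n = 100; horizontal axis $\hat\pi$, vertical axis frequency]

10 [Figure: histogram of the sampling distribution of $\hat\pi$ for n = 1000; horizontal axis $\hat\pi$, vertical axis frequency]

11 A certain shape appears
[Figure: histogram of the sampling distribution of $\hat\pi$; horizontal axis $\hat\pi$, vertical axis frequency]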

12 Standard error
With large $n$, the sampling distribution for $\hat\pi$ is approximately
$$\hat\pi \sim N\left(\pi, \frac{\pi(1-\pi)}{n}\right),$$
that is, the normal distribution with the expectation $\pi$ and standard deviation $\sqrt{\pi(1-\pi)/n}$. The standard deviation of a sampling distribution of a statistic is known as the standard error. We will henceforth use this term also for an estimate of the standard error, in the present example given by
$$S = \sqrt{\frac{\hat\pi(1-\hat\pi)}{n}}.$$
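As a worked illustration, the standard deviation $\sqrt{\pi(1-\pi)/n}$ can be evaluated at $\pi = 0.09$ for the three sample sizes used in the histograms. The subscript notation $S_n$ is introduced here only for this example.

```latex
\begin{align*}
S_{25}   &= \sqrt{\frac{0.09 \times 0.91}{25}}   \approx 0.0572, \\
S_{100}  &= \sqrt{\frac{0.09 \times 0.91}{100}}  \approx 0.0286, \\
S_{1000} &= \sqrt{\frac{0.09 \times 0.91}{1000}} \approx 0.0090.
\end{align*}
```

Quadrupling the sample size from 25 to 100 halves the standard error, since $S$ is proportional to $1/\sqrt{n}$.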

13 Back to sampling distributions
Note that the normal distribution appeared even though the original observations were very non-normal (the binomial distribution arises as the sum of independent Bernoulli variables). However, by the central limit theorem, the mean of a large number of independent outcomes is approximately normally distributed irrespective of their original distribution. More generally, for maximum likelihood estimators $\hat\theta$ (under standard regularity conditions) the sampling distribution with large $n$ is approximately
$$\hat\theta \sim N(\theta, S^2).$$
The standard error $S$ depends on various factors, in particular the sample size $n$.

14 Estimating the log-rate ratio
Even though the normal sampling distribution always appears with large enough $n$, for non-negative statistics, such as empirical rate ratios, the normal approximation is better if we take an appropriate transformation of the parameter and the corresponding estimator. For non-negative statistics this transformation is usually the logarithm. If the aggregate person-years by exposure status are known, we may estimate the log-rate ratio $\log\theta = \log(\lambda_1/\lambda_0)$ by
$$\log\hat\theta = \log\left(\frac{D_1/y_1}{D_0/y_0}\right).$$

15 Example
The data below are from a randomized trial studying the role of male circumcision for HIV prevention in Uganda. The events are incident HIV infections during the first 24 months of follow-up since randomization, and the intervention is male circumcision.

                   Intervention group   Control group
Participants
Incident events
Person-years

16 Standard error of the log-rate ratio
Now we have that approximately $\log\hat\theta \sim N(\log\theta, S^2)$, where
$$S = \sqrt{\frac{1}{D_1} + \frac{1}{D_0}}.$$
This depends only on the numbers of exposed and unexposed cases; the larger these are, the smaller the standard error. The standardized statistic is approximately distributed as
$$Z = \frac{\log\hat\theta - \log\theta}{\sqrt{\frac{1}{D_1} + \frac{1}{D_0}}} \sim N(0, 1).$$
This could tell us whether the observed value of $\log\hat\theta$ is somehow unusual compared to the true value.

17 Test statistic and p-value
Under the null hypothesis $H_0: \log\theta = 0$, the z-value is
$$Z = \frac{\log\hat\theta}{\sqrt{\frac{1}{D_1} + \frac{1}{D_0}}}.$$
We have constructed a test statistic. It tends to take large positive or negative values when the null hypothesis is not true. This unusualness is quantified by the p-value
$$p = P(|Z| > |z| \mid H_0),$$
where $z$ is the observed value of the Z-test statistic. Alternatively, the evidence may be reported in the form of a confidence interval.

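18 Confidence interval
With the log-rate ratio parameter and the statistic $\log\hat\theta = \log\left(\frac{D_1/y_1}{D_0/y_0}\right)$ we have that
$$P\left(\log\hat\theta - 1.96\,S \le \log\theta \le \log\hat\theta + 1.96\,S\right) = 0.95,$$
where $S = \sqrt{\frac{1}{D_1} + \frac{1}{D_0}}$. This gives a 95% confidence interval for the log-rate ratio. To get a CI for the rate ratio itself, we note that
$$P\left(\log\hat\theta - 1.96\,S \le \log\theta \le \log\hat\theta + 1.96\,S\right) = P\left(e^{\log\hat\theta - 1.96\,S} \le e^{\log\theta} \le e^{\log\hat\theta + 1.96\,S}\right) = P\left(\hat\theta e^{-1.96\,S} \le \theta \le \hat\theta e^{1.96\,S}\right) = 0.95.$$
Thus, $\left[\hat\theta e^{-1.96\,S},\ \hat\theta e^{1.96\,S}\right]$ is a 95% CI for the rate ratio parameter.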
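The steps from the log-rate ratio estimate to the test statistic, p-value and confidence interval can be collected into one short function. This is a sketch under the normal approximation described above; `rate_ratio_inference` is a hypothetical helper name, and the counts and person-years in the usage line are made-up numbers for illustration, not data from any trial.

```r
# Wald-type inference for a rate ratio, carried out on the log scale.
# d1, d0: event counts in the exposed and unexposed groups
# y1, y0: person-years in the exposed and unexposed groups
rate_ratio_inference <- function(d1, y1, d0, y0, level = 0.95) {
  log_rr <- log((d1 / y1) / (d0 / y0))   # log of the empirical rate ratio
  s <- sqrt(1 / d1 + 1 / d0)             # standard error S of the log-rate ratio
  z <- log_rr / s                        # z-value under H0: log(theta) = 0
  p <- 2 * pnorm(-abs(z))                # two-sided p-value P(|Z| > |z| | H0)
  q <- qnorm(1 - (1 - level) / 2)        # about 1.96 for a 95% interval
  c(rate_ratio = exp(log_rr), se_log = s, z = z, p = p,
    lower = exp(log_rr - q * s), upper = exp(log_rr + q * s))
}

# Hypothetical counts and person-years (NOT trial data)
res <- rate_ratio_inference(d1 = 20, y1 = 4000, d0 = 40, y0 = 4000)
print(round(res, 4))
```

Because the interval is symmetric on the log scale, its limits on the ratio scale are the point estimate divided and multiplied by the same factor $e^{1.96\,S}$.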

19 Interpretation
Clayton & Hills (1993, p. 91):

21 Decision making
If we are only reporting evidence, our role is not to reject anything. Rejecting or not rejecting hypotheses is left for the scientific community, based on the totality of the available evidence. However, if we indeed have to make a decision (note again the difference between a decision and a conclusion), we have to weigh the costs of making the wrong decision. There are two different kinds of errors in hypothesis-based decision making. We can either reject the null when it is in fact correct (type I error, or false positive), or fail to reject the null when it in fact should be rejected (type II error, or false negative). There is no free lunch in statistics; there is a tradeoff between minimizing the false positive probability and minimizing the false negative probability.

22 A hypothesis testing problem
An example of a decision problem: should one treat future patients with a new procedure or drug based on limited statistical evidence on its efficacy? Consider the following example: the drug AZT was the first drug that seemed effective in delaying the onset of AIDS in HIV-positive patients. In a randomized study, 435 HIV-positive subjects were assigned to take 500 milligrams of AZT and another 435 HIV-positive subjects were assigned to take a placebo. If the approval of the drug is made based on the evidence produced by this trial, we are involved in a decision problem, rather than just reporting statistical evidence. Let the risk of AIDS onset be $\pi_1$ with the treatment and $\pi_0$ without treatment. The null hypothesis could be formulated in terms of the risk ratio as
$$H_0: \theta = \frac{\pi_1}{\pi_0} = 1.$$

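23 Significance level
For the purpose of making a decision, we choose a fixed significance level $\alpha$, and reject $H_0$ if $p < \alpha$. On the other hand, $p \mid H_0 \sim U(0, 1)$. This follows because $1 - p = P(|Z| < |z|)$, where $P(|Z| < |z|)$ is the cumulative distribution function of $|Z|$ evaluated at the observed value; a continuous random variable passed through its own cumulative distribution function is uniformly distributed, and the uniform distribution is preserved in the linear transformation from $1 - p$ to $p$. Thus, $P(p < \alpha \mid H_0) = \alpha$, and the significance level is the probability of type I error. Note also that $P(p < \alpha \mid H_1) > \alpha$ (why?).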
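The uniformity of the p-value under the null hypothesis can be checked with a quick simulation. This sketch draws the Z-statistic directly from its $N(0, 1)$ null distribution rather than simulating a trial; the number of draws and the seed are arbitrary.

```r
set.seed(2)                    # arbitrary seed, for reproducibility only
z <- rnorm(100000)             # Z-values drawn under H0: Z ~ N(0, 1)
p <- 2 * pnorm(-abs(z))        # two-sided p-values P(|Z| > |z| | H0)

# If p ~ U(0, 1) under H0, the proportion of p-values below alpha is about alpha
alpha <- c(0.01, 0.05, 0.10)
rejection_rate <- sapply(alpha, function(a) mean(p < a))
print(rbind(alpha, rejection_rate))

# To inspect the whole distribution, uncomment:
# hist(p, breaks = 20, main = "p-values under H0", xlab = "p")
```

Each simulated rejection rate estimates the type I error probability at the corresponding significance level.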

24 Power of a test
Choosing a small $\alpha$ reduces the probability of a false positive result. However, unfortunately it also increases the probability of a false negative result, denoted as $\beta = P(p \ge \alpha \mid H_1)$. This means that avoiding a false positive result makes it more difficult to reject the null when it in fact should be rejected. In other words, a small type I error probability reduces the power of the test. The power of a test is the probability $1 - \beta$, the probability of rejecting the null when it should be rejected. As we will see later, in addition to the significance level, the power of a test depends on the sample size and effect size.
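The tradeoff between $\alpha$ and power can be made concrete under the normal approximation. The sketch below assumes that under $H_1$ the test statistic is distributed as $Z \sim N(\delta, 1)$, where $\delta$ is the true log effect divided by its standard error; both this simplification and the value $\delta = 2.5$ are assumptions made for illustration, and `z_test_power` is a hypothetical helper name.

```r
# Power of the two-sided Z-test when Z ~ N(delta, 1) under H1.
z_test_power <- function(alpha, delta) {
  crit <- qnorm(1 - alpha / 2)                  # reject H0 when |Z| > crit
  pnorm(-crit - delta) + pnorm(-crit + delta)   # P(|Z| > crit | H1)
}

delta <- 2.5                                    # hypothetical standardized effect
alpha <- c(0.10, 0.05, 0.01, 0.001)
power <- z_test_power(alpha, delta)
print(data.frame(alpha, power, beta = 1 - power))
```

With $\delta = 0$ the function returns $\alpha$ itself, and for any nonzero $\delta$ the power exceeds $\alpha$ while decreasing as $\alpha$ is made smaller.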

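25 Decision table
The four possible decisions and the corresponding probabilities may be expressed in a 2 × 2 table as

Decision          $H_0: \theta = 1$          $H_1: \theta \ne 1$
$p < \alpha$      $P(p < \alpha \mid H_0)$   $P(p < \alpha \mid H_1)$
$p \ge \alpha$    $P(p \ge \alpha \mid H_0)$ $P(p \ge \alpha \mid H_1)$
                  1                          1

or

Decision          $H_0: \theta = 1$   $H_1: \theta \ne 1$
$p < \alpha$      $\alpha$            $1 - \beta$
$p \ge \alpha$    $1 - \alpha$        $\beta$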

26 Choosing the significance level
Knowing that there is a tradeoff between the false positive and false negative probabilities, what would the appropriate $\alpha$ be? Answer: it depends. In particular, it depends on the respective costs and benefits associated with the decisions. For instance, in the case of a false positive, the patient might be treated with a procedure or a drug that is not effective, but might be costly or have side effects. In the case of a false negative, the patient is not treated, while in fact the treatment might have helped; possibly a fatal decision. Weighing such harms and benefits is highly subjective, and outside the realm of statistics.

28 2 × 2 table for conditional probabilities
Recall the AZT randomized trial example:

           AIDS = 1   AIDS = 0   Total
AZT = 1    17         418        435
AZT = 0    38         397        435

In terms of conditional probabilities, the 2 × 2 table may be presented as

         Y = 1                  Y = 0
Z = 1    $P(Y = 1 \mid Z = 1)$  $P(Y = 0 \mid Z = 1)$  1
Z = 0    $P(Y = 1 \mid Z = 0)$  $P(Y = 0 \mid Z = 0)$  1

Or in terms of risk parameters as

         Y = 1     Y = 0
Z = 1    $\pi_1$   $1 - \pi_1$   1
Z = 0    $\pi_0$   $1 - \pi_0$   1

29 Reparametrization
The problem could equally well be parametrized in terms of odds:

         Y = 1                       Y = 0
Z = 1    $\frac{\pi_1}{1 - \pi_1}$   $\frac{1 - \pi_1}{\pi_1}$
Z = 0    $\frac{\pi_0}{1 - \pi_0}$   $\frac{1 - \pi_0}{\pi_0}$

Or in terms of log-odds:

         Y = 1                            Y = 0
Z = 1    $\log\frac{\pi_1}{1 - \pi_1}$   $\log\frac{1 - \pi_1}{\pi_1}$
Z = 0    $\log\frac{\pi_0}{1 - \pi_0}$   $\log\frac{1 - \pi_0}{\pi_0}$

Neither is of direct interest to us, since the objective is to compare the risk of AIDS onset between the two groups.

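30 A more relevant reparametrization
Redefine the four log-odds as:

         Y = 1              Y = 0
Z = 1    $\alpha + \beta$   $-(\alpha + \beta)$
Z = 0    $\alpha$           $-\alpha$

(These $\alpha$ and $\beta$ are unrelated to the previous ones.) This corresponds to a regression equation
$$\log\frac{\pi_Z}{1 - \pi_Z} = \alpha + \beta Z,$$
where $\log\frac{\pi_Z}{1 - \pi_Z}$ is known as the logit transformation. Here
$$\frac{\pi_1/(1 - \pi_1)}{\pi_0/(1 - \pi_0)} = \frac{e^{\alpha + \beta}}{e^{\alpha}} = \frac{e^{\alpha} e^{\beta}}{e^{\alpha}} = e^{\beta} \quad\Leftrightarrow\quad \log\left(\frac{\pi_1/(1 - \pi_1)}{\pi_0/(1 - \pi_0)}\right) = \beta.$$
The regression coefficient $\beta$ is a log-odds ratio.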

31 Deterministic and stochastic model components
The regression equation specifies the deterministic part of the model. To complete the model specification, we need to specify the stochastic component of the model, that is, a statistical distribution for the outcomes $D_1$ and $D_0$. The appropriate distribution is $D_Z \sim \mathrm{Binomial}(n_Z, \pi_Z)$. Here the risk $\pi_Z$ is given by the regression equation as
$$\pi_Z = \frac{e^{\alpha + \beta Z}}{1 + e^{\alpha + \beta Z}} = \frac{1}{1 + e^{-(\alpha + \beta Z)}}.$$
This inverse transformation is the so-called expit function:
$$\pi_Z = \mathrm{logit}^{-1}(\alpha + \beta Z) = \mathrm{expit}(\alpha + \beta Z).$$
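The logit and expit transformations are easy to write out and check numerically. In this sketch the coefficient values $\alpha = -2$ and $\beta = -1$ are hypothetical, chosen only to show the mapping from the regression equation to the two risks; `logit` and `expit` are defined here by hand and compared with R's built-in `plogis`.

```r
logit <- function(p) log(p / (1 - p))        # risk -> log-odds
expit <- function(x) exp(x) / (1 + exp(x))   # log-odds -> risk

alpha <- -2                                  # hypothetical intercept
beta <- -1                                   # hypothetical log-odds ratio
z <- c(0, 1)
risk <- expit(alpha + beta * z)              # pi_0 and pi_1 implied by the model
print(risk)

# The two algebraic forms of the inverse agree, logit undoes expit,
# and the hand-written expit matches the built-in plogis()
same_form  <- all.equal(risk, 1 / (1 + exp(-(alpha + beta * z))))
round_trip <- all.equal(logit(risk), alpha + beta * z)
built_in   <- all.equal(risk, plogis(alpha + beta * z))
print(c(same_form, round_trip, built_in))
```

Whatever real values $\alpha$ and $\beta$ take, the implied risks stay strictly between 0 and 1, which is one practical reason for the logit link.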

32 Regression model
Clayton & Hills (1993, p. 217): "A common theme in all these situations is change from the original parameters to new parameters which are more relevant to the comparisons of interest. This change can be described by the equations which express the old parameters in terms of the new parameters. These equations are referred to as regression equations, and the statistical model is called a regression model."
Now the old parameters are the two log-odds $\log\frac{\pi_Z}{1 - \pi_Z}$.

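33 Estimate the model parameters
The parameters $\alpha$ and $\beta$ can be estimated using maximum likelihood.
Read in the data:
    d <- c(17, 38)     # numbers of events in the groups z = 1 and z = 0
    n <- c(435, 435)   # group sizes
    z <- c(1, 0)       # treatment indicator
Fit the model:
    # cbind(events, non-events) is the two-column binomial response
    model <- glm(cbind(d, n - d) ~ z, family = binomial(link = "logit"))
Check the results:
    summary(model)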

34 Logistic model results

    Call:
    glm(formula = cbind(d, n - d) ~ z, family = binomial(link = "logit"))

    Deviance Residuals:
    [1] 0 0

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)                              < 2e-16 ***
    z                                                **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance: e+00 on 1 degrees of freedom
    Residual deviance: e-14 on 0 degrees of freedom
    AIC:

    Number of Fisher Scoring iterations:
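Because the model has as many parameters as there are groups, the maximum likelihood estimates coincide with the empirical log-odds and can be worked out by hand from the counts $d = (17, 38)$ and $n = (435, 435)$. The standard error formula used below is the usual large-sample formula for a log-odds ratio from a 2 × 2 table; it is not stated on the slides and is added here as a standard result. The notation $S_{\hat\beta}$ is introduced for this example only.

```latex
\begin{align*}
\hat\alpha &= \log\frac{38}{397} \approx -2.346, \\
\hat\beta &= \log\frac{17/418}{38/397}
           = \log\frac{17 \times 397}{418 \times 38} \approx -0.856,
           \qquad e^{\hat\beta} \approx 0.425, \\
S_{\hat\beta} &= \sqrt{\frac{1}{17} + \frac{1}{418} + \frac{1}{38} + \frac{1}{397}}
           \approx 0.300, \\
z &= \frac{\hat\beta}{S_{\hat\beta}} \approx -2.85 .
\end{align*}
```

The corresponding two-sided p-value is roughly 0.004, which falls in the range marked `**` in the output.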

35 Log-linear model for risk
Is there some particular reason why we have to use the logit link when modeling risk? Why could we not just parametrize the log-risk as $\log(\pi_Z) = \alpha + \beta Z$? We can; in this case the regression coefficient $\beta$ would be interpreted as a log-risk ratio:
$$\frac{\pi_1}{\pi_0} = \frac{e^{\alpha + \beta}}{e^{\alpha}} = \frac{e^{\alpha} e^{\beta}}{e^{\alpha}} = e^{\beta} \quad\Leftrightarrow\quad \log\left(\frac{\pi_1}{\pi_0}\right) = \beta.$$
However, note that there is nothing here bounding the risk to values below one, which might cause numerical problems. The log-linear model does bound the risk to non-negative values, and as long as the risk is small, log-linear and logistic regression give similar results, because the odds $\pi/(1 - \pi)$ are then close to the risk $\pi$.

36 Log-linear model results

    Call:
    glm(formula = cbind(d, n - d) ~ z, family = binomial(link = "log"))

    Deviance Residuals:
    [1] 0 0

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)                              < 2e-16 ***
    z                                                **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance: e+00 on 1 degrees of freedom
    Residual deviance: e-14 on 0 degrees of freedom
    AIC:

    Number of Fisher Scoring iterations:
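The similarity of the two parametrizations at small risks can be seen by fitting both models to the AZT counts and putting the exponentiated coefficients side by side. This sketch reuses the data and the two `glm` calls from the slides; the object names `fit_logit`, `fit_log`, `odds_ratio` and `risk_ratio` are chosen here for illustration.

```r
d <- c(17, 38)      # events in the groups z = 1 and z = 0
n <- c(435, 435)    # group sizes
z <- c(1, 0)        # treatment indicator

fit_logit <- glm(cbind(d, n - d) ~ z, family = binomial(link = "logit"))
fit_log   <- glm(cbind(d, n - d) ~ z, family = binomial(link = "log"))

odds_ratio <- unname(exp(coef(fit_logit)["z"]))   # exp(beta) under the logit link
risk_ratio <- unname(exp(coef(fit_log)["z"]))     # exp(beta) under the log link
print(c(odds_ratio = odds_ratio, risk_ratio = risk_ratio))

# Closed-form counterparts computed directly from the empirical risks
risk <- d / n
print(c(odds_ratio = (risk[1] / (1 - risk[1])) / (risk[2] / (1 - risk[2])),
        risk_ratio = risk[1] / risk[2]))
```

With empirical risks below 10% in both groups, the odds ratio and the risk ratio are close to each other, though not identical.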

37 Disclaimer about models
The objective of a study is not to study something. In the same vein, modeling, including model selection and model checking, or the correctness of a model, should not be an end in itself. In particular in clinical trials, we are only interested in a specific parameter in the model. By definition, there is no such thing as a correct model; a model is a simplification of reality, not the reality itself. Hence, a model need not capture all features of reality; if it could, it would no longer be a model. A good model is a model that is useful by serving some purpose. Some models may be better than others, depending on the chosen criterion for better.

38 References
Clayton, D. and Hills, M. (1993). Statistical Models in Epidemiology. Oxford University Press.
Miettinen, O. S. (2011). Epidemiological Research: Terms and Concepts. Springer, Dordrecht.
Miettinen, O. S. and Karp, I. (2012). Epidemiological Research: An Introduction. Springer, Dordrecht.
