Additive and multiplicative models for the joint effect of two risk factors


Biostatistics (2005), 6, 1, pp. 1-9
doi: /biostatistics/kxh024

A. BERRINGTON DE GONZÁLEZ
Cancer Research UK Epidemiology Unit, University of Oxford, Gibson Building, Radcliffe Infirmary, Oxford, OX2 6HE, UK
amy.berrington@cancer.org.uk

D. R. COX
Nuffield College, University of Oxford, OX1 1NF, UK
david.cox@nuffield.ox.ac.uk

Biostatistics Vol. 6 No. 1 © Oxford University Press 2005; all rights reserved.

SUMMARY

Simple tests are given for consistency of the data with additive and with multiplicative effects of two risk factors on a binary outcome. A combination of the procedures will show whether data are consistent with neither, one or both of the models of no additive or no multiplicative interaction. Implications for the size of the study needed to detect differences between the models are also addressed. Because of the simple form of the test statistics, combination of evidence from different studies or strata is straightforward. Illustration of how the method could be extended to data from a 2 × R × C table is also given.

Keywords: Case-control studies; Cohort studies; Interaction; Multiplicative; Additive.

1. INTRODUCTION

In its statistical meaning, interaction of two risk factors requires departure from additivity in their effect on outcome. We concentrate on two binary risk factors with outcome variable the occurrence or non-occurrence of a rare condition and with their interaction as the primary focus of interest.

Let $\theta_{ij}$ denote the probability of occurrence when the two risk factors are at levels $i, j$, where $i, j = 0, 1$. For convenience, we take (0, 0) as a baseline condition for some of the discussion, although this special choice has no impact on the conclusions. Two different representations of the additivity of effect are

$$\theta_{10} = \theta_{00} + \alpha_A, \quad \theta_{01} = \theta_{00} + \beta_A, \quad \theta_{11} = \theta_{00} + \alpha_A + \beta_A \tag{1}$$

and

$$\log\theta_{10} = \log\theta_{00} + \alpha_M, \quad \log\theta_{01} = \log\theta_{00} + \beta_M, \quad \log\theta_{11} = \log\theta_{00} + \alpha_M + \beta_M. \tag{2}$$

Equation (2) can equivalently be written

$$\theta_{10} = \theta_{00}\lambda_M, \quad \theta_{01} = \theta_{00}\psi_M, \quad \theta_{11} = \theta_{00}\lambda_M\psi_M. \tag{3}$$

Models (1) and (2) respectively define no additive interaction, $H_{A0}$, and no multiplicative interaction, $H_{M0}$. Both (1) and (2) are used in the epidemiological and other literature. Additive models may have a direct public health interpretation in that for a large population of individuals the difference in the numbers of positive outcomes for, say, $i = 1$, $j = 0$ as compared with the numbers had the individuals been in the baseline state $i = j = 0$ is proportional to $\alpha_A$. Advantages of the multiplicative form are that comparisons are summarized in simple ratios, often not very different from unity, which, moreover, are often relatively stable across populations.

From a formal point of view (1) could be generalized to

$$g(\theta_{10}) = g(\theta_{00}) + \alpha_G, \quad g(\theta_{01}) = g(\theta_{00}) + \beta_G, \quad g(\theta_{11}) = g(\theta_{00}) + \alpha_G + \beta_G, \tag{4}$$

where $g(\theta)$ is a suitable monotonic function of $\theta$, for example a power. To be scientifically fruitful, however, the function $g(\theta)$ would have to be reasonably easily interpreted and this restricts the choice appreciably. In the present paper we consider only the forms (1) and (2). For a more general discussion of the statistical aspects of interaction see Cox (1984).

2. ANALYSIS OF EMPIRICAL DATA

For empirical data, the forms (1) and (2) may need to be compared. The data may be consistent with none, one or both of the models. There are various ways in which this issue can be tackled. One is to calculate a Bayes factor aiming to be an effective likelihood ratio for the model comparison. A second (Aranda-Ordaz, 1981) is to embed the models in a family characterized by a parameter, $\eta$, say, to estimate the value of $\eta$ and to check for consistency with the values corresponding to (1) and (2).
The third approach, and the one adopted here, is to provide two tests of significance, one for $H_{A0}$, sensitive for departures in the direction of the multiplicative interaction model, and the other for $H_{M0}$, sensitive for departures in the direction of the additive interaction model. There result two p-values from which one can assess the consistency with both, just one, or neither model. We regard this as conceptually the simplest and the most readily interpreted approach.

3. SENSITIVITY

A question of general interest concerns the amount of data likely to be needed to distinguish between $H_{A0}$ and $H_{M0}$. This requires study of the power of the associated tests. Formulation of power requirements demands several inevitably arbitrary choices and therefore approximate calculation of power is entirely adequate for most purposes. For this we use the following simplifying result.

Suppose that $T$ is a test statistic for the null hypothesis $H_0$ which, under $H_0$, is approximately normally distributed with zero mean and variance $\sigma_0^2/n$, where $n$ is a sample size. Suppose also that under the alternative hypothesis of interest $T$ is distributed with median approximately $\mu$. In fact we assume typically that $T$ is approximately symmetrically distributed with mean $\mu$. Then power of 50 per cent is approximately achieved for a one-sided test at level of significance $\epsilon$ if

$$\mu = k_\epsilon \sigma_0 / \sqrt{n},$$

that is if

$$n = k_\epsilon^2 \sigma_0^2 / \mu^2. \tag{5}$$

Here $k_\epsilon$ is the upper $\epsilon$ point of the standard normal distribution.
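As a rough illustration of (5), the following Python sketch computes $k_\epsilon$ and the corresponding sample size for 50 per cent power. The function names `upper_point` and `n_for_half_power` are hypothetical, chosen for this example only, and the inputs in the usage line are arbitrary illustrative values, not figures from the paper.

```python
from statistics import NormalDist


def upper_point(eps: float) -> float:
    """Upper eps point k_eps of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - eps)


def n_for_half_power(eps: float, sigma0: float, mu: float) -> float:
    """Sample size from equation (5): n = k_eps^2 * sigma0^2 / mu^2.

    Assumes T has variance sigma0^2 / n under the null hypothesis and
    median approximately mu under the alternative.
    """
    k = upper_point(eps)
    return (k * sigma0 / mu) ** 2


if __name__ == "__main__":
    # Illustrative inputs only: one-sided level 0.025, sigma0 = 1, mu = 0.1.
    print(round(n_for_half_power(0.025, 1.0, 0.1), 1))
```

Because $n$ scales as $1/\mu^2$, halving the departure $\mu$ to be detected quadruples the required sample size.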

If, for example for comparison with other investigations, it is unavoidable to use power $1 - \beta$, then $k_\epsilon$ should be replaced by $k_\epsilon + k_\beta$; the extra approximation involved is that the variance of the statistic under the alternative differs little from that under the null hypothesis. Requirement of 50 per cent will be used, however, throughout this paper as it is likely to be adequate for most purposes.

4. COHORT STUDIES

4.1 Additive model

In a cohort study of two risk factors for a disease, such as a gene and an environmental exposure, if there are $r_{ij}$ deaths out of $n_{ij}$ individuals ($i, j = 0, 1$), then the estimated risk is $\hat\rho_{ij} = r_{ij}/n_{ij}$ with approximately $\mathrm{var}(\hat\rho_{ij}) = \rho_{ij}/n_{ij}$ and $\mathrm{var}(\log\hat\rho_{ij}) = 1/(n_{ij}\rho_{ij})$ for rare conditions, i.e. small $\rho_{ij}$. We test the hypothesis that the effects are additive, i.e. there is no evidence of additive interaction between the two risk factors, using

$$T_A = \frac{\hat\rho_{11} - \hat\rho_{10} - \hat\rho_{01} + \hat\rho_{00}}{\sqrt{\sum(\hat\rho_{ij}/n_{ij})}}. \tag{6}$$

In general

$$E(T_A) \approx \frac{\rho_{11} - \rho_{10} - \rho_{01} + \rho_{00}}{\sqrt{\sum \rho_{ij}/n_{ij}}} \tag{7}$$

and there will be approximately 50% power where $E(T_A)$ is equal to $k_\epsilon$, the upper $\epsilon$ point of the standard normal distribution. If $p_{ij}$ is the probability of being exposed to levels $i$ and $j$ of the two risk factors then $n_{ij} = n p_{ij}$ and this implies

$$\rho_{11} - \rho_{10} - \rho_{01} + \rho_{00} = k_\epsilon \sqrt{\textstyle\sum \rho_{ij}/p_{ij}} \,/ \sqrt{n}. \tag{8}$$

If the data were actually generated from a multiplicative model without interaction then, taking (0, 0) as a reference level, we can write this multiplicative model in the form

$$\rho_{00} = \rho_0, \quad \rho_{01} = \rho_0\lambda, \quad \rho_{10} = \rho_0\psi, \quad \rho_{11} = \rho_0\lambda\psi. \tag{9}$$

Now suppose we want to know the expected number of deaths needed in the baseline group in order to detect this form of departure from an additive model. If we define this number as $r_0^M = n p_{00}\rho_0$, the condition for 50% power becomes

$$\sqrt{n\rho_0}\,(\lambda - 1)(\psi - 1) = k_\epsilon \sqrt{1/p_{00} + \lambda/p_{01} + \psi/p_{10} + \lambda\psi/p_{11}} \tag{10}$$

so that

$$r_0^M = \frac{k_\epsilon^2}{(\lambda - 1)^2(\psi - 1)^2}\left(1 + \frac{\lambda p_{00}}{p_{01}} + \frac{\psi p_{00}}{p_{10}} + \frac{\lambda\psi p_{00}}{p_{11}}\right). \tag{11}$$

For example, if the exposure probabilities are all equal ($p_{00} = p_{01} = p_{10} = p_{11}$), and the relative risks associated with each exposure are both equal to two ($\lambda = \psi = 2$) and $k_\epsilon = 2$, then

$$r_0^M = 36, \tag{12}$$

i.e. approximately 36 deaths would be required in the baseline (unexposed) group to achieve 50% power.
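The calculation in (11) and the worked value in (12) can be sketched in a few lines of Python. The function name `baseline_deaths_vs_additive` and its argument names are hypothetical, introduced only for this illustration.

```python
def baseline_deaths_vs_additive(lam, psi, p00, p01, p10, p11, k=2.0):
    """Expected baseline deaths r_0^M from equation (11).

    lam, psi : relative risks of the two exposures under the
               multiplicative model (9); neither may equal 1.
    p00..p11 : exposure probabilities for the four categories.
    k        : upper epsilon point of the standard normal (k_eps).
    """
    bracket = (1.0
               + lam * p00 / p01
               + psi * p00 / p10
               + lam * psi * p00 / p11)
    return k ** 2 * bracket / ((lam - 1.0) ** 2 * (psi - 1.0) ** 2)


if __name__ == "__main__":
    # Setting of equation (12): equal exposure probabilities,
    # lam = psi = 2, k_eps = 2.
    print(baseline_deaths_vs_additive(2, 2, 0.25, 0.25, 0.25, 0.25))
```

With the inputs of (12) the bracket equals 1 + 2 + 2 + 4 = 9 and the denominator is 1, so the result is 4 × 9 = 36, in agreement with the text.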

Alternatively, we may prefer to know what total number of deaths would be required in order to be able to detect this form of departure from the additive no interaction model. If the expected number of deaths in total is $t^M$ then our requirement for 50% power is

$$t^M = \frac{k_\epsilon^2}{(\lambda - 1)^2(\psi - 1)^2}\left(\frac{1}{p_{00}} + \frac{\lambda}{p_{01}} + \frac{\psi}{p_{10}} + \frac{\lambda\psi}{p_{11}}\right)(p_{00} + \lambda p_{01} + \psi p_{10} + \lambda\psi p_{11}). \tag{13}$$

Note that in the symmetric case where $p_{ij}$ = constant and $\lambda = \psi$,

$$r_0^M = k_\epsilon^2 \frac{(\lambda + 1)^2}{(\lambda - 1)^4}, \tag{14}$$

$$t^M = k_\epsilon^2 \frac{(\lambda + 1)^4}{(\lambda - 1)^4}. \tag{15}$$

4.2 Multiplicative model

Now suppose we test consistency with the multiplicative model without interaction by the statistic

$$T_M = \frac{\log\hat\rho_{11} - \log\hat\rho_{10} - \log\hat\rho_{01} + \log\hat\rho_{00}}{\sqrt{\{1/(n_{00}\rho_{00}) + 1/(n_{01}\rho_{01}) + 1/(n_{10}\rho_{10}) + 1/(n_{11}\rho_{11})\}}} \tag{16}$$

with evidence of departure in the direction of the additive model if $T_M < -k_\epsilon$. Again 50% power is achieved when

$$-k_\epsilon = \frac{\sqrt{n}\,\log\{(\rho_{11}\rho_{00})/(\rho_{01}\rho_{10})\}}{\sqrt{\{1/(p_{00}\rho_{00}) + 1/(p_{01}\rho_{01}) + 1/(p_{10}\rho_{10}) + 1/(p_{11}\rho_{11})\}}}. \tag{17}$$

We write an additive model without interaction, with (0, 0) as baseline,

$$\rho_{00} = \rho_0, \quad \rho_{01} = \rho_0(1 + \xi), \quad \rho_{10} = \rho_0(1 + \eta), \quad \rho_{11} = \rho_0(1 + \xi + \eta). \tag{18}$$

Then, with $r_0^A = n p_{00}\rho_{00}$,

$$r_0^A = k_\epsilon^2 \left\{1 + \frac{p_{00}}{p_{01}(1 + \xi)} + \frac{p_{00}}{p_{10}(1 + \eta)} + \frac{p_{00}}{p_{11}(1 + \xi + \eta)}\right\}\left[\log\frac{1 + \xi + \eta}{(1 + \xi)(1 + \eta)}\right]^{-2}. \tag{19}$$

With $p_{00} = p_{01} = p_{10} = p_{11}$, $k_\epsilon = 2$, $\xi = \eta = 1$, this gives

$$r_0^A = 110. \tag{20}$$

The expected numbers of deaths needed in each category of exposure under the additive and multiplicative models without interaction are shown in Table 1. Note that in the symmetric case, $p_{ij}$ = const and $\xi = \eta$,

$$r_0^A = \frac{(\xi^2 + 4\xi + 2)}{(1 + \xi)(1 + 2\xi)}\, 2k_\epsilon^2 \left[\log\frac{(1 + 2\xi)}{(1 + \xi)^2}\right]^{-2} \tag{21}$$

and that the total number of expected deaths is $4(1 + \xi)\, r_0^A$.
Tables 2 and 3 show examples of how the sample sizes needed to detect departures from an additive model without interaction in the multiplicative direction, and from a multiplicative model without interaction in the additive direction, vary when the relative risk $\lambda = 2$ whilst $\psi$ is allowed to vary from 1.5 to 4 and the probability of being exposed to both risk factors, $p_{11}$, is allowed to vary from 0.05 to 0.3 whilst the other exposure probabilities are all equal ($p_{00} = p_{01} = p_{10}$).
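For the multiplicative-model test of Section 4.2, a minimal Python sketch of (19) and of its symmetric special case (21) is given below. The function names are hypothetical. At $\xi = \eta = 1$, $k_\epsilon = 2$ and equal exposure probabilities the unrounded value is a little under 113, which corresponds to the figure of 110 quoted in (20) when given to two working digits.

```python
import math


def baseline_deaths_vs_multiplicative(xi, eta, p00, p01, p10, p11, k=2.0):
    """Expected baseline deaths r_0^A from equation (19).

    xi, eta : excess relative risks under the additive model (18).
    p00..p11: exposure probabilities; k is k_eps.
    """
    bracket = (1.0
               + p00 / (p01 * (1.0 + xi))
               + p00 / (p10 * (1.0 + eta))
               + p00 / (p11 * (1.0 + xi + eta)))
    log_term = math.log((1.0 + xi + eta) / ((1.0 + xi) * (1.0 + eta)))
    return k ** 2 * bracket / log_term ** 2


def baseline_deaths_symmetric(xi, k=2.0):
    """Symmetric case, equation (21): equal p_ij and xi = eta."""
    ratio = (xi ** 2 + 4.0 * xi + 2.0) / ((1.0 + xi) * (1.0 + 2.0 * xi))
    log_term = math.log((1.0 + 2.0 * xi) / (1.0 + xi) ** 2)
    return 2.0 * k ** 2 * ratio / log_term ** 2


if __name__ == "__main__":
    general = baseline_deaths_vs_multiplicative(1, 1, 0.25, 0.25, 0.25, 0.25)
    symmetric = baseline_deaths_symmetric(1)
    # Total expected deaths in the symmetric case: 4 (1 + xi) r_0^A.
    total = 4 * (1 + 1) * symmetric
    print(general, symmetric, total)
```

The two functions should agree whenever the exposure probabilities are equal and $\xi = \eta$, which gives a quick consistency check on any reimplementation.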

Table 1. Expected number of deaths needed to detect departure from additive model in multiplicative direction ($r_0^M$) and from a multiplicative model in an additive direction ($r_0^A$). [Cells indexed by $i$ and $j$; tabulated values missing in source.]

Table 2. Sample size required in the baseline group of a cohort study to detect departure from multiplicative model in the direction of an additive model^a. [Indexed by $\psi$ and $p_{11}$; tabulated values missing in source.]
^a In this and subsequent tables values are given to two working digits.

Table 3. Sample size required in the baseline group of a cohort study to detect departure from an additive model in the direction of a multiplicative model. [Indexed by $\psi = 1 + \xi$ and $p_{11}$; tabulated values missing in source.]

5. CASE-CONTROL STUDIES

5.1 Multiplicative model

A relatively minor change in the argument deals with (unmatched) case-control studies. Consider a single case-control study with one binary exposure with frequency $m_{rs}$; $r = 0$ (control), $r = 1$ (case); $s = 0$ (exposure −), $s = 1$ (exposure +). Then the log relative risk is $\hat\theta = \log\{(m_{11}m_{00})/(m_{01}m_{10})\}$ with asymptotically

$$\mathrm{var}(\hat\theta) = \sum 1/m_{rs} = 4/\tilde m, \tag{22}$$

where $\tilde m$ is the harmonic mean frequency. The estimate of the relative risk $\hat\phi = e^{\hat\theta}$ has asymptotic variance

$$\mathrm{var}(\hat\phi) = 4\phi^2/\tilde m. \tag{23}$$

Now suppose we have two exposures and let $m_{rij}$ ($r = 0$ (control), $r = 1$ (case)) be the frequency in exposure category $(i, j)$ for $i, j = 0, 1$. Write $\tilde m_{ij} = 2/(1/m_{0ij} + 1/m_{1ij})$ for the relevant harmonic mean frequency. Then with $\hat\gamma_{ij} = \log(m_{1ij}/m_{0ij})$, $\mathrm{var}(\hat\gamma_{ij}) = 2/\tilde m_{ij}$, consistency with a multiplicative no

interaction model is tested by

$$T_M = \frac{\hat\gamma_{11} - \hat\gamma_{01} - \hat\gamma_{10} + \hat\gamma_{00}}{\sqrt{\sum 2/\tilde m_{ij}}} \tag{24}$$

with evidence of departure in the direction of additivity if $T_M < -k_\epsilon$. Under a form additive for relative risk (without interaction), and arbitrarily taking (0, 0) as baseline, we can write

$$\gamma_{01} = \gamma_{00} + \log(1 + \alpha_{01}), \quad \gamma_{10} = \gamma_{00} + \log(1 + \alpha_{10}), \quad \gamma_{11} = \gamma_{00} + \log(1 + \alpha_{10} + \alpha_{01}), \tag{25}$$

so that 50% power is achieved when

$$\log\left\{\frac{(1 + \alpha_{01})(1 + \alpha_{10})}{1 + \alpha_{01} + \alpha_{10}}\right\} = k_\epsilon \sqrt{\textstyle\sum 2/\tilde m_{ij}}. \tag{26}$$

Write $q_{ij} = \tilde m_{ij}/\sum \tilde m_{kl}$, so that $\sum q_{ij} = 1$ and $q_{ij}$ is the proportion of individuals in the risk category $(i, j)$, with cases and controls combined via a harmonic mean. We write $a = 2\sum \tilde m_{ij}/n$ where $n$ is the total number of individuals. In general $a \le 1$, with equality when numbers of cases and controls are almost the same cell by cell. Then the required $n$ is given by

$$n_A = 4k_\epsilon^2 a^{-1} \sum 1/q_{ij} \left[\log\left\{\frac{(1 + \alpha_{01})(1 + \alpha_{10})}{1 + \alpha_{01} + \alpha_{10}}\right\}\right]^{-2}. \tag{27}$$

Note that $\sum 1/q_{ij} \ge 16$.

5.2 Additive model

Consistency with an additive no interaction model can be tested by dividing $\hat\phi_{11} - \hat\phi_{10} - \hat\phi_{01} + \hat\phi_{00}$ by its estimated standard error, where $\hat\phi_{ij}$ is the estimated risk in exposure category $(i, j)$ relative to baseline (0, 0). The numerator is, however, proportional to the simpler statistic $m_{111}/m_{011} - m_{110}/m_{010} - m_{101}/m_{001} + m_{100}/m_{000}$, leading to the test statistic

$$T_A = \frac{m_{111}/m_{011} - m_{110}/m_{010} - m_{101}/m_{001} + m_{100}/m_{000}}{\hat\phi_{00}\sqrt{\sum(2\hat\phi_{ij}^2/\tilde m_{ij})}} = \frac{\hat\phi_{11} - \hat\phi_{10} - \hat\phi_{01} + \hat\phi_{00}}{\sqrt{\sum(2\hat\phi_{ij}^2/\tilde m_{ij})}}. \tag{28}$$

Under a multiplicative model $\phi_{10} = \phi_{00}(1 + \beta_{10})$, $\phi_{01} = \phi_{00}(1 + \beta_{01})$, $\phi_{11} = \phi_{00}(1 + \beta_{10})(1 + \beta_{01})$. Then 50% power is achieved when

$$\phi_{00}^2 \beta_{01}^2 \beta_{10}^2 = k_\epsilon^2 \sum 2\phi_{ij}^2/\tilde m_{ij} \tag{29}$$

and the total number of individuals is

$$n_M = \frac{4k_\epsilon^2}{a\beta_{10}^2\beta_{01}^2}\left\{1/q_{00} + (1 + \beta_{10})^2/q_{10} + (1 + \beta_{01})^2/q_{01} + (1 + \beta_{10})^2(1 + \beta_{01})^2/q_{11}\right\}. \tag{30}$$

Note that in $n_A$, for given $n$, $q_{ij} = 1/4$ is optimal; in $n_M$ this is not quite the case but the main point is that a small $q_{ij}$ lowers sensitivity greatly, as is to be expected. There may not be control over this in design, however.
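The two case-control sample sizes (27) and (30) can be sketched in Python as follows. The function names and the choice of a dictionary keyed by $(i, j)$ for the proportions $q_{ij}$ are assumptions of this illustration, not part of the paper, and the inputs in the usage lines are arbitrary.

```python
import math


def n_A(alpha01, alpha10, q, a=1.0, k=2.0):
    """Total sample size from equation (27).

    Detects departure from the multiplicative model in the additive
    direction. q maps (i, j) -> q_ij and should sum to 1.
    """
    sum_inv_q = sum(1.0 / q_ij for q_ij in q.values())
    log_term = math.log((1.0 + alpha01) * (1.0 + alpha10)
                        / (1.0 + alpha01 + alpha10))
    return 4.0 * k ** 2 * sum_inv_q / (a * log_term ** 2)


def n_M(beta10, beta01, q, a=1.0, k=2.0):
    """Total sample size from equation (30).

    Detects departure from the additive model in the multiplicative
    direction.
    """
    bracket = (1.0 / q[(0, 0)]
               + (1.0 + beta10) ** 2 / q[(1, 0)]
               + (1.0 + beta01) ** 2 / q[(0, 1)]
               + (1.0 + beta10) ** 2 * (1.0 + beta01) ** 2 / q[(1, 1)])
    return 4.0 * k ** 2 * bracket / (a * beta10 ** 2 * beta01 ** 2)


if __name__ == "__main__":
    # Illustrative inputs: equal proportions, unit excess risks.
    q_equal = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
    print(n_A(1.0, 1.0, q_equal), n_M(1.0, 1.0, q_equal))
```

Shrinking any single $q_{ij}$ inflates both quantities through the $1/q_{ij}$ terms, which is the loss of sensitivity noted in the text.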

Table 4. Sample size required for a case-control study to detect departure from a multiplicative model in the direction of an additive model. [Indexed by $\beta$ and $q_{11}$; tabulated values missing in source.]

Table 5. Sample size required in a case-control study to detect departure from an additive model in the direction of a multiplicative model. [Indexed by $\beta$ and $q_{11}$; tabulated values missing in source.]

In the symmetrical cases, $q_{ij} = 1/4$, $\alpha_{10} = \alpha_{01} = \alpha$ and $\beta_{01} = \beta_{10} = \beta$, so that $\sum 1/q_{ij} = 16$ and (27) and (30) reduce to

$$n_A = 4k_\epsilon^2 a^{-1} \cdot 16 \left[\log\frac{(1 + \alpha)^2}{1 + 2\alpha}\right]^{-2}, \tag{31}$$

$$n_M = \frac{16k_\epsilon^2}{a\beta^4}\left(1 + 2(1 + \beta)^2 + (1 + \beta)^4\right). \tag{32}$$

Tables 4 and 5 show the required sample sizes for a case-control study to detect departures from a multiplicative and additive model for interaction, respectively. The odds ratio $\alpha$ is 2 whereas the odds ratio $\beta$ varies from 1.5 to 4. For these examples we have assumed that $a = 1$ and that $q_{00} = q_{01} = q_{10}$ whilst $q_{11}$ is allowed to vary from 0.05 to [value missing in source].

6. EXAMPLE AND DISCUSSION

Znaor et al. (2003) investigated whether there was evidence of interaction between chewing tobacco and alcohol consumption with respect to the risk of oral cancer in a case-control study of Indian men. We reproduce the data for those men who did not smoke tobacco and calculate the crude odds ratios in a two-by-four table (see Table 6). The observed odds ratio for the joint effect of the two risk factors (44.1) was considerably greater than expected under an additive model without interaction (16.7) and slightly greater than expected under a multiplicative model without interaction (39.3). Here $T_A = 2.5$ suggests there is evidence of significant departure from the additive model in the multiplicative direction, but $T_M = 0.3$ confirms that there is no evidence of departure from the multiplicative model in an additive direction.

We have discussed only the simplified case of a single set of data. Because of the simple form of the test statistics, combination of evidence from independent studies or strata is straightforward.
An important example of such a situation would be the one where adjustment for confounders was necessary. If the adjustments had been made by logistic regression then the variance of the test statistic would be somewhat greater than the Poisson variance and if, for example, the adjusted log relative risk is $\hat\theta_{ij}$ then the statistic

to test for multiplicative interaction in a case-control study becomes

$$T_M = \frac{\hat\theta_{11} - \hat\theta_{01} - \hat\theta_{10} + \hat\theta_{00}}{\sqrt{\sum \mathrm{var}(\hat\theta_{ij})}}. \tag{33}$$

Table 6. Estimated odds ratios from Znaor et al. (2003). Columns: Chewing tobacco; Alcohol; Cases; Controls; Odds ratio; var[ln(OR)]. Rows: No/No; No/Yes; Yes/No; Yes/Yes. [Tabulated values missing in source.]

Table 7. Odds ratios adjusted for age, centre and education level from Znaor et al. (2003). Columns: Chewing tobacco; Alcohol; Odds ratio*; var[ln(OR*)]. Rows: No/No; No/Yes; Yes/No; Yes/Yes. [Tabulated values missing in source.]

The odds ratios actually published by Znaor et al. had been adjusted for age, centre and education level. These adjustments reduced the odds ratios for the effect of chewing tobacco and increased their standard errors (see Table 7). Therefore, when the tests for interaction are conducted on the adjusted data there is no evidence of departure from the multiplicative or the additive models without interaction ($T_M = 0.03$ and $T_A = 1.05$). Inclusion of adjustments in sample size calculations could be made by assuming that the adjustment increases the variance by a constant $c$ across all strata and then the sample size estimates are increased by $1 + c$.

Finally, extension of the method to the situation of interaction in a 2 × R × C table could be approached by extracting a single degree of freedom for an initial test. This would be more sensitive than an examination of independence across the R × C contingency table (Yates, 1948). For example, in Znaor et al. there were actually two levels of chewing: with and without tobacco.
Whether the increase in risk with increasing level of chewing differed between ever and never alcohol drinkers (2 × 2 × 3) could be examined by assigning the levels of chewing (never, without tobacco and with tobacco) the scores −3, 1, 2; then a test statistic for departure from the multiplicative model in the additive direction would be

$$T_M = \frac{(2\hat\theta_{13} + \hat\theta_{12} - 3\hat\theta_{11}) - (2\hat\theta_{03} + \hat\theta_{02} - 3\hat\theta_{01})}{\sqrt{4\,\mathrm{var}(\hat\theta_{13}) + \mathrm{var}(\hat\theta_{12}) + 9\,\mathrm{var}(\hat\theta_{11}) + 4\,\mathrm{var}(\hat\theta_{03}) + \mathrm{var}(\hat\theta_{02}) + 9\,\mathrm{var}(\hat\theta_{01})}} \tag{34}$$

with evidence of departure in the direction of additivity if $T_M < -k_\epsilon$. Again these calculations could include adjustments if necessary with the use of the same strategy as described above for the 2 × 2 × 2 table.

REFERENCES

ARANDA-ORDAZ, F. J. (1981). On two families of transformations to additivity for binary response data. Biometrika 68.

BOTTO, L. D. AND KHOURY, M. J. (2001). Commentary: facing the challenge of gene-environment interaction: the two-by-four table and beyond. American Journal of Epidemiology 153.

COX, D. R. (1984). Interaction. International Statistical Review 52.

SIEMIATYCKI, J. AND THOMAS, D. C. (1981). Biological models and statistical interactions: an example from multistage carcinogenesis. International Journal of Epidemiology 10.

YATES, F. (1948). The analysis of contingency tables. Biometrika 35.

ZNAOR, A., BRENNAN, P., GAJALAKSHMI, V., MATHEW, A., SHANTA, V., VARGHESE, C. AND BOFFETTA, P. (2003). Independent and combined effects of tobacco smoking, chewing and alcohol drinking on the risk of oral, pharyngeal and esophageal cancers in Indian men. International Journal of Cancer 105.

[Received January 15, 2004; first revision June 21, 2004; second revision July 15, 2004; accepted for publication 19 August, 2004]
