A Non-parametric "Trim and Fill" Method of Assessing Publication Bias in Meta-analysis

Sue Taylor
Department of Preventive Medicine and Biometrics, University of Colorado Health Sciences Center, Denver, CO 80262 USA

and

Richard Tweedie
Department of Statistics, Colorado State University, Fort Collins, CO 80523 USA

September 16, 1998

Abstract

Meta-analysis collects and synthesizes results from individual studies to estimate an overall effect size. If published studies are chosen, say through a literature review, an inherent selection bias may arise, since, for example, studies may tend to be published more readily if they are statistically significant, or deemed to be more 'interesting' in terms of the impact of their outcomes. We develop a simple rank-based data augmentation technique, formalizing the use of funnel plots, to estimate and adjust for the numbers and outcomes of missing studies. Several non-parametric estimators are proposed for the number of missing studies, and their properties are developed analytically and through simulations. We apply the method to simulated and epidemiological data sets, and show it is both effective and consistent with other criteria in the literature.

Corresponding author's address: tweedie@stat.colostate.edu

Key words: Meta-analysis; Publication bias; Missing studies; File drawer problem; Funnel plots; Data augmentation; Lung cancer; Passive smoking

1 Introduction

1.1 The publication bias problem

There has been an enormous recent increase in the use of meta-analysis as a statistical technique for combining the results of many individual analyses in areas of clinical trials, epidemiology, sociology and psychology (Hedges and Olkin, 1985; Olkin, 1992; Cooper and Hedges, 1994). In all these contexts, one well-documented concern is the need to collect all relevant studies, both published and unpublished, if the subsequent inferences are to be valid (Iyengar and Greenhouse, 1988; Dear and Begg, 1992; Hedges, 1992; Begg, 1994; Begg and Mazumdar, 1994; Gleser and Olkin, 1996; Egger et al, 1997). The use of a non-representative proportion of significant studies, or studies differentially giving results in a "positive" direction, will lead to a non-representative set of studies in the meta-analysis data set. A standard meta-analysis model will then result in a conclusion biased toward significance or positivity. This is particularly problematic for a meta-analysis whose data come solely from the published scientific literature, but studies may be suppressed for many other reasons than failure to be published (see Cooper (1998), Givens, Smith and Tweedie (1997) and the discussion thereto, and Misakian and Bero (1998)). This phenomenon has become known as 'publication bias', or the 'file-drawer problem' (Iyengar and Greenhouse, 1988).

In this paper we describe a new way to account for the magnitude of this problem. There are two steps to our approach. Firstly, we estimate the number of missing studies using methods based on symmetry assumptions. These are simple to implement, in contrast to most existing methods, and they appear in practice to pick up the "missing studies" indicated visually by funnel plots (Light and Pillemer, 1984), as described in Section 1.2.
Secondly, we impute the missing values using a "trim and fill" approach that enables us to derive estimates of the overall effect on the inferences in the meta-analysis due to the publication bias. We show that this dual approach works well in simulation studies and illustrate its use on an epidemiological example. Rather than merely testing for the possibility of publication bias, as in, say, Begg (1994) or Egger et al (1997), the trim and fill algorithm appears to give a method of evaluating and also adjusting for the possibility of publication bias, so that one can tell whether the magnitude of the problem warrants consideration or not.

1.2 Funnel plots and publication bias

There are several methods which have been proposed to detect the existence of publication bias in a meta-analysis. Perhaps the most common is the funnel plot (Light and Pillemer, 1984), and related graphical methods for visually determining the existence of missing studies (Galbraith, 1988). Figure 1(a) shows a funnel plot using data from a simulation of 35 studies, each of which supplies an estimate $Y_i$ of the effect in question in the $i$th study, and an estimate of the variance $\sigma_i^2$ within that study. The funnel plots we use depict $\sigma_i^{-1}$ graphed against $Y_i$, so that the most precise

estimates (for example, those from the largest studies) are at the top of the funnel, and those from less precise, or smaller, studies are at the base of the funnel. The data were generated as in Section 4 with $Y_i \sim N(0, \sigma_i^2)$ so that the true mean here is 0; the sample used has meta-analyzed estimate (using the random effects model below) $\hat\theta = 0.080$ with 95% confidence interval (CI) $(-0.018, 0.178)$, which is not significant at the usual 5% level.

The fact that there is a "funnel" shape is not based on any detailed modeling: it relies on two empirical observations, namely

(F1) The variances $\sigma_i^2$ of studies in a meta-analysis are not identical, but are distributed in such a way that there are fewer precise studies and rather more imprecise studies: this appears to be borne out by data in, for example, Light and Pillemer (1984, Chapter 3) amongst many others.

(F2) At any fixed level of $\sigma_i^2$, studies are symmetrically distributed around the true mean. This is justified if we use the normal model in Section 2.1, but clearly is a much weaker assumption than such normality.

FIGURE 1 (a), FIGURE 1 (b), FIGURE 1 (c) NEAR HERE

To illustrate the effect of publication bias, in Figure 1(b) we show the same funnel plot as in Figure 1(a), but with the 'left-most' 5 studies suppressed. This gives the typical pattern that is taken to indicate publication bias (Light and Pillemer, 1984): the assumption is that, whether because of editorial policy or author inaction or other reasons, these papers (which show, say, no significance, or perhaps the reverse effect (namely $\theta < 0$) from that envisaged when carrying out the studies (namely $\theta > 0$)) are the ones that might not be published (see Cooper, 1998 pp 5-55). Such suppression will affect the estimate of $\theta$. Indeed in this example it increases $\hat\theta$ to $0.124$ with 95% confidence interval (CI) $(0.037, 0.210)$, which is now significant at the 5% level: in other words, suppressing these five studies leads to an incorrect inference by usual standards, and increases the estimated mean by almost 60%.
The evaluation of funnel plots is subjective in much of the literature. A number of more quantitative methods of detecting publication bias have been proposed: these range from a simplistic method of moments approach (Sugita, 1992, 1994), to more sophisticated methods such as a rank correlation test (Begg, 1994; Begg and Mazumdar, 1994) based on the form of a biased funnel plot as in Figure 1(b), a method based on p-values (Gleser and Olkin, 1996), or a regression based test (Egger et al, 1997) based on the Galbraith (1988) version of a funnel plot. These tests are all designed to detect rather than adjust for publication bias.

There also exist several quantitative methods which estimate the number of missing studies and, by explicitly modeling the probability of publication, provide estimates of the effect of the missing studies on the overall effect size (Dear and Begg, 1992; Hedges, 1992; Givens et al, 1997; Smith, Givens and Tweedie, 1998). All are complex and highly computer intensive to run, and Dear and Dobson (1997) noted "previous

methods have not been much used ... (and) ... the value of any new statistical methodology depends, in part, on the extent to which it is adopted"; they also noted that "the culture of meta-analysis has traditionally favoured very simple methods". We now develop a simple technique that seems to meet many of the objections to other methods.

2 Adjusting for Publication Bias

2.1 Fixed and Random Effects Models

We first describe a standard structure for a meta-analysis in the absence of publication bias. We assume we have $n$ individual studies, all of which are addressing the same problem; and that there is some global "effect size" $\theta$ (such as log relative risks, risk differences or log mortality ratios in clinical or epidemiological trials; or differences in performance in sociological experiments) which is relevant to the overall problem, and which each study attempts to measure. For each $j = 1, \ldots, n$, study $j$ produces an effect size $Y_j$ which estimates $\theta$, and an estimated within-study variance $\sigma_j^2$. The random effects (RE) model which is commonly used to combine the $Y_j$ is then

$$Y_j = \theta + \beta_j + \varepsilon_j \tag{1}$$

where $\beta_j \sim N(0, \tau^2)$ is introduced to account for heterogeneity between studies, and $\varepsilon_j \sim N(0, \sigma_j^2)$ represents the within-study variability of study $j$. The RE approach has been argued (NRC Report, 1992) to be preferable to the fixed effects (FE) model which assumes that $\tau^2 = 0$, so that any heterogeneity between studies is purely random.

This model leads, through normal theory, to the meta-analyzed estimate of $\theta$ given by

$$\hat\theta = \sum_j Y_j w_j \Big/ \sum_j w_j$$

where $w_j = (\sigma_j^2 + \tau^2)^{-1}$, and with $\mathrm{Var}[\hat\theta] = 1 / \sum_j w_j$. In fitting this estimator, it is usually assumed that the $\sigma_j^2$ are known. In the FE model we take $\tau^2 = 0$; in the RE model there are various moment-based and maximum likelihood approaches giving estimates of $\tau^2$ (Biggerstaff and Tweedie, 1997), the most common being the DerSimonian-Laird estimator (DerSimonian and Laird, 1986), which we use in Section 5 and Section 6 below.
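As an illustration of the estimator in Section 2.1, the following is a minimal sketch of the inverse-variance pooled estimate with the DerSimonian-Laird moment estimate of $\tau^2$. The function names (`dersimonian_laird_tau2`, `pooled_estimate`) and the truncation of $\tau^2$ at zero are choices made for this sketch, not notation from the paper.

```python
from math import sqrt

def dersimonian_laird_tau2(y, var):
    """Moment estimate of the between-study variance tau^2, truncated at 0.

    y   : list of study effect sizes Y_j
    var : list of within-study variances sigma_j^2 (treated as known)
    """
    w = [1.0 / v for v in var]                      # fixed-effects weights
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    if c <= 0:                                      # e.g. a single study
        return 0.0
    return max(0.0, (q - (len(y) - 1)) / c)

def pooled_estimate(y, var, random_effects=True):
    """Return (theta_hat, standard error, tau^2) with w_j = 1/(sigma_j^2 + tau^2)."""
    tau2 = dersimonian_laird_tau2(y, var) if random_effects else 0.0
    w = [1.0 / (v + tau2) for v in var]
    theta_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return theta_hat, sqrt(1.0 / sum(w)), tau2
```

Setting `random_effects=False` gives the FE estimate ($\tau^2 = 0$); a 95% CI is then `theta_hat ± 1.96 * se`.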
2.2 Modeling Publication Bias

Now we modify the model above to account for publication bias. We assume that in addition to the $n$ observed studies, there are an additional $k_0$ relevant studies which are not observed, due to publication bias. The value of $k_0$, and the effect sizes which might have been found from these $k_0$ studies, are unknown and must be estimated; and uncertainty about these estimates must be reflected in the final meta-analysis inference.

The frequentist models of Dear and Begg (1992) and Hedges (1992), and the Bayesian models of Givens et al (1997), estimate both $k_0$ and the positions of the missing studies using the normal

models above together with a model of the probabilities that studies are missing in different p-value intervals. In contrast, our approach is non-parametric, and relies only on symmetry assumptions, which are certainly satisfied by both the FE and RE models above.

To describe this, we first recall the standard assumption behind the Wilcoxon test for a collection $X_i$, $i = 1, \ldots, N$ of random variables (Marascuilo and McSweeney, 1977). If the hypothesis is that the median of each of the $X_i$ is 0, then each observed value is assumed to have arisen by first generating $|X_i|$ and then generating the sign of $X_i$ according to an independent set of Bernoulli variables taking values $-1, 1$ with equal probability. If $r_i$ denotes the rank of $|X_i|$, and if

$$W_N^+ = \sum_{X_i > 0} r_i$$

is the sum of the ranks associated with the positive $X_i$, then we say that $W_N^+$ has a Wilcoxon distribution.

Now let us suppose that there are originally $n + k_0$ studies with this structure, and that $k_0$ values from the set $X_j$ have been suppressed, leaving a set of $n$ observed studies. Our key assumption is:

The suppression has taken place in such a way that it is the $k_0$ values of the $X_j$ with the most extreme negative ranks that have been suppressed.

This might be expected to lead to a truncated funnel plot such as we have in Figure 1(b), for example. We will call this model for the overall set of studies the suppressed Bernoulli model.

We are now left with $n$ observed values, and these remaining $|X_i|$ are re-ranked as $r_i^*$: these ranks run now from 1 to $n$. We will consider three statistics based on this set of observed ranks. Firstly, we let $\gamma^*$ denote the length of the rightmost run of ranks associated with positive values of the observed $X_i$: that is, if $h$ is the index of the most negative of the $X_i$, and if $r_h^*$ is its absolute rank, then $\gamma^* = n - r_h^*$. Secondly, we consider the Wilcoxon statistic for the observed set of variables omitting the suppressed variables.
In order to distinguish the suppressed data-set from a set of random variables each of which is symmetric, we will define the 'trimmed' rank test statistic for the observed $n$ values as

$$T_n = \sum_{X_i > 0} r_i^*. \tag{2}$$

The distributions of $\gamma^*$ and $T_n$ depend on $k_0$, although we omit this in the notation. Based on these quantities we define three estimators of $k_0$, given by

$$R_0 = \gamma^* - 1, \tag{3}$$

$$L_0 = \frac{4T_n - n(n+1)}{2n - 1} \tag{4}$$

and

$$Q_0 = n - 1/2 - \sqrt{2n^2 - 4T_n + 1/4}. \tag{5}$$

The forms of $L_0$ and $Q_0$ are based on method of moments considerations developed in Section 3.3. The main analytic results of this paper are the following, which are proved in Section 3.

Theorem 2.1 Under the assumption that the median of the original $X_i$ is 0, and under the suppressed Bernoulli model,

(i) the estimator $R_0$ has mean and variance given by

$$E[R_0] = k_0, \qquad \mathrm{Var}[R_0] = 2k_0 + 2; \tag{6}$$

(ii) the estimator $L_0$ has mean and variance given by

$$E[L_0] = k_0 - k_0^2/(2n-1), \qquad \mathrm{Var}[L_0] = 16\,\mathrm{Var}(T_n)/(2n-1)^2, \tag{7}$$

where

$$\mathrm{Var}(T_n) = 24^{-1}\left[n(n+1)(2n+1) + 10k_0^3 + 27k_0^2 + 17k_0 - 18nk_0^2 - 18nk_0 + 6n^2k_0\right]; \tag{8}$$

(iii) the estimator $Q_0$ has mean and variance given (approximately) by

$$E[Q_0] \approx k_0 + \frac{2\,\mathrm{Var}[T_n]}{\left[(n-1/2)^2 - k_0(2n-k_0-1)\right]^{3/2}}, \qquad \mathrm{Var}[Q_0] \approx \frac{4\,\mathrm{Var}[T_n]}{(n-1/2)^2 - k_0(2n-k_0-1)}. \tag{9}$$

Remark: If we consider the asymptotic situation where $n$ is large and $k_0$ is of smaller order than $n$, then

$$E[L_0] \approx k_0, \quad E[Q_0] \approx k_0, \quad E[R_0] = k_0,$$
$$\mathrm{Var}[L_0] \approx n/3, \quad \mathrm{Var}[Q_0] \approx n/3, \quad \mathrm{Var}[R_0] = o(n).$$

Hence both $L_0$ and $Q_0$ will have similar behavior, although in practice $Q_0$ is usually of greater magnitude than $L_0$ and occasionally appears to give excessively large values. It can be shown that the MSE of $L_0$ is smaller than that of $R_0$ for certain realistic ranges of $n, k_0$, and it is more robust against certain configurations of data than is $R_0$: for example, if there is just one very negative value of $X_i$ followed by a missing collection of studies (which violates the exact assumption we have made but not the spirit of publication bias), then $R_0$ will be zero although $L_0$ may be non-zero.

Remark: Since we have an expression for the bias of $L_0$, we can in principle remove this to at least $O(n^{-2})$. If we define

$$L_1 = L_0[1 - (1/2n)] + L_0^2/(2n-1) - 1/6 - 1/n \tag{10}$$

then it is possible to check that, again for $n$ large and $k_0$ fixed and smaller than $n$,

$$E[L_1] = k_0 + O(n^{-2}).$$

However, this seems to make only marginal difference in simulations (Taylor, 1998).
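The three estimators can be computed directly from a set of centered values. The sketch below follows the definitions (2) to (5); the function names are hypothetical. Two conventions are assumptions of this sketch: ties in $|X_i|$ are broken by order of appearance, and $\gamma^* = n$ is used when no negative value is observed. The fallback value $n - 1/2$ for a negative discriminant follows the suggestion made in Section 3.3.

```python
from math import sqrt

def abs_ranks(x):
    """Ranks 1..n of |x_i| (ties broken by order of appearance)."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]))
    ranks = [0] * len(x)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def k0_estimators(x):
    """Return (R0, L0, Q0) for centered values x, following (3)-(5)."""
    n = len(x)
    ranks = abs_ranks(x)
    # T_n: sum of the ranks attached to positive values, as in (2)
    t_n = sum(r for r, xi in zip(ranks, x) if xi > 0)
    # gamma*: length of the rightmost run of positive ranks, n - r_h
    neg_ranks = [r for r, xi in zip(ranks, x) if xi < 0]
    gamma_star = n - max(neg_ranks) if neg_ranks else n  # assumption if no negatives
    r0 = gamma_star - 1
    l0 = (4 * t_n - n * (n + 1)) / (2 * n - 1)
    disc = 2 * n * n - 4 * t_n + 0.25
    q0 = n - 0.5 - sqrt(disc) if disc >= 0 else n - 0.5
    return r0, l0, q0

# Signed ranks used as the data themselves, so rank(|x|) = |x|
example = [-1, -2, 3, 4, 5, -6, 7, 8, 9, 10, 11]
print(k0_estimators(example))
```

For this example $T_n = 57$ and $\gamma^* = 5$, so $R_0 = 4$ and $L_0 = 96/21$.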

3 Estimators of $k_0$

3.1 The estimator $R_0$ under the suppressed Bernoulli model

We first examine the simple estimator $R_0$ defined by (3). Begin by considering the sequence of signed ranks in the original (unsuppressed) data set. Thus, for example, if the original set of studies is of size $n + k_0 = 14$, suppose the ranks and their associated signs are

$$-1, -2, 3, 4, 5, -6, -7, 8, 9, -10, 11, 12, -13, 14.$$

Now suppose we suppress $k_0 = 3$ studies, which (since we suppress the three most negative studies) means we suppress those corresponding to $-13, -10, -7$. The signed ranks after ranking the remaining 11 studies are

$$-1, -2, 3, 4, 5, -6, 7, 8, 9, 10, 11.$$

We can consider this as being in two groups: the set $\{7, 8, 9, 10, 11\}$ which were "promoted" to be the top five ranks, with positive sign, due to the suppression; and the remaining six ranks, which still have the same random permutation as before. Thus in this case the run of positive ranks is of length five, and $\gamma^* = 5$.

But we can see that $\gamma^*$ is just the number of positive terms we need in order to reach the first negative value which is not suppressed, which is the $(k_0 + 1)$st negative term in the original sequence; and hence $\gamma^*$ is a negative binomial variable corresponding to a sequence of simple Bernoulli coin-tosses. Using the negative binomial form (Casella and Berger (1990), p. 96) we thus have immediately the following result, which gives (6).

Theorem 3.1 The distribution of the estimator $R_0$ is given by

$$P(R_0 = m) = \binom{k_0 + m + 1}{m + 1}\left(\frac{1}{2}\right)^{k_0 + m + 2}. \tag{11}$$

Remark: The major drawback to this estimator is that it is non-robust to the existence of a relatively isolated negative term at the right hand end of the sequence of ranks. Of course, this is consistent with the suppression hypothesis we have used: but in practice, we may find funnel plots with one such isolated value and the appearance of a gap in the next run of positive values. The use of $R_0$ may be inappropriate in this case and the next set of estimators, based on $T_n$, seem rather better in this context.
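The negative binomial argument can be checked numerically. The sketch below takes $\gamma^* = R_0 + 1$ to be the number of positive signs seen before the $(k_0+1)$st negative sign in a sequence of fair coin tosses, writes down the resulting probability mass function for $R_0$, and sums it over a long but finite range. The function names and the truncation point `upper` are assumptions of this sketch.

```python
from math import comb

def r0_pmf(m, k0):
    """P(R0 = m) for m = -1, 0, 1, ...: gamma* = m + 1 positives occur
    before the (k0 + 1)st negative in a sequence of fair coin tosses."""
    return comb(k0 + m + 1, m + 1) / 2 ** (k0 + m + 2)

def r0_moments(k0, upper=400):
    """Total probability, mean and variance of R0, summed over m < upper."""
    ms = range(-1, upper)
    total = sum(r0_pmf(m, k0) for m in ms)
    mean = sum(m * r0_pmf(m, k0) for m in ms)
    var = sum((m - mean) ** 2 * r0_pmf(m, k0) for m in ms)
    return total, mean, var

total, mean, var = r0_moments(3)
print(total, mean, var)   # should be close to 1, k0 = 3 and 2*k0 + 2 = 8
```

The truncated sums agree with the mean $k_0$ and variance $2k_0 + 2$ in (6) up to a negligible tail.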
3.2 Moments of $T_n$ under the suppressed Bernoulli model

We now find the first and second moments of $T_n$ and show that we can use these to derive the forms $L_0$ and $Q_0$ as in (4) and (5).

Theorem 3.2 For given fixed $k_0$, under the suppressed Bernoulli model

$$T_n = n + (n-1) + \cdots + (n - \gamma + 1) + W_{n-\gamma}^+ \tag{12}$$

where $W_N^+$ has the Wilcoxon distribution; and $\gamma$ has the negative binomial distribution

$$P(\gamma = m) = \binom{k_0 + m - 1}{m}\left(\frac{1}{2}\right)^{k_0 + m}. \tag{13}$$

The first two moments of this mixture distribution are

$$E[T_n] = [n(n+1) + k_0(2n - k_0 - 1)]/4 \tag{14}$$

and $\mathrm{Var}[T_n]$ given by (8).

Proof By removing the "top" $k_0$ negative ranks we are left with matching positive ranks, and we denote the number of these by $\gamma$. By the suppressed Bernoulli model these have the distribution (13) (Casella and Berger (1990), p. 96): note that $\gamma$ includes one less "positive run" than $\gamma^*$. After allowing for these positive ranks, the remaining $n - \gamma$ of the ranks are independently positive and negative, leading to the final term in (12). Recalling that $W_m^+$ has mean and variance given for fixed $m$ by $E[W_m^+] = m(m+1)/4$ and $\mathrm{Var}[W_m^+] = m(m+1)(2m+1)/24$, and using the fact that $E[\gamma] = k_0$ and $E[\gamma^2] = k_0^2 + 2k_0$, we find

$$\begin{aligned}
E[T_n] &= E_\gamma\left[E[W_{n-\gamma}^+ + (n - \gamma + 1) + \cdots + n \mid \gamma]\right] \\
&= \sum_{\gamma=0}^{\infty}\left[\frac{(n-\gamma)(n-\gamma+1)}{4} + \frac{n(n+1)}{2} - \frac{(n-\gamma)(n-\gamma+1)}{2}\right]P[\gamma] \\
&= \frac{n(n+1)}{4} - \frac{(2n+1)}{4}E[\gamma] + \frac{1}{4}\mathrm{Var}[\gamma] + \frac{1}{4}[E[\gamma]]^2 + \left(n + \frac{1}{2}\right)E[\gamma] - \frac{1}{2}E[\gamma^2] \\
&= \frac{n(n+1)}{4} + \frac{k_0[2n - k_0 - 1]}{4}.
\end{aligned} \tag{15}$$

Note that this shows that

$$E[T_n] = E[W_n^+] + \frac{k_0[2n - k_0 - 1]}{4}. \tag{16}$$

A similar calculation shows that, if we actually know the value of $\gamma$,

$$E[T_n \mid \gamma] = E[W_n^+] + \frac{\gamma(2n - \gamma + 1)}{4}.$$

Using this, for the variance we have

$$\begin{aligned}
\mathrm{Var}[T_n] &= E_\gamma[\mathrm{Var}[T_n \mid \gamma]] + \mathrm{Var}_\gamma[E[T_n \mid \gamma]] \\
&= \sum_{\gamma=0}^{\infty}\mathrm{Var}[W_{n-\gamma}^+ + (n-\gamma+1) + \cdots + n]\,P[\gamma] + \mathrm{Var}_\gamma\left[E[W_n^+] + \frac{\gamma(2n-\gamma+1)}{4}\right] \\
&= \sum_{\gamma=0}^{\infty}\mathrm{Var}[W_{n-\gamma}^+]\,P[\gamma] + \mathrm{Var}_\gamma\left[\frac{\gamma(2n-\gamma+1)}{4}\right] \\
&= \frac{1}{24}\sum_{\gamma}(n-\gamma)(n-\gamma+1)(2(n-\gamma)+1)\,P[\gamma] + E\left[\left[\frac{\gamma[2n-\gamma+1]}{4}\right]^2\right] - \left[E\left[\frac{\gamma[2n-\gamma+1]}{4}\right]\right]^2 \\
&= \frac{1}{24}\left[2n^3 + 3n^2 + n - [6n^2+6n+1]E[\gamma] + [6n+3]E[\gamma^2] - 2E[\gamma^3]\right] \\
&\quad + \frac{1}{16}\left[E[\gamma^4] - [4n+2]E[\gamma^3] + [4n^2+4n+1]E[\gamma^2]\right] - \left[\frac{1}{4}\left[[2n+1]E[\gamma] - E[\gamma^2]\right]\right]^2 \\
&= \mathrm{Var}[W_n^+] + \frac{1}{24}\left[10k_0^3 + 27k_0^2 + 17k_0 - 18nk_0^2 - 18nk_0 + 6n^2k_0\right]
\end{aligned} \tag{17}$$

where we have used the fact that $E[\gamma^3] = k_0^3 + 6k_0^2 + 6k_0$ and $E[\gamma^4] = k_0^4 + 12k_0^3 + 36k_0^2 + 26k_0$; and this gives our result. □

Remark: Note that in using this approach we are assuming that $n$ is sufficiently larger than $k_0$ that $\gamma$ will not be truncated prior to actually finding $k_0$ negative values: in practice this seems to be a reasonable assumption given that we should not be considering such methods if $k_0$ is overly large with respect to $n$. We have also ignored the fact that we know that $\gamma \le n$: using the distribution of $\gamma$ conditioned on this inequality would be more accurate, although the algebra involved would be considerably more tedious, and the results below indicate that the estimators we use have acceptable properties without this extra level of complication.

3.3 Estimating $k_0$ using $T_n$

We assume for the purposes of this section that the median of the $X_i$ is known, and take it as 0. We first consider the method of moments estimator for $k_0$ which follows by solving (14): this gives

$$Q_0 = (n - 1/2) - \sqrt{(n - 1/2)^2 - [4T_n - n(n+1)]} \tag{18}$$

which gives (5). In defining this estimator, we are assuming that only the negative root of the quadratic (14) is relevant. The other root leads to a (rounded) estimate of $k_0$ which is always as large as $n$: our choice of $Q_0$ therefore implicitly requires a belief that we are in fact observing at least 1/2 of the real studies, which seems the more realistic situation.
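The closed forms (14) and (8), on which the estimators of this section rest, can be checked exactly against the conditional decomposition used in the proof of Theorem 3.2. The sketch below does this in rational arithmetic for a few values of $n$ and $k_0$; the function names are hypothetical, and the fourth moment of $\gamma$ is obtained from the factorial moments $k_0(k_0+1)\cdots(k_0+j-1)$ of the negative binomial distribution (13).

```python
from fractions import Fraction as F

def nb_moments(k):
    """First four raw moments of gamma under (13) (success probability 1/2)."""
    return (F(k),
            F(k**2 + 2*k),
            F(k**3 + 6*k**2 + 6*k),
            F(k**4 + 12*k**3 + 36*k**2 + 26*k))

def tn_mean_var(n, k):
    """E[T_n] and Var[T_n] from the conditional decomposition in the proof."""
    m1, m2, m3, m4 = nb_moments(k)
    # E[T_n | gamma] = n(n+1)/4 + gamma(2n - gamma + 1)/4
    mean = F(n * (n + 1), 4) + ((2*n + 1) * m1 - m2) / 4
    # E[ Var(W+_{n-gamma}) ], with Var(W+_m) = m(m+1)(2m+1)/24
    within = (2*n**3 + 3*n**2 + n - (6*n**2 + 6*n + 1) * m1
              + (6*n + 3) * m2 - 2 * m3) / 24
    # Var[ gamma(2n - gamma + 1)/4 ]
    between = ((m4 - (4*n + 2) * m3 + (2*n + 1)**2 * m2) / 16
               - (((2*n + 1) * m1 - m2) / 4) ** 2)
    return mean, within + between

def closed_forms(n, k):
    """Right-hand sides of (14) and (8)."""
    mean = F(n * (n + 1) + k * (2*n - k - 1), 4)
    var = F(n*(n + 1)*(2*n + 1) + 10*k**3 + 27*k**2 + 17*k
            - 18*n*k**2 - 18*n*k + 6*n**2*k, 24)
    return mean, var

for n in (10, 30, 50):
    for k in (0, 1, 5):
        assert tn_mean_var(n, k) == closed_forms(n, k)
```

Because the arithmetic is exact, agreement here is an algebraic check rather than a simulation result; it says nothing about how well the model itself fits real data.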
More importantly, for $Q_0$ to be well defined, the discriminant $2n^2 - 4T_n + 1/4$ must be non-negative: that is, we must have $T_n \le n^2/2 + 1/16$ for this estimator to be defined. Simulations show that when $k_0$ is large with relation to $n$, and when $n$ itself is small, this may be violated quite frequently.

Thus, although we do evaluate properties of $Q_0$, it clearly has drawbacks in practice. We do note, however, that the discriminant is negative only for large values of $T_n$; and that at the point where it is zero, our estimate would be $n - 1/2$. It is thus logical to use this latter, rather large, estimate as a lower bound when $Q_0$ is complex.

Because of these concerns, we now consider a second estimator based on (14). If we ignore the quadratic term involving $k_0$ in (14), which is relatively small, at least for large $n$, then from the linearization

$$T_n - n(n+1)/4 \approx (2n-1)k_0/4$$

we find the explicit linear estimator

$$L_0 = \frac{4\,(T_n - n(n+1)/4)}{2n - 1} \tag{19}$$

as given in (4) above.

We now prove the remainder of Theorem 2.1.

(i) For the estimator $L_0$, we have, using (16),

$$\begin{aligned}
E[L_0] &= \frac{4}{[2n-1]}E[T_n - E[W_n^+]] = \frac{k_0[2n - k_0 - 1]}{[2n-1]} = k_0 - \frac{k_0^2}{2n-1}; \\
\mathrm{Var}[L_0] &= \left(\frac{4}{2n-1}\right)^2\mathrm{Var}[T_n - E[W_n^+]] = \left(\frac{4}{2n-1}\right)^2\mathrm{Var}[T_n].
\end{aligned} \tag{20}$$

(ii) For the estimator $Q_0$ we write

$$Q_0 = f(X) = (n - \tfrac{1}{2}) - \sqrt{(n - \tfrac{1}{2})^2 - 4X}$$

where $X = T_n - E[W_n^+]$, and use a Taylor series expansion about $\mu = E[X]$. Using the first-order approximation $E[Q_0] \approx f(\mu)$ we get from (16) that

$$\begin{aligned}
E[Q_0] &\approx (n - \tfrac{1}{2}) - \sqrt{(n - \tfrac{1}{2})^2 - 4(E[T_n] - E[W_n^+])} \\
&= (n - \tfrac{1}{2}) - \sqrt{(n - \tfrac{1}{2})^2 - k_0(2n - k_0 - 1)} \\
&= k_0.
\end{aligned} \tag{21}$$

Adding a second-order term, we use the approximation $E[Q_0] \approx f(\mu) + \tfrac{1}{2}f''(\mu)\mathrm{Var}[X]$ to give:

$$E[Q_0] \approx k_0 + \frac{2\,\mathrm{Var}[T_n]}{\left[(n - \tfrac{1}{2})^2 - k_0(2n - k_0 - 1)\right]^{3/2}}. \tag{22}$$

To derive the variance of $Q_0$ we use the approximation that $\mathrm{Var}[Q_0] \approx [f'(\mu)]^2\,\mathrm{Var}[X]$ so that

$$\mathrm{Var}[Q_0] \approx \left[\frac{2}{\sqrt{(n - \tfrac{1}{2})^2 - k_0(2n - k_0 - 1)}}\right]^2\mathrm{Var}[T_n] = \frac{4\,\mathrm{Var}[T_n]}{(n - \tfrac{1}{2})^2 - k_0(2n - k_0 - 1)}. \tag{23}$$

□

4 Properties of the estimators

In using any of the estimators above, we will round to the nearest non-negative integer, since algorithmically we need to trim whole studies rather than fractions of studies. Thus in practice we use

$$R_0^+ = \max\{0, R_0\}, \qquad L_0^+ = [\max\{0, L_0 + 1/2\}], \qquad Q_0^+ = [\max\{0, Q_0 + 1/2\}], \tag{24}$$

where $[x]$ is the integer part of $x$.

We now show using simulations that the means and variances of these estimators, which we shall call "empirical" versions, are acceptably close to those in Theorem 2.1 for the ranges of $n, k_0$ that we address, and thus the use of the theoretical values seems justified.

To carry out the simulations we generated, for each of the cases below, 5 sets of $N = n + k_0$ normal variates, each with mean zero and variances $\sigma_i^2$, the $\sigma_i$ being taken from a $\Gamma(3, 1/9)$ density. Figure 2 shows the means of the simulations for $R_0^+, L_0^+, Q_0^+$ compared with the theoretical means of $R_0, L_0, Q_0$ from Theorem 2.1.

Over the range of $N$ we have compared, when $k_0 = 5$ or $10$, there is very little difference between the analytic and empirical forms in general: the means of $L_0$ and $L_0^+$ differ by no more than .6 in the former case and no more than .16 in the latter, for example. We see that $R_0$ is even more stable and that the analytic forms for $Q_0$ (which we recall were only approximate) are also accurate except in the relatively extreme case that $n = 15$ and $k_0 = 10$. When $k_0 = 0$ the effect of truncating to ensure non-negative empirical estimators is also clear, and typically means are higher than the analytic forms, as we would expect.

FIGURE 2 NEAR HERE

The variances of the empirical estimators are also generally well approximated by the analytic forms in the cases where $k_0 = 5$ or $10$: details are in Taylor (1998).
The exception is again when $k_0 = 10$ for $Q_0$: here we seem to need $n$ to be greater than around 35 for the approximations to work. When $k_0 = 0$ the effect of rounding all the negative values to zero is to diminish the variances, and this seems also to be the case for $L_0$ when $k_0 = 5$. In these cases the analytic forms are quite conservative in the sense of leading to excessive estimates of variance in all cases.
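The rounding step used for the "empirical" estimators of this section is easy to misread, so a small sketch may help: each estimator is truncated at zero, and $L_0$ and $Q_0$ are shifted by $1/2$ before taking the integer part, which rounds them to the nearest non-negative integer. The function name is hypothetical.

```python
def rounded_estimators(r0, l0, q0):
    """Return (R0+, L0+, Q0+): truncate at 0, then take the integer part.

    r0 is already an integer (gamma* - 1); l0 and q0 are real-valued.
    """
    r_plus = max(0, r0)
    l_plus = int(max(0.0, l0 + 0.5))   # integer part of L0 + 1/2
    q_plus = int(max(0.0, q0 + 0.5))   # integer part of Q0 + 1/2
    return r_plus, l_plus, q_plus

print(rounded_estimators(4, 96 / 21, 6.725))   # (4, 5, 7)
print(rounded_estimators(-1, -2.3, -0.7))      # (0, 0, 0)
```

Truncation at zero is what pushes the empirical means above the analytic ones when $k_0 = 0$.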

5 The iterative "Trim and Fill" Algorithm

The results above show the behavior of our estimators when the value of $\theta$ is known. The more difficult situation that arises in meta-analysis is caused because $\theta$ is unknown, so that the number and position of any missing studies is correlated with the true value of $\theta$. To handle this situation we develop an iterative algorithm. Using the estimators above we proceed as follows:

Step 1: Estimate $\hat\theta^{(1)}$ using the RE estimator. Construct the initial set of centered values

$$Y_i^{(1)} = Y_i - \hat\theta^{(1)}, \quad i = 1, \ldots, n;$$

and estimate $\hat k_0^{(1)}$ using one of the estimators above (say $L_0^+$ for example) applied to the set $Y_i^{(1)}$.

Step 2: Relying on a symmetry argument, remove $\hat k_0^{(1)}$ values from the right hand end of the set of values $Y_i$, and estimate $\hat\theta^{(2)}$ based on the trimmed "symmetric" set of $n - \hat k_0^{(1)}$ values, i.e. $\{Y_1, \ldots, Y_{n - \hat k_0^{(1)}}\}$. Construct the next set of centered values

$$Y_i^{(2)} = Y_i - \hat\theta^{(2)}, \quad i = 1, \ldots, n;$$

and estimate $\hat k_0^{(2)}$ from the set $Y_i^{(2)}$, $i = 1, \ldots, n$.

Step 3: Remove $\hat k_0^{(2)}$ values from the right hand end; re-estimate $\hat\theta^{(3)}$ based on the leftmost set of $n - \hat k_0^{(2)}$ values; construct $Y_i^{(3)} = Y_i - \hat\theta^{(3)}$, $i = 1, \ldots, n$; estimate $\hat k_0^{(3)}$ from $Y_i^{(3)}$.

Step 4: Continue until an iteration $J$ where $\hat k_0^{(J)} = \hat k_0^{(J-1)} := \hat k_0$, at which point we also have $\hat\theta^{(J)} = \hat\theta^{(J-1)}$.

Step 5: As a final step, "fill" the funnel plot by using the imputed symmetric values with associated imputed standard errors

$$Y_j^* = 2\hat\theta^{(J)} - Y_{n-j+1}, \quad j = 1, \ldots, \hat k_0,$$
$$\sigma_j^* = \sigma_{n-j+1}, \quad j = 1, \ldots, \hat k_0.$$

Estimate the final value of $\theta$ using the full augmented data-set by

$$\hat\theta^{(F)} = \frac{\sum_1^n Y_i w_i + \sum_1^{\hat k_0} Y_i^* w_i^*}{\sum_1^n w_i + \sum_1^{\hat k_0} w_i^*} \tag{25}$$

with imputed 95% CI given by

$$\left[\hat\theta^{(F)} - \frac{1.96}{\sqrt{\sum_1^n w_i + \sum_1^{\hat k_0} w_i^*}},\ \ \hat\theta^{(F)} + \frac{1.96}{\sqrt{\sum_1^n w_i + \sum_1^{\hat k_0} w_i^*}}\right]; \tag{26}$$

here $w_i = [\sigma_i^2 + \tau_F^2]^{-1}$ and $w_i^* = [(\sigma_i^*)^2 + \tau_F^2]^{-1}$, and $\tau_F^2$ is estimated from the entire filled data set $\{Y_1, \ldots, Y_n, Y_1^*, \ldots, Y_{\hat k_0}^*\}$, using the DerSimonian-Laird estimator.

We make the following observations on this algorithm:

(i) Note that finding $\hat\theta^{(J)}$ in Step 4 is only possible if $\hat k_0 \le n - 1$, and in practice we truncate the algorithm if this value is reached.

(ii) When we have a fixed effect model, it is clear that the algorithm must converge, since the $\hat\theta^{(j)}$ have to decrease by definition, and so $\hat k_0^{(j)}$ increases. In the real (non-simulated) examples in Taylor (1998) we found that convergence never took more than 4 iterations. However if the data are fitted by a random effects model it is possible that the $\hat\theta^{(j)}$ may oscillate, and so $\hat k_0^{(j)}$ will also oscillate. This appears to be rare in simulations but consideration of the possibility is required.

(iii) The final estimate reflects the variability that should be imputed in the missing studies: thus we are not only adjusting the mean but also the confidence interval for these imputed values. In fact, one can check that in the FE case the sum in (25) only depends on the $n - \hat k_0$ values in the trimmed data set, so the final mean in this context does not depend on the location but only on the number of imputed missing values. In the RE case this is not so, since the imputed (filled) values may change the estimate of $\tau^2$.

6 Two examples

We now apply the method to two different examples, the simulated data set in Section 1.2, where we know the underlying structure, and a meta-analysis of 35 studies of the effect of environmental tobacco smoke (ETS) on lung cancer, studied in Givens et al (1997) using a relatively complex Bayesian approach.
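Returning to the algorithm of Section 5, Steps 1 to 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses $L_0^+$ as the estimator of $k_0$, assumes the suppressed studies lie on the left so that trimming is from the right, caps $\hat k_0$ at $n - 1$, and stops after a hypothetical `max_iter` iterations in case the random effects version oscillates. All function names are invented for the sketch.

```python
from math import sqrt

def pooled(y, v, random_effects=True):
    """Inverse-variance estimate and SE; tau^2 by DerSimonian-Laird if requested."""
    w = [1.0 / vi for vi in v]
    fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    tau2 = 0.0
    if random_effects and len(y) > 1:
        q = sum(wi * (yi - fe) ** 2 for wi, yi in zip(w, y))
        c = sum(w) - sum(wi * wi for wi in w) / sum(w)
        tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w = [1.0 / (vi + tau2) for vi in v]
    est = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return est, sqrt(1.0 / sum(w))

def l0_plus(x):
    """Rounded estimator L0+ computed from centered values x."""
    n = len(x)
    order = sorted(range(n), key=lambda i: abs(x[i]))
    t_n = sum(rank for rank, i in enumerate(order, start=1) if x[i] > 0)
    l0 = (4 * t_n - n * (n + 1)) / (2 * n - 1)
    return int(max(0.0, l0 + 0.5))

def trim_and_fill(y, v, random_effects=True, max_iter=50):
    """Return (k_hat, filled estimate, 95% CI) for effects y with variances v."""
    pairs = sorted(zip(y, v))                 # ascending effect sizes
    ys = [p[0] for p in pairs]
    vs = [p[1] for p in pairs]
    n = len(ys)
    k = 0
    for _ in range(max_iter):
        # Steps 1-3: estimate on the trimmed set, re-estimate k from all n values
        est, _ = pooled(ys[:n - k], vs[:n - k], random_effects)
        k_new = min(l0_plus([yi - est for yi in ys]), n - 1)
        if k_new == k:                        # Step 4: stop when k stabilizes
            break
        k = k_new
    est, _ = pooled(ys[:n - k], vs[:n - k], random_effects)
    # Step 5: mirror the k right-most studies about the trimmed estimate
    fill_y = [2 * est - ys[n - j] for j in range(1, k + 1)]
    fill_v = [vs[n - j] for j in range(1, k + 1)]
    final, se = pooled(ys + fill_y, vs + fill_v, random_effects)
    return k, final, (final - 1.96 * se, final + 1.96 * se)

# Four precise studies near zero and three imprecise studies to the right
k_hat, theta_f, ci = trim_and_fill(
    [0, 0.1, -0.1, 0.05, 1, 1.5, 2], [0.01] * 4 + [1] * 3, random_effects=False)
print(k_hat, theta_f, ci)
```

In this fixed effects toy example three studies are trimmed and filled, and the filled estimate equals the mean of the trimmed set, in line with observation (iii).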
We compare the results obtained with those from three other methods: the estimate of the number of unreported studies calculated by the "simple" method of equation (2) of Gleser and Olkin (1996), which is comparable in calculation effort to our estimates of $k_0$; and the tests for publication bias of Begg (1994) and Egger et al (1997).

Example: simulated data

In the simulated data shown in Figure 1, we know five studies have been suppressed, and that this changes a statistically insignificant result on the original data to a statistically significant result if publication bias is not considered.

On this data set the Gleser and Olkin (1996) method estimates that there are 11 missing studies. The two-sided p-values from the Begg (1994) and Egger et al (1997) methods are respectively .1 and ., indicating that the null hypothesis of no suppression is not supported. Note that Begg (1994) points out that his test is not powerful and a value such as this should be taken as an indication of the possible existence of publication bias.

Table 1 shows the way in which the various estimators behave on this example. As is often found in examples, we have $R_0^+ \le L_0^+ \le Q_0^+$. There is relatively little difference on the filled value of $\theta$ and its CI, no matter which estimator is used to fill. Figure 1(c) is filled using the actual number, i.e. using $L_0^+$.

Table 1: Effect of publication bias using estimators of $k_0$ for the data in Figure 1

Estimator         Imputed value of k_0 (SE)   No. iterations   Estimate of theta   95% CI
Original data                                                  .080                (-.018, .178)
Suppressed data                                                .124                (.037, .210)
R_0^+             3 (3.2)                     2                .092                (-.001, .185)
L_0^+             5 (3.7)                     4                .082                (-.011, .176)
Q_0^+             6 (4.6)                     4                .079                (-.013, .170)

Example: Lung cancer and environmental tobacco smoke (ETS)

Givens et al (1997) use a Bayesian approach to estimate and adjust for publication bias in a set of 35 studies which assess the risk of lung cancer in non-smoking women exposed to spousal smoking. The data consist of relative risks and associated confidence intervals, and have also been studied by Lee (1992) and Mengersen, Tweedie and Biggerstaff (1995).

We find that $R_0^+ = L_0^+ = 8$, and $Q_0^+ = 9$. In this case the two-sided p-values from the Begg (1994) and Egger et al (1997) methods are respectively .16 and .5, also indicative of publication bias. In contrast, the method of Gleser and Olkin (1996) estimates that $k_0 = 0$: this is because this method is very susceptible to just one study with a large p-value against the null hypothesis (in this case due to a $Y_i$ which is only slightly negative but has a very small variance).

FIGURE 3 NEAR HERE

After filling eight studies, in accord with $R_0^+$ or $L_0^+$, the overall relative risk moves from 1.20 [1.08, 1.34] to 1.12 [1.01, 1.25], almost identical to the more complicated procedure of Givens et al (1997) and the more subjective analysis of Mengersen, Tweedie and Biggerstaff (1995). This is illustrated in Figure 3, where the original data are shown as solid symbols, with the "filled" data as open symbols.
The upper panel uses the appropriate effect size in the funnel plot, whereas the lower panel gives the meta-analyzed estimates and CIs in the original scale.

7 Discussion

A meta-analysis based on only a subset of all relevant studies may result in biased conclusions. It is a common belief, backed by several empirical assessments, that studies are not uniformly

likely to be published in scientific journals (Cooper, 1998, pp 5-55; Dickersin, Min and Meinert, 1992). Easterbrook et al. (1991) suggest that statistical significance is a major determining factor of publication. Some researchers (e.g. students with Masters' or Ph.D. theses) may not submit a nonsignificant result for publication, and editors may not publish nonsignificant results even if they are submitted (British Medical Journal, 1983).

Evaluating this effect is difficult, since the missing studies influence the mean that is estimated in the meta-analysis. As DuMouchel and Harris note in the comments on Givens et al (1997), "attempts to assess publication bias beyond simple graphs like the funnel plot seem to involve a tour de force of modeling, and as such are bound to run up against resistance from those who are not statistical modeling wonks". The method we propose here avoids the need for such a cumbersome model by using a simple approach to calculating $k_0$ (in terms of computational burden and assumptions), and by applying this iteratively as we trim and fill the data.

The simulations carried out indicate that the non-parametric estimators we have derived for the number of missing studies work well, giving realistic estimates using either $R_0^+$ or $L_0^+$ in particular. Comparisons of these estimators are given in more detail in Taylor (1998), and in the real examples examined there, the trim and fill method matches the subjective impression of bias given by funnel plots, and appears to give results consistent with those of Begg (1994) or Egger et al (1997).

Of course, we do not actually know that the filled studies match reality in any close way. But in the sense that we can see the potential for incorrect conclusions if publication bias does exist, the trim and fill method does seem to provide a robust diagnostic method to enable us to decide which meta-analyses are safe from this bias, and which must be treated with considerable caution.

References

Begg, C. B. (1994).
Publication bias. In The Handbook of Research Synthesis, Cooper, H. and Hedges, L. V., eds., 399{9. Russell Sage Foundation, New York. Begg, C. B. and Mazumdar, M. (199). Operating characteristics of a rank correlation test for publication bias. Biometrics, 5: 188{111. Biggersta, B. J., and Tweedie, R. L. (1997). Incorporating variability in estimates of heterogeneity in the random eects model in meta-analysis. Statistics in Medicine, 16: 753{768. British Medical Journal Editorial Sta (1983). The editor regrets... (editorial). British Medical Journal, 28:58. Casella, G. and Berger, R. L. (199). Statistical Inference. Duxbury Press, Belmont, California. 15

Cooper, H. (1998). Synthesizing Research. Sage Publications, Thousand Oaks.
Cooper, H. and Hedges, L. V., eds. (199). The Handbook of Research Synthesis. Russell Sage Foundation, New York.
Dear, K. and Begg, C. (1992). An approach for assessing publication bias prior to performing a meta-analysis. Statistical Science, 7:237-25.
Dear, K. and Dobson, A. (1997). Comment on Givens, G. H., Smith, D. D., and Tweedie, R. L. (1997). Publication bias in meta-analysis: A Bayesian data-augmentation approach to account for issues exemplified in the passive smoking debate (with discussion). Statistical Science, 12:25-26.
DerSimonian, R. and Laird, N. M. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials, 7:177-188.
Dickersin, K., Min, Y., and Meinert, C. (1992). Factors influencing publication of research results. Journal of the American Medical Association, 267:37-378.
DuMouchel, W. and Harris, J. (1997). Comment on Givens, G. H., Smith, D. D., and Tweedie, R. L. (1997). Publication bias in meta-analysis: A Bayesian data-augmentation approach to account for issues exemplified in the passive smoking debate (with discussion). Statistical Science, 12:2-25.
Easterbrook, P., Berlin, J., Gopalan, R., and Matthews, D. (1991). Publication bias in clinical research. Lancet, 337:867-872.
Egger, M., Smith, G. D., Schneider, M., and Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315:629-63.
Galbraith, R. (1988). A note on the graphical presentation of estimated odds ratios from several clinical trials. Statistics in Medicine, 7:889-89.
Givens, G. H., Smith, D. D., and Tweedie, R. L. (1997). Publication bias in meta-analysis: A Bayesian data-augmentation approach to account for issues exemplified in the passive smoking debate (with discussion). Statistical Science, 12:221-25.
Gleser, L. J. and Olkin, I. (1996). Models for estimating the number of unpublished studies. Statistics in Medicine, 15:293-257.
Hedges, L. (1992). Modeling publication selection effects in meta-analysis. Statistical Science, 7:227-236.
Hedges, L. and Olkin, I. (1985). Statistical Methods for Meta-analysis. Academic Press, New York.
Iyengar, S. and Greenhouse, J. B. (1988). Selection models and the file drawer problem (with discussion). Statistical Science, 3:19-135.
Lee, P. N. (1992). Environmental Tobacco Smoke and Mortality. Karger, Basel.
Light, R. and Pillemer, D. (198). Summing Up: the Science of Reviewing Research. Harvard University Press, Cambridge.
Marascuilo, L. A. and McSweeney, M. (1977). Nonparametric and distribution-free methods for the social sciences. Brooks/Cole Publishing Co, Monterey.

Mengersen, K., Tweedie, R., and Biggerstaff, B. (1995). The impact of method choice in meta-analysis. Australian Journal of Statistics, 37:19-.
Misakian, A. L. and Bero, L. A. (1998). Publication bias and research on passive smoking: Comparison of published and unpublished studies. Journal of the American Medical Association, 28:25-253.
NRC Committee on Applied and Theoretical Statistics (1992). Combining Information: Statistical Issues and Opportunities for Research. National Academy Press, Washington.
Olkin, I. (1992). Meta-analysis: methods for combining independent studies. Statistical Science, 7:226.
Sugita, M., Kanamori, M., Izuno, T., and Miyakawa, M. (1992). Estimating a summarized odds ratio whilst eliminating publication bias in meta-analysis. Japanese Journal of Clinical Oncology, 22:35-358.
Sugita, M., Yamaguchi, N., Izuno, T., Kanamori, M., and Kasuga, H. (199). Publication probability of a study on odds ratio value: Circumstantial evidence for publication bias in medical study areas. Tokai Journal of Experimental and Clinical Medicine, 19:29-37.
Taylor, S. J. (1998). Effects of publication bias in meta-analysis. PhD dissertation, University of Colorado Health Sciences Center, Department of Preventive Medicine and Biometrics.
Tweedie, R. L., Scott, D. S., Biggerstaff, B. J., and Mengersen, K. L. (1996). Bayesian meta-analysis, with application to studies of ETS and lung cancer. Lung Cancer, 1:S171-S19.

[Three funnel plot panels; axis labels: 1/standard error, Effect size]
Figure 1a. Funnel plot of 35 simulated studies: overall effect size shown is .8 with 95% CI of (-.18, .178).
Figure 1b. Plot with five 'left-most' studies suppressed: overall effect size now is estimated as .12 with 95% CI of (.37, .21).
Figure 1c. 'Filled' funnel plot: overall effect size is estimated using trim and fill as .82 with 95% CI of (-.11, .176).
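The suppress, trim and fill sequence illustrated in Figure 1 can be sketched in code. What follows is a minimal illustration under stated assumptions, not the authors' implementation: the pooled effect is a fixed-effect inverse-variance mean, the number of missing studies is estimated with the signed-rank form L = [4T - n(n+1)] / (2n - 1), truncated at zero and rounded (this formula is taken from the later published form of the estimator and is an assumption relative to this text), the missing studies are assumed to lie on the left of the funnel, and tied ranks are not given midranks. The function names `pooled_effect`, `l0_estimate` and `trim_and_fill` are hypothetical.

```python
from typing import List, Sequence, Tuple


def pooled_effect(effects: Sequence[float], ses: Sequence[float]) -> float:
    """Fixed-effect (inverse-variance weighted) mean; an assumed pooling rule."""
    weights = [1.0 / se ** 2 for se in ses]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def l0_estimate(effects: Sequence[float], center: float) -> int:
    """Estimate the number of missing studies from signed ranks about `center`.

    Assumed formula: L = (4*T - n*(n+1)) / (2*n - 1), where T is the sum of
    the ranks of |deviation| over studies with positive deviation.
    Ties are broken by input order rather than by midranks.
    """
    n = len(effects)
    dev = [y - center for y in effects]
    order = sorted(range(n), key=lambda i: abs(dev[i]))
    t_n = sum(rank for rank, i in enumerate(order, start=1) if dev[i] > 0)
    l0 = (4 * t_n - n * (n + 1)) / (2 * n - 1)
    return max(0, round(l0))


def trim_and_fill(
    effects: Sequence[float], ses: Sequence[float], max_iter: int = 50
) -> Tuple[int, float, float, List[Tuple[float, float]]]:
    """Return (k, unadjusted estimate, adjusted estimate, filled studies)."""
    n = len(effects)
    by_effect = sorted(range(n), key=lambda i: effects[i])
    unadjusted = pooled_effect(effects, ses)

    k = 0
    for _ in range(max_iter):
        # Trim the k right-most studies and re-estimate the center.
        kept = by_effect[: n - k]
        center = pooled_effect([effects[i] for i in kept], [ses[i] for i in kept])
        # Re-rank all n studies about the trimmed center.
        new_k = min(l0_estimate(effects, center), n - 1)
        if new_k == k:
            break
        k = new_k

    # Center implied by the final k (recomputed in case max_iter was reached).
    kept = by_effect[: n - k]
    center = pooled_effect([effects[i] for i in kept], [ses[i] for i in kept])

    # Fill: mirror each trimmed study about the center, keeping its standard error.
    filled = [(2 * center - effects[i], ses[i]) for i in by_effect[n - k:]]
    all_effects = list(effects) + [y for y, _ in filled]
    all_ses = list(ses) + [s for _, s in filled]
    return k, unadjusted, pooled_effect(all_effects, all_ses), filled


if __name__ == "__main__":
    # Hypothetical data: precise studies near zero, imprecise ones only on the right.
    ys = [-0.03, 0.00, 0.02, 0.05, -0.12, 0.10, 0.15, 0.30, 0.45, 0.60]
    ss = [0.05] * 4 + [0.1] * 3 + [0.2] * 3
    k, before, after, filled = trim_and_fill(ys, ss)
    print(k, before, after, filled)
```

The sketch handles only one direction of asymmetry; for suppression on the right, the effects could be negated before the call and the results negated afterwards.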

[Three panels, one each for R+, L+ and Q+; axis labels: Mean, No of observed studies]
Figure 2: Analytic means (filled symbols) and simulated means (open symbols) for each of the estimators shown, at values of k =,5,1

[Two panels; axis labels: 1/standard error, Log relative risk, Relative risk]
Figure 3: Top panel is a funnel plot of log relative risks of lung cancer associated with ETS; bottom panel shows overall relative risk and 95% CI before and after allowing for publication bias.
