BIOMETRICS 31, 987-992 December 1975

400: A METHOD FOR COMBINING NON-INDEPENDENT, ONE-SIDED TESTS OF SIGNIFICANCE

MORTON B. BROWN

Department of Statistics, Tel-Aviv University, Tel-Aviv, Israel and Department of Biomathematics, University of California, Los Angeles, California 90024, U.S.A.

SUMMARY

Littell and Folks [1971, 1973] show that Fisher's method of combining independent tests of significance is asymptotically optimal among essentially all methods of combining independent tests. By assuming a joint multivariate normal density for the variables, we approximate the distribution of Fisher's statistic in order to combine one-sided tests of location when all the variables are not jointly independent. The probability associated with this test is simpler to evaluate than that of the equivalent likelihood ratio test.

1. INTRODUCTION

One-sided tests of location arise in several applications of medical and biological data. At times it is desirable to evaluate the combined probability obtained by several tests on non-independent variables. An example is the testing for adulteration of fruit juice, where several components in the juice are simultaneously assayed and compared with their null distributions (Lifshitz et al. [1971]). By assuming a joint multivariate normal distribution for the variables, a likelihood ratio test can be obtained for this problem (Nuesch [1966]). However, it requires multidimensional integration to calculate the joint probability associated with any sample outcome. For this reason we are interested in finding an approximate test statistic whose probability is simpler to estimate. When the separate tests of significance for the variables are independent, Fisher's method of combining the probabilities is asymptotically optimal among essentially all methods of combining independent tests (Littell and Folks [1971, 1973]).
The method is to form the sum

$$X^2 = \sum_{i=1}^{k} (-2 \log_e p_i),$$

where $p_i$ is the probability that the $i$th variable exceeds the observed value under the null hypothesis (the direction being chosen according to the alternative hypothesis). Under the null hypothesis $X^2$ is distributed as a chi-square variate with $2k$ degrees of freedom (D.F.), where $k$ is the number of independent tests performed. Our aim is to indicate the modification required to approximate the null distribution of $X^2$ when the variables, and therefore the tests, are not jointly independent. We assume a joint multivariate Gaussian density for the variables. Under the null distribution, the covariance between $-2 \log_e p_i$ and $-2 \log_e p_j$ is a function only of the correlation between the $i$th and $j$th variables. Using numerical integration we evaluate and tabulate this covariance. Therefore, given an arbitrary correlation matrix, the first two moments of $X^2$
can be easily calculated. We equate these moments to those of a chi-square and derive an approximate distribution for $X^2$ which is adequate for a wide range of correlations.

2. APPROXIMATING THE DISTRIBUTION OF X²

Let $(x_1, \ldots, x_k)$ have a joint multivariate normal distribution $N(\mu, \Sigma)$ with mean vector $\mu = (\mu_1, \ldots, \mu_k)$ and known covariance matrix $\Sigma = (\sigma_{ij})$. We are interested in testing the following set of hypotheses:

$$H_0: \mu_i = \mu_{i0}, \quad i = 1, \ldots, k$$
$$H_1: \mu_i \le \mu_{i0}, \quad i = 1, \ldots, k,$$

with at least one strict inequality. The null hypothesis $H_0$ involves $k$ separate, but not necessarily independent, tests of significance. If a test of significance is performed separately for each variable, the descriptive level of significance of each test is

$$p_i = \Phi[(x_i - \mu_{i0})/\sigma_i], \qquad \sigma_i = \sqrt{\sigma_{ii}},$$

where $\Phi$ represents the standard Gaussian cdf. (If the inequality sign in $H_1$ is reversed, then the descriptive level of significance is $1 - p_i$.) Under the null hypothesis each $-2 \log_e p_i$ is distributed as a chi-square variate with 2 D.F. Define

$$X^2 = \sum_{i=1}^{k} -2 \log_e p_i. \tag{1}$$

When the variables are jointly independent and the null hypothesis is satisfied, $X^2$ is distributed as a chi-square with $2k$ D.F. If the variables are not jointly independent, $X^2$ has mean and variance

$$E(X^2) = 2k \tag{2}$$

$$\sigma^2(X^2) = \sum_i \sum_j \mathrm{cov}(-2 \log_e p_i, -2 \log_e p_j) = \sum_i \mathrm{var}(-2 \log_e p_i) + 2 \sum_{i<j} \mathrm{cov}(-2 \log_e p_i, -2 \log_e p_j) = 4k + 2 \sum_{i<j} \mathrm{cov}(-2 \log_e p_i, -2 \log_e p_j), \tag{3}$$

where $\mathrm{cov}(x, y)$ represents the covariance between $x$ and $y$. The covariance between $-2 \log_e p_i$ and $-2 \log_e p_j$ is a function only of the correlation between the $i$th and $j$th variables. This is readily shown since $p_i$ and $p_j$ are invariant under the group of affine transformations. Therefore the joint density of $-2 \log_e p_i$ and $-2 \log_e p_j$ can be a function only of the correlation between the two variables. These covariances are evaluated by Gaussian quadrature (Krylov [1962]) and listed in Table 1.
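As a concrete point of reference, the independent case of equation (1) is straightforward to compute. The following is a minimal Python sketch (the function name and the SciPy dependency are illustrative assumptions, not part of the paper):

```python
import math

from scipy.stats import chi2


def fisher_combined(pvalues):
    """Fisher's X^2 = sum of -2 log_e p_i, referred to a chi-square
    distribution with 2k D.F. -- valid only when the tests are independent."""
    x2 = sum(-2.0 * math.log(p) for p in pvalues)
    return x2, chi2.sf(x2, 2 * len(pvalues))


# Four one-sided descriptive levels, as in the example of section 3:
x2, p = fisher_combined([0.09, 0.08, 0.24, 0.06])
```

With these four p-values the statistic is about 18.35 (section 3 obtains 18.34 from rounded logarithms); the chi-square reference with 8 D.F. would be appropriate only if the four assays were independent.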
The entries in Table 1 can be approximated over a wide range by two quadratic functions of the correlation $\rho$:

$$\mathrm{cov}(-2 \log_e p_i, -2 \log_e p_j) = \begin{cases} \rho(3.25 + 0.75\rho), & 0 \le \rho \le 1 \\ \rho(3.27 + 0.71\rho), & -0.5 \le \rho < 0. \end{cases} \tag{4}$$
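The quadratics in (4) can be checked numerically against the table entries. A Python sketch (the tabulated values are transcribed from Table 1):

```python
def approx_cov(rho):
    """Quadratic approximation (4) to cov(-2 log_e p_i, -2 log_e p_j);
    valid for rho >= -0.5."""
    return rho * (3.25 + 0.75 * rho) if rho >= 0 else rho * (3.27 + 0.71 * rho)


# A few entries of Table 1, keyed by signed correlation:
table1 = {0.1: 0.334, 0.3: 1.044, 0.5: 1.812, 0.7: 2.641, 1.0: 4.000,
          -0.1: -0.320, -0.3: -0.916, -0.5: -1.458}

# Largest discrepancy between the quadratic fit and the tabulated values:
max_err = max(abs(approx_cov(r) - v) for r, v in table1.items())
```

Over these entries the maximum discrepancy is below 0.002.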
The above formulae are empirically derived and differ from the entries in Table 1 by no more than 0.002.

Assume that the distribution of $X^2$ can be approximated by that of $c\chi^2_f$, where $\chi^2_f$ is a chi-square variate with $f$ D.F. (It would be more correct to call it a gamma variable with two parameters. However, we are more familiar with the use of a chi-square table.) Equating the first two moments of $X^2$ and $c\chi^2_f$ yields

$$E(X^2) = cf \quad \text{and} \quad \sigma^2(X^2) = 2c^2 f.$$

Therefore

$$f = 2\{E(X^2)\}^2/\sigma^2(X^2) \tag{5}$$

and

$$c = \sigma^2(X^2)/\{2E(X^2)\}. \tag{6}$$

Since $E(X^2)$ and $\sigma^2(X^2)$ may be calculated by equations (2) and (3), both $f$ and $c$ are easily evaluated. As $f$ increases, $c$ decreases. For this reason the variation in the critical value of $X^2$ from that of the chi-square with $2k$ D.F. is less than might be expected. The result of numerical integration over the critical region of $X^2$, as determined by the above approximation, is compared with the nominal size of the test in Table 2. The nominal and actual sizes agree quite well for all the positive correlations shown in the table. There is some divergence between the actual and the nominal sizes for large negative correlations, except at the size 0.05, where there is high agreement between the two sizes.

TABLE 1
cov(−2 log_e p_i, −2 log_e p_j) WHEN p_i AND p_j ARE PROBABILITIES CORRESPONDING TO ONE-SIDED TESTS OF THE UNIVARIATE GAUSSIAN DISTRIBUTION

|ρ_ij|   ρ_ij > 0   ρ_ij < 0
0.0      0.000       0.000
0.1      0.334      −0.320
0.2      0.681      −0.625
0.3      1.044      −0.916
0.4      1.421      −1.194
0.5      1.812      −1.458
0.6      2.219      −1.709
0.7      2.641      −1.946
0.8      3.079      −2.170
0.9      3.531      −2.382
1.0      4.000      undefined
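Putting equations (2)–(6) together, the complete procedure can be sketched in Python (function names and the SciPy dependency are illustrative assumptions; `scipy.stats.chi2` accepts the fractional degrees of freedom $f$):

```python
import itertools
import math

from scipy.stats import chi2


def approx_cov(rho):
    # Quadratic approximation (4); requires rho >= -0.5.
    return rho * (3.25 + 0.75 * rho) if rho >= 0 else rho * (3.27 + 0.71 * rho)


def brown_combined(pvalues, corr):
    """Combine one-sided p-values whose underlying variables have
    correlation matrix corr, by the moment-matching of section 2."""
    k = len(pvalues)
    x2 = sum(-2.0 * math.log(p) for p in pvalues)        # equation (1)
    mean = 2.0 * k                                       # equation (2)
    var = 4.0 * k + 2.0 * sum(                           # equation (3)
        approx_cov(corr[i][j]) for i, j in itertools.combinations(range(k), 2))
    f = 2.0 * mean ** 2 / var                            # equation (5)
    c = var / (2.0 * mean)                               # equation (6)
    return chi2.sf(x2 / c, f)                            # refer X^2/c to chi-square, f D.F.


# With a zero correlation matrix the procedure reduces to Fisher's method;
# positive correlation inflates the variance and yields a larger p-value.
p_ind = brown_combined([0.05, 0.05], [[1.0, 0.0], [0.0, 1.0]])
p_dep = brown_combined([0.05, 0.05], [[1.0, 0.5], [0.5, 1.0]])
```

The comparison of `p_ind` and `p_dep` illustrates why ignoring positive dependence overstates the combined significance.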
TABLE 2
SIZES (IN %) OF X² FOR SEVERAL NOMINAL SIZES

Two variables
ρ       20%     5%      1%
0.0     20.00   5.00    1.00
0.1     19.98   4.99    1.00
0.3     19.95   4.99    1.00
0.5     19.95   5.00    1.01
0.7     19.95   5.01    1.01
−0.1    20.00   5.02    1.01
−0.3    19.76   4.94    1.02
−0.5    19.30   4.97    1.15
−0.7    18.10   5.00    1.41

Three variables
ρ12 = ρ13 = ρ23
0.0     20.00   5.00    1.00
0.1     19.99   5.00    1.00
0.3     19.93   4.99    1.01
0.5     19.88   4.99    1.02
0.7     19.91   5.01    1.02
−0.1    19.95   5.01    1.01
−0.3    19.32   5.06    1.19

ρ12 = −ρ13 = −ρ23
0.5     18.38   4.97    1.34
0.7     16.66   5.09    1.67

3. AN EXAMPLE

Lifshitz et al. [1971] compared the empirical power of $X^2$ with an alternative procedure in which the regression equation relating four highly correlated ingredients is estimated, and samples under suspicion are rejected if their residual from the regression equation is too extreme. As their data base, 58 samples of pure lemon juice were analyzed. The correlation matrix relating the four amino acids is:

                      ASP    GLU    GLY    ALA
Aspartic acid (ASP)   1.00
Glutamic acid (GLU)   0.74   1.00
Glycine (GLY)         0.51   0.19   1.00
Alanine (ALA)         0.21  −0.07   0.35   1.00

The approximate distribution for $X^2$ based upon the four amino acids ($k = 4$) is evaluated as follows. The expected value of $X^2$ is found from (2); the variance of $X^2$ is obtained from (3):

$$E(X^2) = 2k = 2(4) = 8$$

$$\sigma^2(X^2) = 4k + 2 \sum_{i<j} \mathrm{cov}(-2 \log_e p_i, -2 \log_e p_j) = 4(4) + 2(2.816 + 1.853 + 0.681 + 0.645 - 0.225 + 1.229) = 27.99,$$

where the covariances $(2.816, \ldots, 1.229)$ are obtained from (4) according to whether
the correlations $(0.74, \ldots, 0.35)$ are positive or negative. Therefore, using (5) and (6),

$$f = 2\{E(X^2)\}^2/\sigma^2(X^2) = 4.6$$

and

$$c = \sigma^2(X^2)/\{2E(X^2)\} = 0.87.$$

Lifshitz and Stepak [1971] give the following means and standard deviations for the four amino acids, based on 14 samples:

                      ASP    GLU    GLY    ALA
mean                  4.91   2.09   0.23   2.55
standard deviation    0.95   0.30   0.04   0.38

Suppose that in a new sample of possibly adulterated juice, each ingredient is assayed with the results ASP = 3.50, GLU = 1.60, GLY = 0.20 and ALA = 1.90. Then a t-statistic is computed for each amino acid separately ($t = -1.43, -1.58, -0.72$, and $-1.65$, respectively). The one-sided descriptive levels of significance of these t-statistics (which test $\mu_i = \mu_{i0}$ against $\mu_i < \mu_{i0}$) with 13 D.F. are approximately 0.09, 0.08, 0.24 and 0.06, respectively. Therefore

$$X^2 = \sum_{i=1}^{4} -2 \log_e p_i = 4.81 + 5.05 + 2.85 + 5.63 = 18.34.$$

To test the hypothesis that no adulteration has occurred, $X^2$ is divided by 0.87 ($18.34/0.87 = 21.08$) and compared with the critical values of the chi-square statistic with 4.6 D.F. We would reject the hypothesis of no adulteration even at a level of significance as extreme as 0.005.

ACKNOWLEDGMENT

This research was supported in part by NIH grant RR-3.

RÉSUMÉ

Littell and Folks (1971, 1973) show that Fisher's method of combining independent tests of significance is asymptotically optimal among all those that combine independent tests. By assuming a joint multivariate Gaussian density for the variables, we approximate the distribution of this statistic in order to combine one-sided tests of location when the variables are not all independent. The probability associated with this test is simpler to evaluate than that of the equivalent likelihood ratio test.

REFERENCES

Krylov, V. I. [1962]. Approximate Calculation of Integrals.
Macmillan, New York.
Lifshitz, A. and Stepak, Y. [1971]. Detection of adulteration of fruit juice, I. Characterization of Israel lemon juice. Journal of the Association of Official Analytical Chemists 54, 1262-5.
Lifshitz, A., Stepak, Y., and Brown, M. B. [1971]. Detection of adulteration of fruit juice, II. Comparison of statistical methods. Journal of the Association of Official Analytical Chemists 54, 1266-9.
Littell, R. C. and Folks, J. L. [1971]. Asymptotic optimality of Fisher's method of combining independent tests. J. Amer. Statist. Ass. 66, 802-6.
Littell, R. C. and Folks, J. L. [1973]. Asymptotic optimality of Fisher's method of combining independent tests. II. J. Amer. Statist. Ass. 68, 193-4.
Nuesch, P. E. [1966]. On the problem of testing location in multivariate populations for restricted alternatives. Ann. Math. Statist. 37, 113-9.

Received May 1973, Revised June 1974

Key Words: Combining tests of significance; Non-independent tests; Test for adulteration; Fisher's method of combining probabilities.