The effect of nonzero second-order interaction on combined estimators of the odds ratio


Biometrika (1978), 65, 1. Printed in Great Britain

BY SONJA M. MCKINLAY
Department of Mathematics, Boston University, Massachusetts

SUMMARY

In combining odds ratios across strata, the assumption of zero second-order interaction is usually made in order to simplify estimation. The implications for the use of the general logistic model, when this assumption is not met, are considered. Using Monte Carlo techniques, the effect of nonzero second-order interaction on five available combined estimators is considered and an attempt made to provide results applicable in a variety of research situations.

Some key words: Contingency table; Logistic model; Monte Carlo method; Odds ratio; Relative risk; Stratified sample.

1. INTRODUCTION

The odds ratio is one of the earliest measures of association in 2 x 2 contingency tables (Yule, 1912; Goodman & Kruskal, 1959). However, it was not until the 1950's that techniques for combining estimates from several 2 x 2 tables were proposed. One of the earliest combined measures (Woolf, 1955) consisted of a sum of the logarithms of the odds ratios, weighted by the inverse of their variances. But, as Woolf points out, this combined measure is only valid in the absence of second-order interaction, equivalent to absence of 'heterogeneity' in his paper, and a suitable test for this heterogeneity is provided. The combined measure proposed by Birch (1964) and modified by Goodman (1969), and the conditional maximum likelihood estimator itself (Gart, 1971), have all assumed a constant odds ratio. Possibly the only exception has been the measure given by Mantel & Haenszel (1959). These authors state (p. 735) that '... the assumption of a constant relative risk can be discarded as usually untenable'. Indeed, researchers frequently want a summary measure of association from several fourfold tables even though there is heterogeneity among the individual odds ratios.
Certainly, in order to combine tables, the second-order interaction should never be so marked that there are important reversals in direction, some tables having odds ratios less than and others greater than unity. This was noted by Birch (1964, p. 30). Heterogeneity between tables generally results from the differential effect of one or more factors or covariates, extraneous to the investigation, and it is often hoped that by combining estimates, the effect of these covariates is reduced. Indeed, stratification is sometimes employed with the specific purpose of reducing covariate effects, in order to estimate the constant component of an association.

Two reasons for combining estimates have been mentioned here: (a) to provide a summary statistic, and (b) to remove the effects of specific covariates from the comparison. In fact, it is difficult to differentiate between (a) and (b) except, perhaps, by sampling design. In general, when sample sizes are fixed within predetermined strata, summary estimates are required. In contrast, when the strata are formed during the analysis, from fixed total samples, elimination of covariate effects is usually the motivation.

In this paper, the use of a general logistic model is considered, with reference to the combination of odds ratios from $K$ strata. The estimators to be used in the Monte Carlo study are then described and the related simulation methods are presented. The final section discusses and interprets findings of the Monte Carlo investigation for application in research.

2. THE GENERAL MODEL

Consider a binary response variable $Y_{ij}$ with parameter $p_{ij}$, for the $i$th individual in the $j$th population. Assume, for simplicity, that the covariate $X$ is univariate, and assume further that

$$\log\{p_{ij}/(1-p_{ij})\} = \alpha_j + \beta_j x_{ij}. \qquad (2.1)$$

Now, for the $j$th population containing $N_j$ individuals, we have a vector of independent binary response variables $\{Y_{ij}\}$ for $i = 1, \ldots, N_j$, with corresponding vectors of probabilities $\{p_{ij}\}$ and covariables $\{X_{ij}\}$. Assume that the $p_{ij}$ are independently distributed as in (2.1) and that the $X_{ij}$ are independently and identically distributed within the $j$th population, so that one can write $E(X_{ij}) = E(X_j)$ for all $i$. Then, taking the expectation of the logits with respect to the $X_j$, we have

$$E[\log\{p_{ij}/(1-p_{ij})\}] = \alpha_j + \beta_j E(X_j) = Q_j,$$

say, and the odds ratio between the two populations is defined as $\psi = e^{Q_1 - Q_2}$, or alternatively

$$\log\psi = (\alpha_1 - \alpha_2) + \{\beta_1 E(X_1) - \beta_2 E(X_2)\}. \qquad (2.2)$$

Now define $K$ subpopulations or strata on the distributions of $X_1$ and $X_2$. In general, the $i$th individual from population 1, and the $h$th individual from population 2, will be included in the $k$th stratum if, for an ordered sequence of constants $\{a_k\}$ $(k = 1, \ldots, K)$, $a_{k-1} < x_{i1}, x_{h2} < a_k$. Then we have, for the $k$th stratum,

$$\log\psi_k = (\alpha_1 - \alpha_2) + \{\beta_1 E(X_1 \mid \omega_k) - \beta_2 E(X_2 \mid \omega_k)\}, \qquad (2.3)$$

where $\omega_k = \{x_{ij} : a_{k-1} < x_{ij} < a_k\}$ $(i = 1, \ldots, N_j;\ j = 1, 2)$. If the constant term $(\alpha_1 - \alpha_2) = \log\psi'$, say, then it is clear that in general $\log\psi \ne \log\psi_k \ne \log\psi'$. Indeed, both $\log\psi$ and $\log\psi_k$ consist of the sum of the common element $\log\psi'$ and a variable or bias component dependent on the distribution of the covariate.
In order either to provide a summary statistic or to reduce the effect of the covariate, we must reduce the size of this variable component in relation to $\log\psi'$. For stratification to be effective in reducing the bias, the following relationship should hold for almost all $k$: […]

Define $L^* = G(\log\psi_1, \ldots, \log\psi_K)$ so that $L^*$ represents an average of the $\log\psi_k$ for a suitably chosen function $G(\cdot)$. Then we expect that one of two conditions is satisfied as $K$ is increased: […] (2.4)

It is also clear from the presentation of the model that $\log\psi = \log\psi'$ only when either $\beta_1 = \beta_2 = 0$ or $\beta_1 = \beta_2 \ne 0$ and the central moments of $X_1$ and $X_2$ are identical. The first situation may be approximated if the correlation between $X_j$ and $Y_j$ is small and a summary

statistic is desired. In this case $\log\psi_k \simeq \log\psi'$ also. If, however, the aim is to reduce the effect of $X_j$, then the correlation, and hence the values of $\beta_1$ and $\beta_2$, will clearly be nonzero and relatively large numerically. In the second case, identical covariate means can generally only be assumed in an experimental situation when two treatments are randomly assigned to subjects within one homogeneous population, and not in observational studies involving two distinct populations. Moreover, a model in which $\beta_1 = \beta_2$ and $E(X_1) = E(X_2)$ implies a zero second-order interaction, which may pertain in an experimental situation, but seldom, if ever, for observational data.

The discussion thus far has focused on the log of the odds ratio rather than the odds ratio itself. The reasons for this are two-fold. First, $\log\psi$ is a linear function of $X$ and, therefore, direct estimates of this value from unstratified samples will be unbiased. If we assume, for simplicity, that $\beta_1 = \beta_2 = \beta$ in (2.3) and define $T = \hat\theta$, where $\theta = \log\psi$, then

$$T = b_0 + b_1(\bar{x}_1 - \bar{x}_2) + e_T,$$

where $E(b_0) = (\alpha_1 - \alpha_2)$, $E(b_1) = \beta$ and $e_T \sim N(0, \sigma_T^2)$. Clearly $E(T) = \theta$. However, it can be easily demonstrated that $e^T$ is a biased estimator of $\psi$. Given that we are interested in estimating the constant component, equivalent to $\psi'$, it seems preferable to consider estimators in a scale which produces at least one unbiased estimator, $T$.

Secondly, the main part of this investigation involves the comparison of Monte Carlo values. Provided the number of samples generated is sufficiently large, these values should be good approximations to the corresponding expected values of the estimators. However, the assumption is implicitly made that the distribution of an estimator is approximately normal so that the Monte Carlo mean is an unbiased estimate of the expected value.
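The bias of $e^T$ as an estimator of $\psi$ can be checked numerically. The following is a minimal sketch, assuming that $T$ is exactly normal with mean $\theta$ and standard deviation $\sigma$; the values $\theta = 1.4$ and $\sigma = 0.4$ and the function names are illustrative choices, not taken from the paper. Under that assumption $E(e^T) = \exp(\theta + \sigma^2/2)$, which exceeds $\psi = e^\theta$ whenever $\sigma > 0$.

```python
import math
import random


def expected_exp_t(theta, sigma):
    """Closed-form E(exp(T)) when T ~ N(theta, sigma^2) (lognormal mean)."""
    return math.exp(theta + 0.5 * sigma ** 2)


def simulate_exp_t(theta, sigma, n_draws=200_000, seed=1):
    """Monte Carlo means of T and exp(T) under the same normal assumption."""
    rng = random.Random(seed)
    total_t = 0.0
    total_exp = 0.0
    for _ in range(n_draws):
        t = rng.gauss(theta, sigma)
        total_t += t
        total_exp += math.exp(t)
    return total_t / n_draws, total_exp / n_draws


if __name__ == "__main__":
    theta, sigma = 1.4, 0.4  # illustrative values only
    mean_t, mean_exp = simulate_exp_t(theta, sigma)
    print("mean of T        :", round(mean_t, 3), "target", theta)
    print("mean of exp(T)   :", round(mean_exp, 3),
          "psi =", round(math.exp(theta), 3))
    print("lognormal formula:", round(expected_exp_t(theta, sigma), 3))
```

The mean of the simulated $T$ stays close to $\theta$, while the mean of $e^T$ settles near the lognormal value rather than near $\psi$, which is the positive bias referred to in the text.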
Now, the distributions of the estimators of the odds ratio tend to be noticeably positively skew unless the samples are very large, while the distributions on the log scale are approximately normal, even for moderate samples. To use estimators of the odds ratio itself in a Monte Carlo study would, therefore, introduce a further, unknown distortion due to the skewness of the distribution.

To study the relationship between $\log\psi'$, $\log\psi$ and $L^*$, values of $\log\psi_k$ were first computed using (2.3) with $\beta_1 = \beta_2 = \beta$, and $X_j \sim N(\mu_j, \sigma_j^2)$. To provide a wide range of situations, two values were used for each of $\log\psi'$, $\beta$ and $(\mu_1, \sigma_1^2; \mu_2, \sigma_2^2)$, yielding eight different versions of the model; see Table 1 for these values. Within each of these, four values of $K$ were considered (1, 2, 5, 10). Of course, for $K = 1$, $\log\psi_k = \log\psi$, which is the parameter for unstratified samples. To demonstrate the positive bias of the odds ratio mentioned above, values of $\psi_k$ were also calculated, using the exponent of (2.3) for the $i$th individual in population 1 and the $h$th in population 2 and integrating over the range of values for each stratum.

The strata were defined with the Monte Carlo investigation in mind. For $K = 10$, $a_k - a_{k-1} = 3.0$ in all strata with the exception that $a_1 - a_0 = a_{10} - a_9 = \infty$. The central value, $a_5$, was set at 4, given the values of $\mu_1$ and $\mu_2$ chosen for the investigation. To form the strata for $K = 2$ and 5, adjacent strata were then combined.

An unweighted arithmetic mean of the $\log\psi_k$ was used as a value for $L^*$. The use of equal stratum weights has been investigated by Cox (1957) and found to be nearly as efficient as optimal weighting for a normally distributed variate, so that this simple function would appear to be a reasonable choice.
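The calculation of the stratum values and of $L^*$ described above can be sketched as follows. This is an illustration under stated assumptions, not the author's program: it takes the stratum log odds ratio to be $\log\psi' + \beta\{E(X_1 \mid \omega_k) - E(X_2 \mid \omega_k)\}$ with $\beta_1 = \beta_2 = \beta$, uses the standard formula for the mean of a truncated normal variable, and reads the stratum boundaries as $a_5 = 4$ with interior width 3.0. All function names are hypothetical.

```python
import math


def _phi(z):
    """Standard normal density (0.0 at +/- infinity)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_mean(mu, var, lower, upper):
    """E(X | lower < X < upper) for X ~ N(mu, var); bounds may be infinite."""
    sd = math.sqrt(var)
    za, zb = (lower - mu) / sd, (upper - mu) / sd
    return mu + sd * (_phi(za) - _phi(zb)) / (_cdf(zb) - _cdf(za))


def boundaries(k_strata, centre=4.0, width=3.0):
    """Cut points a_0..a_K: K = 10 has nine interior points spaced `width`
    apart around `centre`; K = 5 and K = 2 merge adjacent strata."""
    cuts10 = [centre + width * (k - 5) for k in range(1, 10)]  # a_1..a_9
    keep = {10: cuts10, 5: cuts10[1::2], 2: [cuts10[4]], 1: []}[k_strata]
    return [-math.inf] + keep + [math.inf]


def stratum_log_odds(log_psi0, beta, dist1, dist2, k_strata):
    """Assumed form: log psi_k = log psi' + beta * {E(X1|k) - E(X2|k)}."""
    (mu1, var1), (mu2, var2) = dist1, dist2
    a = boundaries(k_strata)
    return [
        log_psi0 + beta * (truncated_mean(mu1, var1, lo, hi)
                           - truncated_mean(mu2, var2, lo, hi))
        for lo, hi in zip(a[:-1], a[1:])
    ]


def l_star(values):
    """Unweighted arithmetic mean of the stratum log odds ratios."""
    return sum(values) / len(values)


if __name__ == "__main__":
    # Unequal covariate means, (mu, variance) pairs as printed in the text.
    for k in (1, 2, 5, 10):
        vals = stratum_log_odds(0.0, 0.10, (8.0, 64.0), (0.0, 36.0), k)
        print("K =", k, " L* =", round(l_star(vals), 3))
```

With equal covariate means the stratum deviations cancel by symmetry, so the sketch returns $L^* = \log\psi'$; with unequal means the printed values show how far $L^*$ remains from $\log\psi'$ for each $K$.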
However, the fact that equal weight is given to the extreme strata, with infinite boundaries, reduces the effectiveness of this function, so that the $L^*$ as defined in this paper cannot be considered optimal. Similarly, $\psi^*$ was defined as the unweighted geometric mean of the $\psi_k$. The values of $\log\psi'$, $\log\psi$ and $L^*$ as well as $\psi'$, $\psi$ and $\psi^*$ are given in Table 1 for each of the eight situations defined above. For equal covariate means, $\mu_1 = \mu_2 = 4$, and given that

$\beta_1 = \beta_2$, no second-order interaction is present, so that a priori we expect $\log\psi = \log\psi'$. Moreover, because the strata are symmetrical about the covariate means, and because $L^*$ is a simple average, $L^* = \log\psi'$ also. For unequal covariate means, there is some second-order interaction, especially for the higher correlation, $\beta = 0.10$, because the value of $\psi_k$ differs considerably between strata.

Table 1. Expected values of the odds ratio and the log odds ratio for unstratified and stratified samples, compared with the constant component, $\psi'$ or $\log\psi'$, for different sets of covariate distributions and values of $\beta$
[Columns: number of strata; log odds ratio ($\log\psi'$, $\log\psi$, $L^*$); odds ratio ($\psi'$, $\psi$, $\psi^*$). Rows: covariate distribution $(\mu_1, \sigma_1^2; \mu_2, \sigma_2^2)$ = (4, 64; 4, 36) and (8, 64; 0, 36), by value of $\beta$.]

The relationship (2.4) holds consistently for all the situations considered in Table 1. Moreover, for unequal covariate means and, across all situations, for the odds ratios themselves, the combined value approaches the constant term, $L^* \to \log\psi'$, or $\psi^* \to \psi'$, as $K$ is increased. In other words, increasing the number of strata decreases the bias. The Monte Carlo study described in the following sections compares the approximate expected values of the logarithms of five well-known combined estimators of the odds ratio with the values of $L^*$ and $\log\psi'$ given in Table 1 in order to assess their usefulness as summary measures.

3. THE ESTIMATORS

Several approximate statistics have been derived for combining the within-stratum estimates, assuming the odds ratio is constant across strata, $\psi_k = \psi'$, for all $k$. Four of these are relatively well known (Woolf, 1955; Mantel & Haenszel, 1959; Birch, 1964; Goodman, 1969) and have been the subject of recent investigations by Gart (1962, 1970, 1971). Fleiss (1973, Chapter 10) has provided a good general description.
Woolf's estimator, with modifications by Gart & Zweifel (1967), and those of Mantel & Haenszel, Birch and Goodman are given below, with the simple estimator for unstratified samples, in the notation used by Birch and others. Let $n_{ijk}$ represent the $i$th response ($i = 1, 2$) in the $j$th population ($j = 1, 2$) and the $k$th stratum ($k = 1, \ldots, K$). Summation is represented by a period in the usual way. The unstratified estimator of $\log\psi$, from two independent samples $n_{.1.}$ and $n_{.2.}$, is simply

$$L_0 = \log\{n_{11.}\,n_{22.}/(n_{12.}\,n_{21.})\}.$$

A constant of $\tfrac{1}{2}$, added to each of the four frequencies, has been suggested by Haldane (1956) to reduce the bias in estimating $\log\psi$ and remove the possibility of an infinite value. This correction was not used here because the overall frequencies were never zero, and, given the generally large samples, the addition of this constant would have had a negligible effect on the estimate or its variance. However, for smaller samples the correction is certainly advised for both the log odds ratio and its variance.

Woolf (1955) combined the logarithms of the observed within-stratum odds ratios, using as weights the inverses of the asymptotic variances. Because the within-stratum variances are a simple function of cell frequencies, this estimator is equivalent to a weighted geometric mean of the within-stratum odds ratios. The inclusion of the constant, $\tfrac{1}{2}$, in both $L_k$ and $w_k$ provides the least biased estimates of both the difference in logits and its variance as shown by Gart & Zweifel (1967):

$$L_w = \sum_k w_k L_k \Big/ \sum_k w_k,$$

where

$$L_k = \log\frac{(n_{11k}+\tfrac{1}{2})(n_{22k}+\tfrac{1}{2})}{(n_{12k}+\tfrac{1}{2})(n_{21k}+\tfrac{1}{2})}, \qquad w_k^{-1} = \sum_i \sum_j (n_{ijk}+\tfrac{1}{2})^{-1}.$$

Mantel & Haenszel (1959) used an indirectly weighted sum of the odds ratios themselves, combining the numerators and denominators separately. As noted in their paper, these weights are of the same order as the appropriate weights for a difference constant on the logit scale (Cochran, 1954; Radhakrishna, 1965). In order to provide a comparable estimator, the logarithm of this statistic was used in this investigation:

$$L_{mh} = \log\Big\{\sum_k (n_{11k}\,n_{22k}/n_{..k}) \Big/ \sum_k (n_{12k}\,n_{21k}/n_{..k})\Big\}.$$
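The three directly computable estimators described so far, the unstratified log odds ratio, Woolf's weighted mean with the constant $\tfrac{1}{2}$ and the logarithm of the Mantel-Haenszel statistic, can be sketched in a few lines. This is a minimal illustration rather than the code used in the study: each stratum is assumed to be held as a tuple `(n11, n12, n21, n22)`, where `nij` is the count of response $i$ in population $j$, every stratum is assumed to be nonempty, and the function names and example counts are hypothetical.

```python
import math


def log_odds_unstratified(tables):
    """Log odds ratio of the single table obtained by summing over strata."""
    n11 = sum(t[0] for t in tables)
    n12 = sum(t[1] for t in tables)
    n21 = sum(t[2] for t in tables)
    n22 = sum(t[3] for t in tables)
    return math.log(n11 * n22 / (n12 * n21))


def woolf(tables, c=0.5):
    """Weighted mean of stratum log odds ratios with inverse-variance weights.

    The constant c is added to every cell in both the stratum estimate and
    its weight. Returns the estimate and 1 / (sum of weights), the usual
    approximate variance of such a weighted mean.
    """
    num = den = 0.0
    for n11, n12, n21, n22 in tables:
        cells = (n11 + c, n12 + c, n21 + c, n22 + c)
        l_k = math.log(cells[0] * cells[3] / (cells[1] * cells[2]))
        w_k = 1.0 / sum(1.0 / x for x in cells)
        num += w_k * l_k
        den += w_k
    return num / den, 1.0 / den


def mantel_haenszel(tables):
    """Log of the Mantel-Haenszel ratio: numerators and denominators are
    each divided by the stratum total and summed separately."""
    num = sum(n11 * n22 / (n11 + n12 + n21 + n22)
              for n11, n12, n21, n22 in tables)
    den = sum(n12 * n21 / (n11 + n12 + n21 + n22)
              for n11, n12, n21, n22 in tables)
    return math.log(num / den)


if __name__ == "__main__":
    example = [(10, 5, 20, 40), (30, 20, 10, 15)]  # made-up counts
    print("unstratified   :", round(log_odds_unstratified(example), 3))
    print("Woolf          :", round(woolf(example)[0], 3))
    print("Mantel-Haenszel:", round(mantel_haenszel(example), 3))
```

Keeping the constant `c` as a parameter makes it easy to see how sensitive Woolf's estimator is to the correction when stratum counts are small.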
More recently Birch (1964) proposed an approximation to the maximum likelihood estimator, $L_b$, valid if $\psi = \psi' = 1$, using the first two terms in the expansion of $E(n_{11k} \mid \psi')$, a method proposed earlier by Cox (1958), considered valid under the hypothesis of independence: […]

Goodman (1969) subsequently modified $L_b$ for use when $\psi = \psi' \ne 1$, through the inclusion of further terms in Cox's expansion and obtained

$$L_g = \{(G + 3M)L_b - HGL_b^2 + H\}/(G + 3M - H^2),$$

where […]

The conditional maximum likelihood estimator, suggested by Gart (1971), requires an iterative solution, and a computer routine has been developed (Thomas, 1975) which calculates both exact and asymptotic values. The conditional estimation of the constant difference uses a considerably simplified form of the likelihood, assuming that the second-order interaction is zero, $\psi_k = \psi'$. The standard error of the estimate of the odds ratio should be smaller than its unconditional counterpart. As a further point of interest in this investigation, the first approximation used in the iteration is $\hat\psi_{mh}$. The logarithm of the asymptotic version will be used in this paper because it is equivalent in most instances to the exact estimator to at least the second decimal place. The notation $L_{ml}$ will be used in this paper for the logarithm of the asymptotic maximum likelihood estimator.

4. MONTE CARLO COMPARISON

The model defined in (2.1) and (2.3), with parameter values given in Table 1, was used for a Monte Carlo study to compare the effectiveness of the five estimators outlined in the previous section, relative both to each other and to the corresponding expected values summarized in Table 1. Parameter values, stratum sizes and numbers of strata remained as for Table 1. The values were so chosen that, besides generating situations similar to those encountered in practice, the samples almost never involved zero marginal totals in the stratum tables. This ensured comparable sampling situations for all the estimators.

Sample sizes were determined from the results of a preliminary investigation using the odds ratios, in which it was found that the distributions of the estimators were affected by both relative and absolute sample sizes (McKinlay, 1975). The following four pairs $(n_{.1.}, n_{.2.})$ for total sample sizes were chosen: $(n_{.1.}, n_{.2.})$ = (1,1); (1,5); (5,1); (5,5). The second pair (1,5) represents the most inefficient combination, with the smaller sample selected from the population with the larger covariate variance.

The sampling schemes were chosen to approximate most closely the practical situations as outlined in the introduction: (a) the combining of estimates from strata of size fixed during the design and (b) the combining of estimates from strata, determined at the analysis, in order to reduce the effect of covariables. For scheme (a), the stratum samples $(n_{.1k}, n_{.2k})$ are considered fixed for all $k$ and contribute nothing to the variability of the estimator. The second scheme allows for variable stratum samples depending on the distribution of the covariate in the original sample. This introduces a variation component which may be considerable for those estimators which use stratum samples primarily as weights.
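Sampling scheme (b) can be sketched as a small simulation. The code below is an assumption-laden illustration, not the program used in the paper: covariates are drawn from normal distributions, responses from the logistic model with a common slope, individuals are assigned to strata by fixed cut points, and only the logarithm of the Mantel-Haenszel statistic is computed. The intercepts `alpha`, the sample sizes of 500, the number of replicates and all function names are hypothetical choices made for the example.

```python
import math
import random


def simulate_tables(n1, n2, alpha, beta, dists, cuts, rng):
    """One sample under scheme (b): totals n1, n2 fixed, stratum sizes random.

    Returns one table [n11, n12, n21, n22] per stratum, where nij counts
    response i (1 = success, 2 = failure) in population j.
    """
    tables = [[0, 0, 0, 0] for _ in range(len(cuts) + 1)]
    for j, n in enumerate((n1, n2)):
        mu, var = dists[j]
        for _ in range(n):
            x = rng.gauss(mu, math.sqrt(var))
            p = 1.0 / (1.0 + math.exp(-(alpha[j] + beta * x)))
            k = sum(x >= c for c in cuts)      # stratum index 0..K-1
            i = 0 if rng.random() < p else 1   # response row
            tables[k][2 * i + j] += 1
    return tables


def log_mantel_haenszel(tables):
    """Log Mantel-Haenszel ratio; empty strata are skipped."""
    num = den = 0.0
    for n11, n12, n21, n22 in tables:
        n = n11 + n12 + n21 + n22
        if n > 0:
            num += n11 * n22 / n
            den += n12 * n21 / n
    return math.log(num / den) if num > 0 and den > 0 else float("nan")


def monte_carlo(reps, seed=0, **kw):
    """Mean and standard deviation of the estimator over `reps` samples."""
    rng = random.Random(seed)
    values = [log_mantel_haenszel(simulate_tables(rng=rng, **kw))
              for _ in range(reps)]
    values = [v for v in values if not math.isnan(v)]
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return mean, sd


if __name__ == "__main__":
    # Hypothetical settings: alpha_1 - alpha_2 = 1.4, equal covariate means.
    settings = dict(n1=500, n2=500, alpha=(1.0, -0.4), beta=0.10,
                    dists=((4.0, 64.0), (4.0, 36.0)),
                    cuts=[-5.0, 1.0, 7.0, 13.0])
    print(monte_carlo(200, seed=1, **settings))
```

Scheme (a) would differ only in drawing a fixed number of individuals within each stratum, for example by rejection sampling on the covariate, so that the stratum totals no longer vary between replicates.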
Cochran (1954) and Radhakrishna (1965), for example, have noted that an assumption of constancy of the difference between proportions on the logistic scale implies the use of weights dependent on stratum sample sizes only.

The second sampling design was most easily adapted to Monte Carlo methods, and it was found that the general distribution of units across strata remained comparable among samples generated, although actual numbers within strata varied. The first scheme required prior specification of the $n_{.jk}$. A simple, but somewhat extreme design was chosen, with $n_{.jk} = 0.1\,n_{.j.}$, for all $k$ ($K = 10$) and samples generated separately within strata. This meant a slight change in stratum boundaries to increase the probability of obtaining values in the extreme strata, a modification which had a negligible effect on the expected values $L^*$ while reducing computing time to practicable limits.

The number of samples generated for each scheme was […], fixed stratum size, and 10, variable stratum size. These numbers were determined by both the computing time required and the reproducibility of estimates. Preliminary simulations using these numbers produced estimates which did not differ from each other by more than 0.04, for $n_{.1.} = 1$, and 0.01, for $n_{.1.} = 5$, with comparable stability for the root mean squared error, indicating reasonable stability of the Monte Carlo results.

5. RESULTS

Before considering results for the two sampling schemes, some general remarks are in order concerning the Monte Carlo values obtained. A comparison of the combined estimators yields the following relationships, consistent through all the results. First, the logarithm of the maximum likelihood estimator, $L_{ml}$, is in almost all instances equivalent to, or more biased and less precise than, the logarithm of Mantel & Haenszel's statistic, used as its initial approximation; the same relationship holds

for the odds ratios themselves. This means that, for the models considered here, even in the absence of second-order interaction, $\hat\psi_{ml}$ is never preferable as an estimator of $\psi'$, the simply computed $\hat\psi_{mh}$ being always at least as good, in terms of both bias and precision. Secondly, the two approximations $L_b$ and $L_g$ exhibit remarkably similar trends, with few exceptions, even for $\log\psi' \ne 0$. In those few instances where these are the least biased estimators, therefore, the extra computation involved in Goodman's modification is seldom justifiable. Thirdly, Woolf's estimator, while usually one of the most precise, shows the greatest sensitivity to sample size, and marked increases in bias for larger $K$.

Increasing the number of strata also produces some consistent trends among the estimators. As expected from prior work (Cox, 1957; Billewicz, 1965; Cochran, 1968; McKinlay, 1975), $K = 5$ results in near minimal residual bias, the gains in using more strata being negligible or even negative. In other words, in many situations, increasing the number of strata merely reduces the variation, while leaving the bias unchanged or even increasing it.

The first sampling scheme, fixed within-stratum samples, produced generally precise estimates with negligible bias for equal covariate means, representing no second-order interaction. This was anticipated given the expected values in Table 1 and the use of fixed, equal, stratum weights in the form of the constant within-stratum samples. For unequal covariate means, the biases of the various estimators followed consistently the same pattern as for the second sampling scheme, to be discussed below. The only difference was that in all cases the biases were smaller, relative to $\log\psi'$. None of the results for this sampling scheme will, therefore, be considered in detail here.
The second sampling scheme, fixed total samples only, is, perhaps, of more interest and wider applicability with considerable variation in within-stratum sample sizes, both between and within the samples generated. Table 2 presents a summary of results for equal covariate means, equivalent to no second-order interaction, and for $\beta = 0.10$ only. The results for $\beta = 0.05$ were equivalent, with smaller biases and somewhat greater precision, and are therefore not presented in detail.

For $\log\psi' = 0$, any bias, both in $L_0$ and in the combined estimators, was generally negligible, with the outstanding exception of $L_w$, which showed large biases for unequal sample sizes. The dependence of this estimator solely on $n_{.1k}$ and $n_{.2k}$ as stratum weights, combined with the use of the constant, $\tfrac{1}{2}$, dampened the positive bias of the uncorrected odds ratio too severely in the inefficient case, samples of (1,5), and not enough in the converse situation, samples of (5,1).

For a moderately large odds ratio, $\log\psi' = 1.4$ or $\psi' = 4.05$, $L_w$ again exhibited marked instability, with a large negative bias, increasing with $K$ for $n_{.1.} = 1$. The negative bias remained but was much less marked for $n_{.1.} = 5$. Even though the odds ratio was not near unity, $L_b$ and $L_g$ had identical means and standard errors for equal samples, $n_{.1.} = n_{.2.}$. For samples (1,5), both estimators had negative biases, particularly $L_b$, while for samples (5,1), $L_b$ showed an equally large positive bias, and the negative bias of $L_g$ increased. In other words, both of these estimators exhibited considerable instability and biases at least as marked as those of $L_w$ in some situations. The only estimators which appeared to be almost unbiased and stable for different samples as well as increasing strata were $L_{mh}$ and $L_{ml}$.
Moreover, as noted above, because the difference between them appeared to be negligible, both in bias and in standard error, $L_{mh}$ would appear to be the preferred estimator in most situations in which negligible second-order interaction can be assumed.

In Table 3 corresponding results are summarized for different covariate means, which implies that second-order interaction is present. This was most clearly detected for $\beta = 0.10$,

Table 2. Means and standard errors from 10 samples for the simple log odds ratio and five combined estimators, for equal covariate means and fixed total samples, sample sizes $(n_{.1.}, n_{.2.})$, $K$ strata, [$(\mu_1, \sigma_1^2; \mu_2, \sigma_2^2)$ = (4, 64; 4, 36), $\beta$ = 0.01]
Expected value Samples Estimator 1, 1 L o L w L b Zi g 1,6 5,1 5,5 Expected 1,1 1,5 5, 1 6,5 A«J A> L w Jjw, & L o L w L g L mjt A* A) L m L b h t Am» Art value A) L m A. A»J Ai L m L b L mh A.i Ai A. A A.» A A A. A«* = 1 (0-60) (0-38) (0-43) (0-3) 1-4 (0-44) 1-6 (0-38) 1- (0-3) 1-3 (0-4) K = = log^r Oil -Oil -Oil -Oil (0-17) (0-48) (0-48) (0-51) (0-51) (0-39) (0-39) (0-38) (0-39) (0-39) (0-36) (0-38) (0-40) (0-44) (0-44) (0-3) (0-3) (0-3) (0-3) (0-4) log./.' = 119 (0-44) 118 (0-40) 118 (0-40) 1-18 (0-43) 1-30 (0-43) 118 (0-39) 0-99 (0-45) 113 (0-36) 1-31 (0-37) 1-30(0-37) [-9 (0-9) 1-59 (0-44) 1-0 (0-40) 7 (0-31) 8 (0-31) 1-6 (0-) 1-0 (0-5) 1-0 (0-5) 1-9 (0-1) ] 8 (0-1) = K == 6-1 (0-38) - (0-48) -0-0 (0-48) - (0-51) - (0-51) -0-4 (0-39) (0-38) -8 (0-37) -0-0 (0-39) -0-0 (0-39) -0-1 (0-38) (0-39) -3 (0-40) (0-4) (0-4) -0-0 (0-0) (0-) -3 (0-) -3 (0-) (0-) 1-17 (0-4) 1-3 (0-36) 1-5 (0-36) 1-39 (0-44) 1 (0-45) (0-4) (0-40) (0-3) (0-38) (0-37) 1-38 (0-7) 1-68 (0-48) 107 (0-35) 1-36 (0-9) 1-37 (0-9) 1-3 (0-19) 1-5 (0-1) 1-6 (0-0) 1-37 (0-19) 1-38 (0-19) K = (0-46) (0-38) (0-38) 0- (0-39) 0 (0-40) (0-3) (0-49) (0-49) (0-6) (0-54) (0-48) (0-39) (0-40) (0-4) (0-43) (0-19) (0-) (0-) (0-) (0-) 1-05 (0-47) 1-5 (0-37) 1-6 (0-37) (0-47) 8 (0-49) 0-9 (0-55) 107 (0-39) 1- (0-31) (0-39) 3 (0-39) 1 (0-5) 1-68 (0-49) 1-08 (0-34) 1-38 (0-30) 0 (0-30) 1-9 (0-0) 1-6 (0-0) 1-8 (0-19) 1-39 (0-19) 0 (0-19)

Table 3. Means and standard errors from 10 samples for the simple log odds ratio and five combined estimators, for unequal covariate means and fixed total samples, sample sizes $(n_{.1.}, n_{.2.})$, $K$ strata, [$(\mu_1, \sigma_1^2; \mu_2, \sigma_2^2)$ = (8, 64; 0, 30), $\beta$ = 0.01]
Expected value Samples Estimator 1, 1 L o 1,5 5, 1 5,5 Art Art A, A, Art Art A, Expeoted value 1, 1 A, A> 1, 5 5, 1 5,6 L b A, Ai* A. A, Art K = (0-89) 0-7 (0-85) 0-64 (0-7) 0-68 (0-71) (0-79) -01 (0-75) 1-99 (0-64) - (0-63) K = (0-55) 0-3 (0-57) 0-4 (0-59) 0-6 (0-63) 0-6 (0-6) 0-14 (0-41) 0- (0-44) 0- (0-48) 0-8 (0-54) 0-8 (0-54) 0-4 (0-4) 0-17 (0-41) 0-17 (0-39) 0-15 (0-40) 0-16 (0-40) 0-1 (0-31) 0-0 (0-31) 0-0 (0-31) 0-1 (0-31) 0-1 (0-3) logf (0-43) 3 (0-35) 4 (0-35) 1-6 (0-65) 1-64 (0-55) 7 (0-39) 1- (0-8) 1-35 (0-6) 63 (0-50) 64 (0-50) 57 (0-3) 76 (0-49) 36 (0-19) 54 (0-3) 1-56 (0-3) 1-58 (0-7) 6 (016) 6 (0-16) 1-59 (0-8) 1-60 (0-9) = (0-43) 0-07 (0-54) 0-07 (0-65) 0-10 (0-58) 0-09 (0-58) -0-14(0-38) 0-03 (0-43) 0-03 (0-44) 0-08 (0-48) 0-08 (0-48) 0-4 (0-39) 0-0 (0-36) 0-0 (0-36) 0- (0-37) 0- (0-37) 0-06 (0-3) 0-04 (0-4) 0-04 (0-4) 0-04 (0-4) 0-04 (0-5) (0-40) 1-9 (0-37) 1-31 (0-37) 8 (0-6) 1-51 (0-5) 1-16(0-41) 1-1 (0-37) 1-5 (0-33) 7 (0-46) 8 (0-46) 6 (0-7) 1-68 (0-38) 1-8 (0-4) (0-9) 4 (0-9) 1 (0-0) 1-30(0-18) 1-33 (018) 5 (0-) 6 (0-) K = (0-38) 0-04 (0-55) 0-04 (0-55) 0-07 (0-59) 0-07 (0-60) -0-6(0-40) 0- (0-44) 0- (0-44) 0-05 (0-48) 0-06 (0-48) 0-40 (0-48) (0-36) (0-36) -0-03(0-37) -0-03(0-38) 0-04 (0-) 1 (0-4) 0-01 (0-4) 0-0 (0-4) 0-0 (0-5) (0-41) 1-7 (0-38) 1-9 (0-38) 6 (0-53) 1-5 (0-65) 0-97 (0-51) 1-10 (0-38) 1-3 (0-34) 4 (0-47) 6 (0-47) 6 (0-5) 1-65 (0-36) 1-7 (0-5) 1 (0-9) 4 (0-30) 1-34 (0-0) 1-8 (0-0) 1-31 (0-19) (0-) 4 (0-)

10 SONJA M. MCKINLAY but, as for ^ /xj = 4, in Table, the results for js = 5 were essentially equivalent and are omitted here. The first point to note is that the bias of the unstratified estimator was of the same order as its standard error, at least 80% of the standard error in all cases. Moreover, even the most inefficient estimators removed at least 80% of this large initial bias in absolute terms for K = 5, with the exception of L w for samples (1,5). For \ogifi' = 0, the results were equivalent to those already discussed for Table. Only L w retained a large bias which increased with K. Of the remaining four estimators, L b and L g performed marginally better than L,^ and L^, although the differences in both bias and precision are so small that these four could be considered equivalent. For iogip' =, the estimators again showed trends similar to those evident in Table. The only notable change was in Goodman's estimator, L g. This statistic appeared to be least biased and clearly the most precise for all four sample pairs, provided K =. As in Table, L b and L g were equivalent for n x ==» a. Unfortunately, for K>, the bias in both L g and L b increased rapidly in absolute value, generally in a negative direction. In contrast, L rroi and L^ steadily removed bias with increasing K, and with reasonable precision. For log^' = and unequal covariate means, the differences between these two estimators were largest, with L^ always being less biased than, and of equivalent precision to, Lnj. Table 4 summarizes the Monte Carlo findings by giving the optimal statistics, in terms of both bias and precision. Where two or more estimators were equivalent or nearly so, that one involving the least computation or greatest precision is given first, with the best alternative in parentheses. Samples 1, 1 1,5 6, 1 6,5 Table 4. 
Optimal estimators for different sample sizes strata, K, and parameter values K > > > > (A) Equal covariate means (Hi = /A,, = 4-0) log (^') = log(f)= w ()* MTT W(B) B () (W) w () w () w () MTT (n^.n^), (B) Unequal covariate means 80, /X, = 0-0) log (if,') = 0-0 log (</.') = 1- B B W(B) B MTT B () B () B () w () MTT o () MTT o () * The estimator in parentheses, while not the least biased and most precise, provides a closely competing alternative. Estimators: B, Birch; G, Goodman;, Mantel-Haenszel; w, Woolf. MTT Some of the Monte Carlo findings were unexpected. The phenomenon of a maximum likelihood estimator being no less biased nor more precise than its initial approximation, even when the assumption of no second-order interaction was met, is difficult to explain satisfactorily, although the closeness of these two estimators has been observed by Gart in the analysis of real data sets (Gart, 1970, 1971). Another unexpected finding is the variable performance of Goodman's modification of the null hypothesis estimator, first proposed by Birch. As expected, for log >(i' = 0, both statistics have almost identical means and variances. For log tfi' 4=0, however, Birch's statistic does surprisingly well, provided the samples are approximately equal and second-order inter-

action is absent or negligible. Only for marked second-order interaction does L_g clearly outperform L_b for unequal samples and, even then, it is optimal only for K = 2.

Although generally one of the most precise, Woolf's estimator exhibited considerable instability in its mean value, with rapidly increasing bias as stratification was increased. This phenomenon is probably due, as noted earlier, to the addition of 1/2 to cell frequencies, and is consistent with prior results on both simulated and real data (McKinlay, 1975; Gart, 1970). For this reason, L_w should be avoided as an estimator, especially for K > 2.

Mantel & Haenszel's statistic, formulated under the assumption of some second-order interaction being present (Mantel & Haenszel, 1959, p. 735), is consistent in reducing bias, with adequate precision in all situations considered. Certainly, for large samples and for K > 2 this is the estimator of choice, whether or not the odds ratio is constant. Unfortunately, no usable estimate of variance is available for this simply computed statistic. However, from the Monte Carlo results presented here, it seems that, provided at least one of the samples is larger than 1 and log^'4=0, the variance estimator for L_w of 1/Σ_k w_k, from § 3, provides an approximate estimate of variance for L_mh.

Finally, note that the results are presented on the log scale. When Monte Carlo values were computed for the odds ratio estimators themselves, the relationships among the statistics remained, but with increased positive and decreased negative biases, as well as more widely varying precision. This was to be expected from the earlier discussion and the values of Table 1.

Thanks are due to Nathan Mantel, John Gart and the editor and referees for helpful comments on earlier drafts, as well as to Donald Thomas for making available his program for the calculation of the conditional maximum likelihood estimator.

REFERENCES

BILLEWICZ, W. Z. (1965). The efficiency of matched samples: An empirical investigation. Biometrics 21.
BIRCH, M. W. (1964). The detection of partial association. I: The 2 x 2 case. J. R. Statist. Soc. B 26.
COCHRAN, W. G. (1954). Some methods for strengthening the common χ² tests. Biometrics 10.
COCHRAN, W. G. (1968). The effectiveness of subclassification in removing bias in observational studies. Biometrics 24.
COX, D. R. (1957). Note on grouping. J. Am. Statist. Assoc. 52.
COX, D. R. (1958). The regression analysis of binary sequences. J. R. Statist. Soc. B 20.
FLEISS, J. L. (1973). Statistical Methods for Rates and Proportions. New York: Wiley.
GART, J. J. (1962). On the combination of relative risks. Biometrics 18.
GART, J. J. (1970). Point and interval estimation of the common odds ratio in the combination of 2 x 2 tables with fixed marginals. Biometrika 57.
GART, J. J. (1971). The comparison of proportions: A review of significance tests, confidence intervals and adjustments for stratification. Rev. Inst. Int. Statist. 39.
GART, J. J. & ZWEIFEL, R. (1967). On the bias of various estimators of the logit and its variance with application to quantal bioassay. Biometrika 54.
GOODMAN, L. A. (1969). On partitioning χ² and detecting partial association in three-way contingency tables. J. R. Statist. Soc. B 31.
GOODMAN, L. A. & KRUSKAL, W. H. (1959). Measures of association for cross classifications. II: Further discussion and references. J. Am. Statist. Assoc. 54.
HALDANE, J. B. S. (1956). The estimation and significance of the logarithm of a ratio of frequencies. Ann. Hum. Gen. 20.
MANTEL, N. & HAENSZEL, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. J. Nat. Cancer Inst. 22.
MCKINLAY, S. M. (1975). The effect of bias on estimators of relative risk for pair-matched and stratified samples. J. Am. Statist. Assoc. 70.
RADHAKRISHNA, S. (1965). Combination of results from several 2 x 2 contingency tables. Biometrics 21.
THOMAS, D. G. (1975). Exact and asymptotic methods for the combination of 2 x 2 tables. Computers and Biomed. Res. 8.
WOOLF, B. (1955). On estimating the relation between blood group and disease. Ann. Hum. Gen. 19.
YULE, G. U. (1912). On methods of measuring association between attributes. J. R. Statist. Soc. 75.

[Received June Revised September 1977]
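
As an illustration of the two noniterative estimators compared in the closing discussion, the following is a minimal Python sketch, not the program used in the paper. It assumes each stratum is a 2 x 2 table given as cell counts (a, b, c, d), computes a Woolf-type weighted log odds ratio together with the variance estimate 1/Σ_k w_k, and computes the log of the Mantel-Haenszel statistic. The function names, the default constant 1/2 added to every cell, and the example counts are assumptions made for illustration only.

```python
import math


def woolf_log_or(tables, correction=0.5):
    """Weighted mean of stratum log odds ratios and the variance 1 / sum(w_k).

    tables: iterable of (a, b, c, d) cell counts, one 2 x 2 table per stratum.
    correction: constant added to every cell before taking logs (assumed 0.5).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for a, b, c, d in tables:
        a, b, c, d = (x + correction for x in (a, b, c, d))
        log_or = math.log(a * d / (b * c))
        # Weight is the inverse of the estimated variance of the stratum log odds ratio.
        weight = 1.0 / (1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
        weighted_sum += weight * log_or
        total_weight += weight
    return weighted_sum / total_weight, 1.0 / total_weight


def mantel_haenszel_log_or(tables):
    """Log of the Mantel-Haenszel summary odds ratio over the strata."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    # Raises if every stratum has b * c == 0 or a * d == 0; not handled in this sketch.
    return math.log(numerator / denominator)


# Hypothetical counts for two strata, chosen only to exercise the functions.
example_tables = [(10, 5, 5, 10), (10, 5, 5, 10)]
l_w, var_w = woolf_log_or(example_tables)
l_mh = mantel_haenszel_log_or(example_tables)
print(l_w, var_w, l_mh)
```

Under the approximation suggested in the discussion, `var_w` would also serve as a rough variance for `l_mh`; the sketch makes no attempt to check the conditions under which that approximation was reported to hold.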

Asymptotic efficiency of general noniterative estimators of common relative risk

Asymptotic efficiency of general noniterative estimators of common relative risk Biometrika (1981), 68, 2, pp. 526-30 525 Printed in Great Britain Asymptotic efficiency of general noniterative estimators of common relative risk BY MARKKU NTJRMINEN Department of Epidemiology and Biometry,

More information

Testing the homogeneity of variances in a two-way classification

Testing the homogeneity of variances in a two-way classification Biomelrika (1982), 69, 2, pp. 411-6 411 Printed in Ortal Britain Testing the homogeneity of variances in a two-way classification BY G. K. SHUKLA Department of Mathematics, Indian Institute of Technology,

More information

Two-way contingency tables for complex sampling schemes

Two-way contingency tables for complex sampling schemes Biomctrika (1976), 63, 2, p. 271-6 271 Printed in Oreat Britain Two-way contingency tables for complex sampling schemes BT J. J. SHUSTER Department of Statistics, University of Florida, Gainesville AND

More information

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at Biometrika Trust Robust Regression via Discriminant Analysis Author(s): A. C. Atkinson and D. R. Cox Source: Biometrika, Vol. 64, No. 1 (Apr., 1977), pp. 15-19 Published by: Oxford University Press on

More information

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC Mantel-Haenszel Test Statistics for Correlated Binary Data by Jie Zhang and Dennis D. Boos Department of Statistics, North Carolina State University Raleigh, NC 27695-8203 tel: (919) 515-1918 fax: (919)

More information

Describing Stratified Multiple Responses for Sparse Data

Describing Stratified Multiple Responses for Sparse Data Describing Stratified Multiple Responses for Sparse Data Ivy Liu School of Mathematical and Computing Sciences Victoria University Wellington, New Zealand June 28, 2004 SUMMARY Surveys often contain qualitative

More information

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at American Society for Quality A Note on the Graphical Analysis of Multidimensional Contingency Tables Author(s): D. R. Cox and Elizabeth Lauh Source: Technometrics, Vol. 9, No. 3 (Aug., 1967), pp. 481-488

More information

Marginal, crude and conditional odds ratios

Marginal, crude and conditional odds ratios Marginal, crude and conditional odds ratios Denitions and estimation Travis Loux Gradute student, UC Davis Department of Statistics March 31, 2010 Parameter Denitions When measuring the eect of a binary

More information

A Monte-Carlo study of asymptotically robust tests for correlation coefficients

A Monte-Carlo study of asymptotically robust tests for correlation coefficients Biometrika (1973), 6, 3, p. 661 551 Printed in Great Britain A Monte-Carlo study of asymptotically robust tests for correlation coefficients BY G. T. DUNCAN AND M. W. J. LAYAKD University of California,

More information

DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS

DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS Ivy Liu and Dong Q. Wang School of Mathematics, Statistics and Computer Science Victoria University of Wellington New Zealand Corresponding

More information

Part IV Statistics in Epidemiology

Part IV Statistics in Epidemiology Part IV Statistics in Epidemiology There are many good statistical textbooks on the market, and we refer readers to some of these textbooks when they need statistical techniques to analyze data or to interpret

More information

Nonresponse weighting adjustment using estimated response probability

Nonresponse weighting adjustment using estimated response probability Nonresponse weighting adjustment using estimated response probability Jae-kwang Kim Yonsei University, Seoul, Korea December 26, 2006 Introduction Nonresponse Unit nonresponse Item nonresponse Basic strategy

More information

Three-Way Contingency Tables

Three-Way Contingency Tables Newsom PSY 50/60 Categorical Data Analysis, Fall 06 Three-Way Contingency Tables Three-way contingency tables involve three binary or categorical variables. I will stick mostly to the binary case to keep

More information

Estimating the Marginal Odds Ratio in Observational Studies

Estimating the Marginal Odds Ratio in Observational Studies Estimating the Marginal Odds Ratio in Observational Studies Travis Loux Christiana Drake Department of Statistics University of California, Davis June 20, 2011 Outline The Counterfactual Model Odds Ratios

More information

REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES

REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Statistica Sinica 8(1998), 1153-1164 REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Wayne A. Fuller Iowa State University Abstract: The estimation of the variance of the regression estimator for

More information

Sample size calculations for logistic and Poisson regression models

Sample size calculations for logistic and Poisson regression models Biometrika (2), 88, 4, pp. 93 99 2 Biometrika Trust Printed in Great Britain Sample size calculations for logistic and Poisson regression models BY GWOWEN SHIEH Department of Management Science, National

More information

Asymptotic equivalence of paired Hotelling test and conditional logistic regression

Asymptotic equivalence of paired Hotelling test and conditional logistic regression Asymptotic equivalence of paired Hotelling test and conditional logistic regression Félix Balazard 1,2 arxiv:1610.06774v1 [math.st] 21 Oct 2016 Abstract 1 Sorbonne Universités, UPMC Univ Paris 06, CNRS

More information

An accurate test for homogeneity of odds ratios based on Cochran s Q-statistic

An accurate test for homogeneity of odds ratios based on Cochran s Q-statistic Kulinskaya and Dollinger TECHNICAL ADVANCE An accurate test for homogeneity of odds ratios based on Cochran s Q-statistic Elena Kulinskaya 1* and Michael B Dollinger 2 * Correspondence: e.kulinskaya@uea.ac.uk

More information

Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection

Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection Biometrical Journal 42 (2000) 1, 59±69 Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection Kung-Jong Lui

More information

ADJUSTED POWER ESTIMATES IN. Ji Zhang. Biostatistics and Research Data Systems. Merck Research Laboratories. Rahway, NJ

ADJUSTED POWER ESTIMATES IN. Ji Zhang. Biostatistics and Research Data Systems. Merck Research Laboratories. Rahway, NJ ADJUSTED POWER ESTIMATES IN MONTE CARLO EXPERIMENTS Ji Zhang Biostatistics and Research Data Systems Merck Research Laboratories Rahway, NJ 07065-0914 and Dennis D. Boos Department of Statistics, North

More information

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone

More information

Repeated ordinal measurements: a generalised estimating equation approach

Repeated ordinal measurements: a generalised estimating equation approach Repeated ordinal measurements: a generalised estimating equation approach David Clayton MRC Biostatistics Unit 5, Shaftesbury Road Cambridge CB2 2BW April 7, 1992 Abstract Cumulative logit and related

More information

This paper has been submitted for consideration for publication in Biometrics

This paper has been submitted for consideration for publication in Biometrics BIOMETRICS, 1 10 Supplementary material for Control with Pseudo-Gatekeeping Based on a Possibly Data Driven er of the Hypotheses A. Farcomeni Department of Public Health and Infectious Diseases Sapienza

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

Propensity Score Methods for Causal Inference

Propensity Score Methods for Causal Inference John Pura BIOS790 October 2, 2015 Causal inference Philosophical problem, statistical solution Important in various disciplines (e.g. Koch s postulates, Bradford Hill criteria, Granger causality) Good

More information

Nonparametric analysis of blocked ordered categories data: some examples revisited

Nonparametric analysis of blocked ordered categories data: some examples revisited University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2006 Nonparametric analysis of blocked ordered categories data: some examples

More information

A simulation study for comparing testing statistics in response-adaptive randomization

A simulation study for comparing testing statistics in response-adaptive randomization RESEARCH ARTICLE Open Access A simulation study for comparing testing statistics in response-adaptive randomization Xuemin Gu 1, J Jack Lee 2* Abstract Background: Response-adaptive randomizations are

More information

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Libraries 1997-9th Annual Conference Proceedings ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Eleanor F. Allan Follow this and additional works at: http://newprairiepress.org/agstatconference

More information

Lecture Discussion. Confounding, Non-Collapsibility, Precision, and Power Statistics Statistical Methods II. Presented February 27, 2018

Lecture Discussion. Confounding, Non-Collapsibility, Precision, and Power Statistics Statistical Methods II. Presented February 27, 2018 , Non-, Precision, and Power Statistics 211 - Statistical Methods II Presented February 27, 2018 Dan Gillen Department of Statistics University of California, Irvine Discussion.1 Various definitions of

More information

Conditional confidence interval procedures for the location and scale parameters of the Cauchy and logistic distributions

Conditional confidence interval procedures for the location and scale parameters of the Cauchy and logistic distributions Biometrika (92), 9, 2, p. Printed in Great Britain Conditional confidence interval procedures for the location and scale parameters of the Cauchy and logistic distributions BY J. F. LAWLESS* University

More information

Incorporating Level of Effort Paradata in Nonresponse Adjustments. Paul Biemer RTI International University of North Carolina Chapel Hill

Incorporating Level of Effort Paradata in Nonresponse Adjustments. Paul Biemer RTI International University of North Carolina Chapel Hill Incorporating Level of Effort Paradata in Nonresponse Adjustments Paul Biemer RTI International University of North Carolina Chapel Hill Acknowledgements Patrick Chen, RTI International Kevin Wang, RTI

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Estimation and Confidence Intervals for Parameters of a Cumulative Damage Model

Estimation and Confidence Intervals for Parameters of a Cumulative Damage Model United States Department of Agriculture Forest Service Forest Products Laboratory Research Paper FPL-RP-484 Estimation and Confidence Intervals for Parameters of a Cumulative Damage Model Carol L. Link

More information

Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny

Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny 008 by The University of Chicago. All rights reserved.doi: 10.1086/588078 Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny (Am. Nat., vol. 17, no.

More information

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs STAT 5500/6500 Conditional Logistic Regression for Matched Pairs Motivating Example: The data we will be using comes from a subset of data taken from the Los Angeles Study of the Endometrial Cancer Data

More information

GROUPED SURVIVAL DATA. Florida State University and Medical College of Wisconsin

GROUPED SURVIVAL DATA. Florida State University and Medical College of Wisconsin FITTING COX'S PROPORTIONAL HAZARDS MODEL USING GROUPED SURVIVAL DATA Ian W. McKeague and Mei-Jie Zhang Florida State University and Medical College of Wisconsin Cox's proportional hazard model is often

More information

Confidence Intervals for a Ratio of Binomial Proportions Based on Unbiased Estimators

Confidence Intervals for a Ratio of Binomial Proportions Based on Unbiased Estimators Proceedings of The 6th Sino-International Symposium Date published: October 3, 2009 on Probability, Statistics, and Quantitative Management pp. 2-5 Conference held on May 30, 2009 at Fo Guang Univ., Taiwan,

More information

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES REVSTAT Statistical Journal Volume 13, Number 3, November 2015, 233 243 MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES Authors: Serpil Aktas Department of

More information

Misclassification in Logistic Regression with Discrete Covariates

Misclassification in Logistic Regression with Discrete Covariates Biometrical Journal 45 (2003) 5, 541 553 Misclassification in Logistic Regression with Discrete Covariates Ori Davidov*, David Faraggi and Benjamin Reiser Department of Statistics, University of Haifa,

More information

Estimation of change in a rotation panel design

Estimation of change in a rotation panel design Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS028) p.4520 Estimation of change in a rotation panel design Andersson, Claes Statistics Sweden S-701 89 Örebro, Sweden

More information

Regression analysis based on stratified samples

Regression analysis based on stratified samples Biometrika (1986), 73, 3, pp. 605-14 Printed in Great Britain Regression analysis based on stratified samples BY CHARLES P. QUESENBERRY, JR AND NICHOLAS P. JEWELL Program in Biostatistics, University of

More information

Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC

Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC A Simple Approach to Inference in Random Coefficient Models March 8, 1988 Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC 27695-8203 Key Words

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

On Efficiency of Midzuno-Sen Strategy under Two-phase Sampling

On Efficiency of Midzuno-Sen Strategy under Two-phase Sampling International Journal of Statistics and Analysis. ISSN 2248-9959 Volume 7, Number 1 (2017), pp. 19-26 Research India Publications http://www.ripublication.com On Efficiency of Midzuno-Sen Strategy under

More information

A class of latent marginal models for capture-recapture data with continuous covariates

A class of latent marginal models for capture-recapture data with continuous covariates A class of latent marginal models for capture-recapture data with continuous covariates F Bartolucci A Forcina Università di Urbino Università di Perugia FrancescoBartolucci@uniurbit forcina@statunipgit

More information

Reports of the Institute of Biostatistics

Reports of the Institute of Biostatistics Reports of the Institute of Biostatistics No 02 / 2008 Leibniz University of Hannover Natural Sciences Faculty Title: Properties of confidence intervals for the comparison of small binomial proportions

More information

Decomposition of Parsimonious Independence Model Using Pearson, Kendall and Spearman s Correlations for Two-Way Contingency Tables

Decomposition of Parsimonious Independence Model Using Pearson, Kendall and Spearman s Correlations for Two-Way Contingency Tables International Journal of Statistics and Probability; Vol. 7 No. 3; May 208 ISSN 927-7032 E-ISSN 927-7040 Published by Canadian Center of Science and Education Decomposition of Parsimonious Independence

More information

Empirical Likelihood Methods for Sample Survey Data: An Overview

Empirical Likelihood Methods for Sample Survey Data: An Overview AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use

More information

Additive and multiplicative models for the joint effect of two risk factors

Additive and multiplicative models for the joint effect of two risk factors Biostatistics (2005), 6, 1,pp. 1 9 doi: 10.1093/biostatistics/kxh024 Additive and multiplicative models for the joint effect of two risk factors A. BERRINGTON DE GONZÁLEZ Cancer Research UK Epidemiology

More information

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

Application of Parametric Homogeneity of Variances Tests under Violation of Classical Assumption

Application of Parametric Homogeneity of Variances Tests under Violation of Classical Assumption Application of Parametric Homogeneity of Variances Tests under Violation of Classical Assumption Alisa A. Gorbunova and Boris Yu. Lemeshko Novosibirsk State Technical University Department of Applied Mathematics,

More information

Ignoring the matching variables in cohort studies - when is it valid, and why?

Ignoring the matching variables in cohort studies - when is it valid, and why? Ignoring the matching variables in cohort studies - when is it valid, and why? Arvid Sjölander Abstract In observational studies of the effect of an exposure on an outcome, the exposure-outcome association

More information

Stratified Randomized Experiments

Stratified Randomized Experiments Stratified Randomized Experiments Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Stratified Randomized Experiments Stat186/Gov2002 Fall 2018 1 / 13 Blocking

More information

Multi-Level Test of Independence for 2 X 2 Contingency Table using Cochran and Mantel Haenszel Statistics

Multi-Level Test of Independence for 2 X 2 Contingency Table using Cochran and Mantel Haenszel Statistics IJISET - International Journal of Innovative Science, Engineering & Technology, Vol. Issue 8, August 015. ISSN 348 7968 Multi-Level Test of Independence for X Contingency Table using Cochran and Mantel

More information

Finite Population Sampling and Inference

Finite Population Sampling and Inference Finite Population Sampling and Inference A Prediction Approach RICHARD VALLIANT ALAN H. DORFMAN RICHARD M. ROYALL A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane

More information

A bias improved estimator of the concordance correlation coefficient

A bias improved estimator of the concordance correlation coefficient The 22 nd Annual Meeting in Mathematics (AMM 217) Department of Mathematics, Faculty of Science Chiang Mai University, Chiang Mai, Thailand A bias improved estimator of the concordance correlation coefficient

More information

Journal of Biostatistics and Epidemiology

Journal of Biostatistics and Epidemiology Journal of Biostatistics and Epidemiology Methodology Marginal versus conditional causal effects Kazem Mohammad 1, Seyed Saeed Hashemi-Nazari 2, Nasrin Mansournia 3, Mohammad Ali Mansournia 1* 1 Department

More information

Introduction to Survey Data Analysis

Introduction to Survey Data Analysis Introduction to Survey Data Analysis JULY 2011 Afsaneh Yazdani Preface Learning from Data Four-step process by which we can learn from data: 1. Defining the Problem 2. Collecting the Data 3. Summarizing

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 00 MODULE : Statistical Inference Time Allowed: Three Hours Candidates should answer FIVE questions. All questions carry equal marks. The

More information

LOGISTIC FUNCTION A MINIMAX ESTIMATOR FOR THE. having certain desirable asymptotic properties. But realistically, what is of

LOGISTIC FUNCTION A MINIMAX ESTIMATOR FOR THE. having certain desirable asymptotic properties. But realistically, what is of A MINIMAX ESTIMATOR FOR THE LOGISTIC FUNCTION JOSEPH BERKSON' MAYO CLINIC AND J. L. HODGES, JR.2 UNIVERSITY OF CALIFORNIA, BERKELEY 1. Introduction One of us has discussed the use of the logistic function

More information

TESTS FOR EQUIVALENCE BASED ON ODDS RATIO FOR MATCHED-PAIR DESIGN

TESTS FOR EQUIVALENCE BASED ON ODDS RATIO FOR MATCHED-PAIR DESIGN Journal of Biopharmaceutical Statistics, 15: 889 901, 2005 Copyright Taylor & Francis, Inc. ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543400500265561 TESTS FOR EQUIVALENCE BASED ON ODDS RATIO

More information

Tests for the Odds Ratio in a Matched Case-Control Design with a Quantitative X

Tests for the Odds Ratio in a Matched Case-Control Design with a Quantitative X Chapter 157 Tests for the Odds Ratio in a Matched Case-Control Design with a Quantitative X Introduction This procedure calculates the power and sample size necessary in a matched case-control study designed

More information

A bias-correction for Cramér s V and Tschuprow s T

A bias-correction for Cramér s V and Tschuprow s T A bias-correction for Cramér s V and Tschuprow s T Wicher Bergsma London School of Economics and Political Science Abstract Cramér s V and Tschuprow s T are closely related nominal variable association

More information

Yu Xie, Institute for Social Research, 426 Thompson Street, University of Michigan, Ann

Yu Xie, Institute for Social Research, 426 Thompson Street, University of Michigan, Ann Association Model, Page 1 Yu Xie, Institute for Social Research, 426 Thompson Street, University of Michigan, Ann Arbor, MI 48106. Email: yuxie@umich.edu. Tel: (734)936-0039. Fax: (734)998-7415. Association

More information

Estimation and sample size calculations for correlated binary error rates of biometric identification devices

Estimation and sample size calculations for correlated binary error rates of biometric identification devices Estimation and sample size calculations for correlated binary error rates of biometric identification devices Michael E. Schuckers,11 Valentine Hall, Department of Mathematics Saint Lawrence University,

More information

Robustness of the Quadratic Discriminant Function to correlated and uncorrelated normal training samples

Robustness of the Quadratic Discriminant Function to correlated and uncorrelated normal training samples DOI 10.1186/s40064-016-1718-3 RESEARCH Open Access Robustness of the Quadratic Discriminant Function to correlated and uncorrelated normal training samples Atinuke Adebanji 1,2, Michael Asamoah Boaheng

More information

School of Mathematical and Physical Sciences, University of Newcastle, Newcastle, NSW 2308, Australia 2

School of Mathematical and Physical Sciences, University of Newcastle, Newcastle, NSW 2308, Australia 2 International Scholarly Research Network ISRN Computational Mathematics Volume 22, Article ID 39683, 8 pages doi:.542/22/39683 Research Article A Computational Study Assessing Maximum Likelihood and Noniterative

More information

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004 Estimation in Generalized Linear Models with Heterogeneous Random Effects Woncheol Jang Johan Lim May 19, 2004 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

YOU CAN BACK SUBSTITUTE TO ANY OF THE PREVIOUS EQUATIONS

YOU CAN BACK SUBSTITUTE TO ANY OF THE PREVIOUS EQUATIONS The two methods we will use to solve systems are substitution and elimination. Substitution was covered in the last lesson and elimination is covered in this lesson. Method of Elimination: 1. multiply

More information

An evaluation of homogeneity tests in meta-analyses in pain using simulations of individual patient data

An evaluation of homogeneity tests in meta-analyses in pain using simulations of individual patient data Pain 85 (2000) 415±424 www.elsevier.nl/locate/pain An evaluation of homogeneity tests in meta-analyses in pain using simulations of individual patient data David J. Gavaghan a, *, R. Andrew Moore b, Henry

More information

Guideline on adjustment for baseline covariates in clinical trials

Guideline on adjustment for baseline covariates in clinical trials 26 February 2015 EMA/CHMP/295050/2013 Committee for Medicinal Products for Human Use (CHMP) Guideline on adjustment for baseline covariates in clinical trials Draft Agreed by Biostatistics Working Party

More information

Inference on a Distribution Function from Ranked Set Samples

Inference on a Distribution Function from Ranked Set Samples Inference on a Distribution Function from Ranked Set Samples Lutz Dümbgen (Univ. of Bern) Ehsan Zamanzade (Univ. of Isfahan) October 17, 2013 Swiss Statistics Meeting, Basel I. The Setting In some situations,

More information

Chapter 2: Describing Contingency Tables - II

Chapter 2: Describing Contingency Tables - II : Describing Contingency Tables - II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu]

More information

A MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR

A MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR Statistica Sinica 8(1998), 1165-1173 A MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR Phillip S. Kott National Agricultural Statistics Service Abstract:

More information

Small n, σ known or unknown, underlying nongaussian

Small n, σ known or unknown, underlying nongaussian READY GUIDE Summary Tables SUMMARY-1: Methods to compute some confidence intervals Parameter of Interest Conditions 95% CI Proportion (π) Large n, p 0 and p 1 Equation 12.11 Small n, any p Figure 12-4

More information

Sections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21

Sections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21 Sections 2.3, 2.4 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 21 2.3 Partial association in stratified 2 2 tables In describing a relationship

More information

Efficient Robbins-Monro Procedure for Binary Data

Efficient Robbins-Monro Procedure for Binary Data Efficient Robbins-Monro Procedure for Binary Data V. Roshan Joseph School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, GA 30332-0205, USA roshan@isye.gatech.edu SUMMARY

More information

Fractional Imputation in Survey Sampling: A Comparative Review

Fractional Imputation in Survey Sampling: A Comparative Review Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical

More information

New Method to Estimate Missing Data by Using the Asymmetrical Winsorized Mean in a Time Series

New Method to Estimate Missing Data by Using the Asymmetrical Winsorized Mean in a Time Series Applied Mathematical Sciences, Vol. 3, 2009, no. 35, 1715-1726 New Method to Estimate Missing Data by Using the Asymmetrical Winsorized Mean in a Time Series Ahmad Mahir R. and A. M. H. Al-Khazaleh 1 School

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information
