Random marginal agreement coefficients: rethinking the adjustment for chance when measuring agreement


Biostatistics (2005), 6, 1. doi: /biostatistics/kxh027

MICHAEL P. FAY
National Institute of Allergy and Infectious Diseases, 6700B Rockledge Dr. MSC 7609, Bethesda, MD, USA

SUMMARY

Agreement coefficients quantify how well a set of instruments agree in measuring some response on a population of interest. Many standard agreement coefficients (e.g. kappa for nominal, weighted kappa for ordinal, and the concordance correlation coefficient (CCC) for continuous responses) may indicate increasing agreement as the marginal distributions of the two instruments become more different even as the true cost of disagreement stays the same or increases. This problem has been described for the kappa coefficients; here we describe it for the CCC. We propose a solution for all types of responses in the form of random marginal agreement coefficients (RMACs), which use a different adjustment for chance than the standard agreement coefficients. Standard agreement coefficients model chance agreement using expected agreement between two independent random variables, each distributed according to the marginal distribution of one of the instruments. RMACs adjust for chance by modeling two independent readings, both from the mixture distribution that averages the two marginal distributions. In other words, both independent readings represent first a random choice of instrument, then a random draw from the marginal distribution of the chosen instrument. The advantage of the resulting RMAC is that differences between the two marginal distributions will not induce greater apparent agreement. As with the standard agreement coefficients, the RMACs do not require any assumptions about the bivariate distribution of the random variables associated with the two instruments.
We describe the RMAC for nominal, ordinal and continuous data, and show through the delta method how to approximate the variances of some important special cases.

Keywords: Concordance correlation coefficient; Kappa; Random marginal agreement coefficient; Reliability; Weighted kappa.

Biostatistics Vol. 6 No. 1 © Oxford University Press 2005; all rights reserved.

1. INTRODUCTION

When two instruments are believed to measure the same values, it is often desired to have a single coefficient that measures how well the two instruments agree. We consider coefficients that apply to categorical responses (e.g. two health professionals both classifying patients into $k$ possibly ordered categories of disease) or to more continuous-like responses (e.g. two assays both measuring concentration of a specific antibody in blood samples).

Let $X$ and $Y$ be the random variables associated with the responses measured on some population of interest by the two instruments. Then $X$ and $Y$ are either scalar valued (corresponding to continuous responses or discrete responses with known scores), or vector valued with each element zero except one (corresponding to categorical responses). Let $F_{XY}$ be the joint distribution of $X$ and $Y$. We wish to summarize the distribution $F_{XY}$ with a single scalar coefficient which represents how well $X$ and $Y$ agree. We denote these population agreement coefficients by $A$ and their sample values by $\hat{A}$.

In this paper we consider only nonparametric agreement coefficients, where $A$ requires no assumptions about $F_{XY}$. By defining the agreement problem this way, we exclude many useful parametric models used for measuring agreement which require some assumptions about $F_{XY}$. For example, log linear models can describe agreement with nominal data (Tanner and Young, 1985) and ordinal data (Agresti, 1988). For continuous data, the intraclass correlation is defined under an additive model which induces a structure on $F_{XY}$ (see e.g. Shrout and Fleiss, 1979). Carrasco and Jover (2003) show that under the usual additive model assumptions, the intraclass correlation is equivalent to the concordance correlation coefficient (CCC) of Lin (1989). For binary responses, the intraclass kappa (Bloch and Kraemer, 1989) assumes equivalent marginal distributions. Although we show later that the sample intraclass kappa (equivalent to Scott's (1955) estimator) is a good estimator of the RMAC applied to nominal data, an important difference between the population intraclass kappa and the associated RMAC is that the population RMAC makes no assumptions about the bivariate distribution, $F_{XY}$.

Agreement coefficients which do not require assumptions about $F_{XY}$ are the CCC for continuous data (Lin, 1989), and Cohen's kappa or weighted kappa for nominal data or ordinal data (see e.g. Fleiss et al., 2003). We call these standard agreement coefficients (e.g. kappa, CCC) fixed marginal agreement coefficients (FMACs) in order to contrast them with the random marginal agreement coefficients (RMACs) we propose.
In Section 2 we review how the FMACs adjust for chance, and we propose a different adjustment producing the RMACs. The terms fixed and random apply to how the marginal distributions are used in the chance calculation, and this terminology should not be confused with that of Lin et al. (2002), who discuss whether one of the instruments has values that may be fixed or random. An important property of the RMACs is that increasing differences in the marginal distributions cannot increase the adjustment for chance, and consequently the agreement coefficient, as is the case with the FMACs. We define both the FMACs and the RMACs using general cost functions similar to King and Chinchilli (2001), who generalized only the FMACs.

We spend the bulk of this paper (Sections 2–4) comparing population agreement coefficients, discussing the usefulness of different ways of summarizing $F_{XY}$ into a single number. In Section 3 we discuss the RMAC applied to categorical data. The RMAC counterpart to weighted kappa is also discussed in Section 3, and the RMAC counterpart to the concordance correlation coefficient is discussed in Section 4. Also in Section 4 we give an interpretation of a transformation of the RMAC with squared difference cost as the proportion of variance of the response from a randomly chosen instrument attributable to instrument disagreement. We offer estimators and confidence intervals of these coefficients in Section 5 and end with a discussion.

2. FIXED MARGINAL VERSUS RANDOM MARGINAL AGREEMENT COEFFICIENTS

Let $c(x, y)$ be the cost of disagreement when $X = x$ and $Y = y$, which equals zero when $x = y$ and is non-negative otherwise, and $c(x, y) = c(y, x)$ for all $x, y$. Agreement coefficients for categorical data can equivalently be represented using positive weights for agreement (see Section 3). Let the expected cost given $F_{XY}$ be called the true cost and be written $E_{F_{XY}}\{c(X, Y)\}$.
To give interpretability to the true cost, we first scale it by some chance cost, then transform the scaled value to equal 1 at perfect agreement and 0 when true cost equals chance cost. Write the chance cost in general form as $E_{F_U} E_{F_V}\{c(U, V)\}$, where $U$ and $V$ are independent random variables defined later. Then the agreement coefficients discussed in this paper are all of the form

$$A = 1 - \frac{E_{F_{XY}}\{c(X, Y)\}}{E_{F_U} E_{F_V}\{c(U, V)\}}. \tag{2.1}$$

In FMACs (e.g. kappa, CCC), we model the chance cost by fixing the distribution for the first random variable to be the marginal distribution of the first instrument, and similarly for the second random variable, giving

$$A_F(c) = 1 - \frac{E_{F_{XY}}\{c(X, Y)\}}{E_{F_X} E_{F_Y}\{c(X, Y)\}},$$

where $F_X$ and $F_Y$ are the marginal distributions for $X$ and $Y$ respectively. The problem with FMACs is that increasing differences between $F_X$ and $F_Y$ while holding the true cost constant can cause larger values for chance cost, which implies better agreement for $A_F(c)$. This problem has been widely studied for nominal data (see e.g. Byrt et al., 1993), but not studied for continuous data. Examples are presented in Sections 3 and 4.

Our solution to the above problem is the RMACs, denoted $A_R(c)$. The RMACs let $U$ and $V$ of equation (2.1) be independent responses from the same distribution, $F_Z = 0.5 F_X + 0.5 F_Y$, i.e.

$$A_R(c) = 1 - \frac{E_{F_{XY}}\{c(X, Y)\}}{E_{F_{Z_1}} E_{F_{Z_2}}\{c(Z_1, Z_2)\}}.$$

For the RMAC, we model disagreement by chance by first randomly choosing an instrument and then randomly drawing from the marginal of that instrument. Thus, differences between the marginal distributions cannot affect RMACs.

For practical applications, we can apply Zwick's (1988) recommendation for nominal data to all types of responses: when exploring agreement, first test for differences between the marginal distributions $F_X$ and $F_Y$; then, if there are no significant differences, use the sample RMAC (for nominal responses this is Scott's (1955) estimator). Thus, even if there was low power to detect marginal differences, the subsequent RMAC can detect the effect of the marginal differences on the true cost more strongly than the FMAC, since larger marginal differences do not induce greater chance cost adjustments.

3. CATEGORICAL RESPONSES FOR $k \times k$ TABLES

In this section $X$ and $Y$ both represent categorical responses with $k$ possible responses.
Let $e_j$ be a $k \times 1$ vector of zeros except with a 1 in the $j$th row, so that the sample space for both $X$ and $Y$ is $\{e_1, \ldots, e_k\}$. Let $\pi_{ab} = \Pr[X = e_a, Y = e_b]$, and let a dot replacing an index denote summation over that index (e.g. $\pi_{a\cdot} = \sum_{j=1}^{k} \pi_{aj}$). In this notation,

$$A_F(c) = 1 - \frac{\sum_{i=1}^{k}\sum_{j=1}^{k} c_{ij}\,\pi_{ij}}{\sum_{i=1}^{k}\sum_{j=1}^{k} c_{ij}\,\pi_{i\cdot}\,\pi_{\cdot j}},$$

where $c_{ij} = c(e_i, e_j)$, and

$$A_R(c) = 1 - \frac{\sum_{i=1}^{k}\sum_{j=1}^{k} c_{ij}\,\pi_{ij}}{\sum_{i=1}^{k}\sum_{j=1}^{k} c_{ij}\,(0.5\pi_{i\cdot} + 0.5\pi_{\cdot i})(0.5\pi_{j\cdot} + 0.5\pi_{\cdot j})}.$$

We can write both $A_F(c)$ and $A_R(c)$ in terms of positive weights for agreement. Since scaling the cost by a constant does not change the value of either $A_F(c)$ or $A_R(c)$, we use a scaled version of the $c_{ij}$, say $c^*_{ij}$, such that $\max_{i,j} c^*_{ij} = 1$. Then $w_{ij} \equiv 1 - c^*_{ij}$ equals 1 for perfect agreement and $0 \le w_{ij} \le 1$ for all $i \ne j$, and

$$A_F(c) = 1 - \frac{\sum_{i=1}^{k}\sum_{j=1}^{k} (1 - w_{ij})\,\pi_{ij}}{\sum_{i=1}^{k}\sum_{j=1}^{k} (1 - w_{ij})\,\pi_{i\cdot}\,\pi_{\cdot j}} = 1 - \frac{1 - \sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\,\pi_{ij}}{1 - \sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\,\pi_{i\cdot}\,\pi_{\cdot j}} = \frac{\Pi_o - \Pi_e}{1 - \Pi_e},$$

where $\Pi_o = \sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\,\pi_{ij}$ and $\Pi_e = \sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\,\pi_{i\cdot}\,\pi_{\cdot j}$. This is the standard form for weighted kappa. In this kappa form $A_R(c)$ is

$$A_R(c) = \frac{\Pi_o - \Pi_z}{1 - \Pi_z}, \tag{3.1}$$

where $\Pi_z = \sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\,(0.5\pi_{i\cdot} + 0.5\pi_{\cdot i})(0.5\pi_{j\cdot} + 0.5\pi_{\cdot j})$.

Consider three common cost functions for categorical data: nominal cost ($n$), squared difference cost ($d$), and absolute value of the difference cost ($a$). The usual kappa is $A_F(n)$, the FMAC using the nominal cost function, where $n(x, y) = 0$ if $x = y$ and 1 otherwise. In terms of weights the nominal cost is $c_{ij} = 0$ (i.e. $w_{ij} = 1$) if $i = j$ and $c_{ij} = 1$ (i.e. $w_{ij} = 0$) if $i \ne j$. Then $\Pi_o$ represents the probability of perfect agreement, and $\Pi_e$ represents the probability of perfect agreement by chance under the fixed marginal model.

For ordinal responses, the values of the most common cost functions when $x = e_i$ and $y = e_j$ are either $d(x, y) = c_{ij} = (i - j)^2$ (i.e. $w_{ij} = 1 - (i - j)^2/(k - 1)^2$) or $a(x, y) = c_{ij} = |i - j|$ (i.e. $w_{ij} = 1 - |i - j|/(k - 1)$) (see Fleiss et al., 2003). The associated FMACs are denoted $A_F(d)$ and $A_F(a)$, respectively. Another way to represent ordered scores is to let the sample space for $X$ and $Y$ consist of $k$ ordered (scalar) scores, $s_1 < s_2 < \cdots < s_k$. Then letting $s_i = i$ we get $A_F(d)$ or $A_F(a)$ by now defining $d(x, y) = (x - y)^2$ and $a(x, y) = |x - y|$. The RMAC notation is analogous.

In Table 1a we present data previously used in the agreement literature, the independent classification by two neurologists of 149 patients into four categories: 1 = certain multiple sclerosis (MS), 2 = probable MS, 3 = possible MS (50:50 odds), and 4 = doubtful, unlikely, or definitely not MS.

[Table 1. Multiple Sclerosis Diagnoses (Westlund and Kurkland, 1953). 1a: Original data; 1b: Modified data. Each panel cross-classifies the ratings of the two neurologists with row and column totals; the cell counts are not reproduced here. Categories: 1 = certain MS, 2 = probable MS, 3 = possible MS (50:50 odds), and 4 = doubtful, unlikely, or definitely not MS.]
Suppose we define the $\pi_{ab}$ values by the proportions from Table 1a; then $A_F(n) =$ […] and $A_R(n) =$ […]. Now modify the data to get Table 1b by supposing that the 10 patients who were rated 3 by Neurologist 1 and 1 by Neurologist 2 were instead rated 1 by Neurologist 1 and 3 by Neurologist 2. Again defining the $\pi_{ab}$ values by the proportions, the values of the agreement coefficients are $A_F(n) =$ […] and $A_R(n) =$ […] for the modified table. The FMAC shows better agreement for Table 1a over Table 1b, despite the fact that the modified Table 1b has more closely matching marginals and diagonal values (exact matches) identical to those of Table 1a. In contrast, the RMAC shows identical values for both tables.

A similar phenomenon occurs when using the ordinal cost functions, $d$ and $a$. The FMACs show better agreement for Table 1a despite the fact that Table 1b has the same diagonal values and more closely matched marginals (Table 1a, $A_F(d) =$ […]; Table 1b, $A_F(d) = 0.503$; Table 1a, $A_F(a) =$ […]; Table 1b, $A_F(a) = 0.355$). In contrast, the RMACs show identical agreement between the two tables (both tables, $A_R(d) = 0.497$; both tables, $A_R(a) = 0.348$).
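The kappa-form expressions of this section can be sketched in a few lines of Python. The sketch below is illustrative only: the function names `agreement_weights` and `fmac_rmac` are invented here, and the 3 × 3 counts are hypothetical (they are not the Table 1 data). The second table moves 10 observations from cell (3, 1) to cell (1, 3), mimicking the kind of modification described above, which leaves the diagonal and the averaged marginal unchanged.

```python
import numpy as np

def agreement_weights(k, cost="nominal"):
    """Kappa-form weights w_ij = 1 - c_ij / max(c) for k >= 2 categories scored 1..k."""
    i, j = np.indices((k, k))
    if cost == "nominal":
        c = (i != j).astype(float)
    elif cost == "absolute":
        c = np.abs(i - j).astype(float)
    elif cost == "squared":
        c = ((i - j) ** 2).astype(float)
    else:
        raise ValueError(f"unknown cost: {cost}")
    return 1.0 - c / c.max()

def fmac_rmac(table, cost="nominal"):
    """Return (A_F, A_R) for a k x k table of counts or proportions."""
    pi = np.asarray(table, dtype=float)
    pi = pi / pi.sum()                    # cell proportions pi_ab
    w = agreement_weights(pi.shape[0], cost)
    row = pi.sum(axis=1)                  # marginal of the first rater
    col = pi.sum(axis=0)                  # marginal of the second rater
    mix = 0.5 * (row + col)               # averaged marginal, F_Z
    pi_o = float(np.sum(w * pi))          # weighted observed agreement
    pi_e = float(row @ w @ col)           # chance agreement, fixed marginals
    pi_z = float(mix @ w @ mix)           # chance agreement, random marginals
    return (pi_o - pi_e) / (1 - pi_e), (pi_o - pi_z) / (1 - pi_z)

# Hypothetical counts (NOT the Table 1 data): the marginals differ in table_a.
table_a = np.array([[20, 5, 0],
                    [5, 20, 5],
                    [20, 5, 20]])
# Move 10 observations from cell (3, 1) to cell (1, 3); the marginals now match.
table_b = table_a.copy()
table_b[2, 0] -= 10
table_b[0, 2] += 10

a_f_a, a_r_a = fmac_rmac(table_a, "nominal")
a_f_b, a_r_b = fmac_rmac(table_b, "nominal")
print(f"table_a: A_F = {a_f_a:.3f}, A_R = {a_r_a:.3f}")  # A_F = 0.416, A_R = 0.398
print(f"table_b: A_F = {a_f_b:.3f}, A_R = {a_r_b:.3f}")  # A_F = 0.398, A_R = 0.398
```

With these made-up counts the FMAC is higher for the table with the more disparate marginals, while the RMAC is the same for both, which is the pattern the section describes. Passing `"squared"` or `"absolute"` as the cost gives the weighted counterparts.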

4. CONTINUOUS RESPONSES

4.1 Comparison of RMAC to concordance correlation coefficient

Because of historical precedent, simplifications, and some nice properties, we focus on the squared difference cost function (where $c(x, y)$ is $d(x, y) = (x - y)^2$) for continuous responses. Other cost functions (e.g. $c(x, y) = a(x, y) = |x - y|$) may be used, but are not discussed in this section. For continuous responses $A_F(d)$ gives the CCC (Lin, 1989),

$$A_F(d) = 1 - \frac{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2 - 2\rho\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2} = \frac{2\rho\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2},$$

where $\mu_x$ ($\mu_y$) and $\sigma_x^2$ ($\sigma_y^2$) are the means and variances associated with $F_X$ ($F_Y$), and $\rho = \mathrm{Corr}(X, Y)$. Following Lin (1989) we can write this in terms of three parameters,

$$A_F(d) = \frac{2\rho}{v + 1/v + u^2},$$

where $v = \sigma_x/\sigma_y$ and $u = (\mu_x - \mu_y)/\sqrt{\sigma_x\sigma_y}$.

To calculate $A_R(d)$, first note that $E_{F_{Z_1}} E_{F_{Z_2}}(Z_1 - Z_2)^2 = 2\,\mathrm{Var}(Z)$, where as before $Z_1$ and $Z_2$ are independent and $F_Z = 0.5F_X + 0.5F_Y$. This gives

$$2\,\mathrm{Var}(Z) = 2\left[E_Z(Z^2) - \{E_Z(Z)\}^2\right] = 2\left[\tfrac{1}{2}E_X(X^2) + \tfrac{1}{2}E_Y(Y^2) - \left(\frac{\mu_x + \mu_y}{2}\right)^2\right] = \sigma_x^2 + \sigma_y^2 + \tfrac{1}{2}(\mu_x - \mu_y)^2,$$

and $A_R(d)$ in terms of $u$, $v$ and $\rho$ is

$$A_R(d) = \frac{2\rho - \tfrac{1}{2}u^2}{v + 1/v + \tfrac{1}{2}u^2}. \tag{4.1}$$

When $u = 0$ then $A_F(d) = A_R(d)$. To compare the two agreement measures more generally we plot each agreement measure versus $u$, fixing $v = 1$, with lines representing different values of $\rho$. In Figure 1a we see that the CCC ($A_F(d)$) approaches 0 as $u$ gets large, while Figure 1b shows that $A_R(d)$ approaches $-1$ in the same situations. With fixed negative correlation and increasing standardized mean difference, the CCC increases (implying better agreement), while $A_R(d)$ decreases.

To show the problem, consider two multivariate normal distributions both with $\sigma_x^2 = \sigma_y^2 = 1$ and $\rho = -0.1$. In the first distribution, the means are equal, $\mu_x = \mu_y = 0$, while in the second the means differ, $\mu_x = -2$ and $\mu_y = 2$.
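The two-distribution comparison can be checked directly from the $(u, v, \rho)$ forms of $A_F(d)$ and $A_R(d)$. The following is a minimal sketch, not the author's code: the function names are invented here, and it takes $\rho = -0.1$ with means of opposite sign ($-2$ and $2$) in the second distribution, which is the reading under which the means differ and the correlation is negative.

```python
import math

def ccc(mu_x, mu_y, sd_x, sd_y, rho):
    """Population CCC, A_F(d), written in terms of u, v and rho."""
    v = sd_x / sd_y
    u = (mu_x - mu_y) / math.sqrt(sd_x * sd_y)
    return 2 * rho / (v + 1 / v + u ** 2)

def rmac_squared(mu_x, mu_y, sd_x, sd_y, rho):
    """Population RMAC with squared difference cost, A_R(d), equation (4.1)."""
    v = sd_x / sd_y
    u = (mu_x - mu_y) / math.sqrt(sd_x * sd_y)
    return (2 * rho - 0.5 * u ** 2) / (v + 1 / v + 0.5 * u ** 2)

# First distribution: equal means. Second: means of opposite sign (assumed -2 and 2).
equal_means = dict(mu_x=0.0, mu_y=0.0, sd_x=1.0, sd_y=1.0, rho=-0.1)
shifted_means = dict(mu_x=-2.0, mu_y=2.0, sd_x=1.0, sd_y=1.0, rho=-0.1)

for label, params in [("equal means", equal_means), ("shifted means", shifted_means)]:
    print(f"{label}: A_F(d) = {ccc(**params):.2f}, A_R(d) = {rmac_squared(**params):.2f}")
# equal means: A_F(d) = -0.10, A_R(d) = -0.10
# shifted means: A_F(d) = -0.01, A_R(d) = -0.82
```

Under these assumed inputs the CCC moves toward zero when the means are pulled apart, whereas the RMAC drops sharply.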
Clearly the second distribution represents worse agreement between $X$ and $Y$, but only $A_R(d)$ shows this (first distribution, $A_F(d) = A_R(d) = -0.1$; second distribution, $A_F(d) = -0.01$, $A_R(d) = -0.82$).

[Fig. 1. (a) Concordance correlation coefficient, $A_F(d)$, and (b) RMAC with squared difference cost, $A_R(d)$, each plotted against $u$ with one line for each of $\rho = 1, 0.5, 0, -0.5, -1$.]

4.2 Interpretation as partition of variance

For the RMAC with continuous responses we can interpret $\{1 - A_R(d)\}/2$ as the proportion of variance of an arbitrary instrument's response attributable to disagreement between the instruments. To see this, let $R$ be a Bernoulli random variable with parameter 0.5, independent of $(X, Y)$. Then $Z = RX + (1 - R)Y$ represents a random choice between $X$ and $Y$, and the distribution of $Z$ is $F_Z$ as previously defined. The variance of $Z$ can be partitioned into

$$\mathrm{Var}(Z) = \mathrm{Var}(U) + \tfrac{1}{4}E\{(X - Y)^2\},$$

where here $U = 0.5X + 0.5Y$. Thus,

$$\frac{1 - A_R(d)}{2} = \frac{\tfrac{1}{4}E_{F_{XY}}\{(X - Y)^2\}}{\mathrm{Var}(Z)}$$

can be interpreted as a proportion of the variance of $Z$ attributable to disagreement between instruments. The value of $\{1 - A_R(d)\}/2$ is close to zero (i.e. $A_R(d)$ is close to one) when the expected squared difference between the responses from the two instruments is small compared to the variance of the average response of the two instruments; and the value is close to one (i.e. $A_R(d)$ is close to minus one) when the expected squared difference is much larger than the variance of the average.

5. ESTIMATION AND INFERENCES

5.1 General case

We can use the bootstrap to derive simple estimators (see e.g. Efron and Tibshirani, 1993). Let the data be paired responses, $(x_1, y_1), \ldots, (x_n, y_n)$. The ideal bootstrap estimators are

$$\hat{A}_F(c) = 1 - \frac{n^{-1}\sum_{i=1}^{n} c(x_i, y_i)}{n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n} c(x_i, y_j)}$$

for the FMAC, and

$$\hat{A}_R(c) = 1 - \frac{n^{-1}\sum_{i=1}^{n} c(x_i, y_i)}{(2n)^{-2}\sum_{i=1}^{2n}\sum_{j=1}^{2n} c(z_i, z_j)}$$

for the RMAC, where $z = [x, y] = [x_1, \ldots, x_n, y_1, \ldots, y_n]$. For categorical data these estimators are equivalent to replacing the $\pi_{ij}$ values in the expression for $A_F(c)$ or $A_R(c)$ with the sample proportions. Similarly we can write the bootstrap for continuous and ordinal data by replacing $F_{XY}$, $F_X$, and $F_Y$ with their respective empirical distributions. For scalar data, we can write $A_F(d)$ or $A_R(d)$ in terms of $E(X)$, $E(Y)$, $\mathrm{Var}(X)$, $\mathrm{Var}(Y)$, and $\mathrm{Corr}(X, Y)$ (see Section 4), so we simply replace those values with their usual bootstrap estimators. Alternatively, we could use unbiased sample variance and covariance estimators. For inferences on $A_R(c)$ or $A_F(c)$, we can apply the bias corrected and accelerated (BC$_a$) bootstrap confidence intervals (see e.g. Efron and Tibshirani, 1993).

5.2 Special case: categorical responses

An asymptotic variance expression for $A_F(c)$ has been derived (see e.g. Fleiss et al., 2003); here we give an estimator for $A_R(c)$ using the kappa form weights. Fisher's z-transformation gives

$$\beta = \tanh^{-1}[A_R(c)] = \frac{1}{2}\log\left(\frac{1 + A_R(c)}{1 - A_R(c)}\right).$$

In Section 1 of the supplementary material we derive the delta method variance estimate for $\hat{\beta}$,

$$\hat{\sigma}^2_{\hat{\beta}} = \frac{1}{n}\left\{\sum_{a=1}^{k}\sum_{b=1}^{k} \hat{\pi}_{ab}\hat{D}_{ab}^2 - \left(\sum_{a=1}^{k}\sum_{b=1}^{k} \hat{\pi}_{ab}\hat{D}_{ab}\right)^2\right\},$$

where

$$\hat{D}_{ab} = \frac{2w_{ab} - (\bar{w}_{\cdot a} + \bar{w}_{a\cdot} + \bar{w}_{\cdot b} + \bar{w}_{b\cdot})}{4(1 + \hat{\Pi}_o - 2\hat{\Pi}_z)} + \frac{w_{ab}}{2(1 - \hat{\Pi}_o)},$$

$$\bar{w}_{\cdot a} = \sum_{i=1}^{k} w_{ia}\hat{\pi}_{i\cdot} \quad\text{and}\quad \bar{w}_{a\cdot} = \sum_{j=1}^{k} w_{aj}\hat{\pi}_{\cdot j},$$

and any value topped with a hat denotes replacing all $\pi_{ij}$ with $\hat{\pi}_{ij}$ in its definition, where $\hat{\pi}_{ij}$ is $n^{-1}$ times the number of pairs with $x = e_i$ and $y = e_j$. The $100(1 - \alpha)$ percent confidence limits for $\hat{A}_R(c)$ are $\tanh\left(\hat{\beta} \pm \Phi^{-1}(1 - \alpha/2)\,\hat{\sigma}_{\hat{\beta}}\right)$, where $\Phi^{-1}(p)$ is the $p$th quantile of the standard normal distribution.

We performed simulations on five distributions for $F_{XY}$, three with $k = 2$, one with $k = 4$ and one with $k = 5$.
We used the nominal cost function in every case, and when $k = 4$ or 5 we additionally used the absolute difference and squared difference cost functions. For each distribution/cost function combination, we simulated with $n = 20$ and $n = 50$, and with $c(x, y) = d(x, y)$ and $k = 4$ or 5 we additionally did $n = 200$. There were a total of 20 simulations. For each simulation we did 1000 replications, and for the BC$_a$ we used 1000 bootstrap resamples.
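The ideal bootstrap (plug-in) estimators of Section 5.1, which these simulations evaluate, can be sketched for an arbitrary symmetric cost function as follows. This is an illustrative sketch rather than the code used in the paper: the names `plug_in_agreement`, `squared_cost` and `nominal_cost` are invented here, categorical responses are assumed to be coded as numbers, and the pairwise cost matrices need memory of order $n^2$.

```python
import numpy as np

def squared_cost(a, b):
    """Squared difference cost d(x, y) = (x - y)^2."""
    return (a - b) ** 2

def nominal_cost(a, b):
    """Nominal cost n(x, y): 0 if the codes match, 1 otherwise."""
    return (a != b).astype(float)

def plug_in_agreement(x, y, cost):
    """Return plug-in estimates (A_F, A_R) from paired responses x, y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    true_cost = np.mean(cost(x, y))                         # average cost over observed pairs
    chance_fixed = np.mean(cost(x[:, None], y[None, :]))    # all n^2 pairs (x_i, y_j)
    z = np.concatenate([x, y])                              # pooled responses, length 2n
    chance_random = np.mean(cost(z[:, None], z[None, :]))   # all (2n)^2 pairs (z_i, z_j)
    return 1 - true_cost / chance_fixed, 1 - true_cost / chance_random

# Small made-up data sets, for illustration only.
x_cont, y_cont = [1, 2, 3, 4], [2, 2, 4, 5]
x_cat, y_cat = [0, 0, 1, 1], [0, 1, 1, 1]

af_d, ar_d = plug_in_agreement(x_cont, y_cont, squared_cost)
af_n, ar_n = plug_in_agreement(x_cat, y_cat, nominal_cost)
print(f"squared cost: A_F = {af_d:.3f}, A_R = {ar_d:.3f}")  # A_F = 0.786, A_R = 0.767
print(f"nominal cost: A_F = {af_n:.3f}, A_R = {ar_n:.3f}")  # A_F = 0.500, A_R = 0.467
```

For the nominal example the two values coincide with Cohen's kappa and Scott's estimator computed by hand from the same four pairs. A BC$_a$ interval would resample the pairs $(x_i, y_i)$ and recompute these estimates; that step is left out of the sketch.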

In every case the estimators of $A_R(c)$ appear slightly biased downward, with all simulated means within 0.05 of the true value. Both the delta method intervals and the BC$_a$ intervals give reasonably adequate coverage, with the BC$_a$ intervals preferred when $k = 4$ or 5. For $k = 2$ the simulated 95% coverage for the delta method was 94–95% and for the BC$_a$ method was 95–96% except one case of 89.5%. For the cases with $k = 4$ and 5 the coverage was 94% or greater in 10/14 cases for the BC$_a$ method but only 4/14 for the delta method. Note that even in the cases with $k \ge 4$, which have quite a few cells with very low probability of response, the coverage for both methods was generally over 90%. Details are given in Section 2 of the supplementary material.

5.3 Special case: continuous using squared difference cost

To derive confidence intervals for $A_R(d)$ we follow a similar strategy to Lin (1989). Fisher's z-transformation gives

$$\xi = \frac{1}{2}\log\left(\frac{1 + A_R(d)}{1 - A_R(d)}\right) = \frac{1}{2}\log\left(\frac{\sigma_x^2 + \sigma_y^2 + 2\sigma_{xy}}{\sigma_x^2 + \sigma_y^2 - 2\sigma_{xy} + (\mu_x - \mu_y)^2}\right).$$

To estimate $\xi$ we use unbiased estimators of the numerator and denominator of the ratio inside the logarithm, to obtain

$$\hat{\xi} = \frac{1}{2}\log\left(\frac{S_x^2 + S_y^2 + 2S_{xy}}{\left(\frac{n-1}{n}\right)S_x^2 + \left(\frac{n-1}{n}\right)S_y^2 - 2\left(\frac{n-1}{n}\right)S_{xy} + (\bar{X} - \bar{Y})^2}\right),$$

where $\bar{X}$ and $\bar{Y}$ are means, and $S_x^2 = (n - 1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$, $S_y^2 = (n - 1)^{-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$, and $S_{xy} = (n - 1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$. Then using the delta method we show in Section 3 of the supplementary material that under the assumption of normal responses an asymptotic estimator of the variance is

$$\hat{\sigma}^2_{\hat{\xi}} = \frac{(\bar{X} - \bar{Y})^4(S_x^2 + S_y^2 + 2S_{xy}) + 2(\bar{X} - \bar{Y})^2(S_x^4 + S_y^4 + 6S_x^2S_y^2 - 8S_{xy}^2) + 8(S_x^2 + S_y^2 - 2S_{xy})(S_x^2S_y^2 - S_{xy}^2)}{2n\left(S_x^2 + S_y^2 - 2S_{xy} + (\bar{X} - \bar{Y})^2\right)^2\left(S_x^2 + S_y^2 + 2S_{xy}\right)}.$$

Through simulations (and similar to the results of Lin, 1989) we show that we get better coverage if we use $\tilde{\sigma}^2_{\hat{\xi}} = n\hat{\sigma}^2_{\hat{\xi}}/(n - 2)$ to calculate confidence intervals. The $100(1 - \alpha)$ percent confidence limits for $\hat{A}_R(c)$ are $\tanh\left(\hat{\xi} \pm \Phi^{-1}(1 - \alpha/2)\,\tilde{\sigma}_{\hat{\xi}}\right)$.
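The estimator $\hat{\xi}$, its delta method variance, and the back-transformed limits can be put together as in the sketch below. It is a hedged illustration of the formulas of this subsection, not the author's implementation: the function name `rmac_delta_ci` and the example data are invented, and the variance expression relies on the normality assumption stated above.

```python
import math
from statistics import NormalDist

import numpy as np

def rmac_delta_ci(x, y, alpha=0.05):
    """Estimate A_R(d) with delta method limits on Fisher's z scale (needs n > 2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    dbar = x.mean() - y.mean()
    sxx = x.var(ddof=1)                      # S_x^2
    syy = y.var(ddof=1)                      # S_y^2
    sxy = np.cov(x, y, ddof=1)[0, 1]         # S_xy
    s_plus = sxx + syy + 2 * sxy             # estimates Var(X + Y)
    s_minus = sxx + syy - 2 * sxy            # estimates Var(X - Y)
    # Unbiased estimators of the numerator and denominator inside the log.
    xi_hat = 0.5 * math.log(s_plus / ((n - 1) / n * s_minus + dbar ** 2))
    var_xi = (
        dbar ** 4 * s_plus
        + 2 * dbar ** 2 * (sxx ** 2 + syy ** 2 + 6 * sxx * syy - 8 * sxy ** 2)
        + 8 * s_minus * (sxx * syy - sxy ** 2)
    ) / (2 * n * (s_minus + dbar ** 2) ** 2 * s_plus)
    se = math.sqrt(n * var_xi / (n - 2))     # small-sample adjustment n / (n - 2)
    zq = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(xi_hat), math.tanh(xi_hat - zq * se), math.tanh(xi_hat + zq * se)

# Made-up paired readings, for illustration only.
x_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_obs = [1.5, 1.8, 3.2, 4.1, 5.3]
estimate, lower, upper = rmac_delta_ci(x_obs, y_obs)
print(f"A_R(d) estimate = {estimate:.3f}, 95% limits = ({lower:.3f}, {upper:.3f})")
```

The logarithm is undefined if $x + y$ is constant or if the two series agree exactly, so a practical version would guard against those cases.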
We performed 18 simulations on different normal distributions with […] replications each, using 1000 bootstrap resamples. The simulated bias estimates were all less than […]. The simulated 95% coverage for the delta method intervals using $\tilde{\sigma}_{\hat{\xi}}$ were all above 92%, with 15/18 between 94–95%. The BC$_a$ simulated coverage was worse, with coverage mostly around 91–93%. The coverage for the BC$_a$ intervals may improve with more bootstrap replications. Details are given in Section 4 of the supplementary material.

6. DISCUSSION

We have proposed that RMAC should be used in order to stop differences between marginal distributions from inducing greater agreement. The RMAC do not address other common criticisms of

agreement coefficients. Firstly, as with FMAC, when comparing two agreement coefficients, it is necessary to realize the dependence of the RMAC on the form of the average marginal distribution $F_Z$ (Byrt et al., 1993). Secondly (and relatedly), as with FMAC, the RMAC depends on the heterogeneity of the population; for example, in the continuous case if the range of responses is large, then it is much easier to obtain higher agreement coefficients than if the range of responses is small (Atkinson and Nevill, 1997; Lin and Chinchilli, 1997). For binary data, one can see this effect when the data are nearly homogeneous (i.e. if the probability of responding in one category is close to one); then both the FMAC and RMAC will have large chance agreement (low chance cost) and generally lower agreement coefficients. Thirdly, in the nominal case with more than two categories of response, both the FMAC and the RMAC may be misleading. One may have high agreement yet all the categories but one may be indistinguishable from each other (Kraemer et al., 2002). Finally, since it is only one measure, the RMAC cannot describe all aspects of the bivariate distribution $F_{XY}$ that are of interest in agreement studies (for other measures see Lin et al., 2002).

Although the sample RMAC for nominal data is equivalent to Scott's (1955) estimator, we have made no assumptions on the equality of the marginal distributions. This apparent assumption of Scott may have led to a preference for Cohen's kappa over Scott's estimator. For example, Fleiss (1975) says the kappa is preferred to Scott's estimator because it does not make an unwarranted assumption about the marginal proportions. In fact, in our presentation we have emphasized that neither the FMAC (estimated by kappa for nominal data) nor the RMAC (estimated by Scott's estimator for nominal data) makes any assumptions about the marginal distributions.
In this paper we have argued for the use of RMAC over the use of FMAC, but there may be some cases when the FMAC is preferred. Consider two raters classifying observations into sets with no clear boundaries, so that there is no intrinsic meaning to the classification. For example, suppose raters were classifying people as being in poor health, fair health, or good health. Because the categories are fuzzy, there is no correct distribution for the study population, and the marginal for each rater just denotes that rater's preferences. The FMAC could be interpreted as measuring agreement given the preferences (i.e. marginal distributions) of the raters. Then if more disparate marginals induce greater agreement in the FMAC, we accept that interpretation because the induced agreement should be greater since it was achieved despite the larger difference in marginals.

The kappa coefficient and the CCC have been generalized and extended to handle multiple raters, stratified data, and testing of agreement coefficients (Banerjee et al., 1999; King and Chinchilli, 2001). The RMAC should be able to be extended in similar ways, and that work is left to future research.

ACKNOWLEDGMENTS

I thank Dean Follmann, Ji Hyun Le, and Martha Nason for comments and discussions on drafts of this paper.

REFERENCES

AGRESTI, A. (1988). A model for agreement between ratings on an ordinal scale. Biometrics 44.
ATKINSON, G. AND NEVILL, A. (1997). Comment on the use of concordance correlation to assess the agreement between two variables. Biometrics 53.
BANERJEE, M., CAPOZZOLI, M., MCSWEENEY, L. AND SINHA, D. (1999). Beyond kappa: a review of interrater agreement measures. Canadian Journal of Statistics 27.
BLOCH, D. A. AND KRAEMER, H. C. (1989). 2 × 2 kappa coefficients: measures of agreement or association. Biometrics 45.

BYRT, T., BISHOP, J. AND CARLIN, J. B. (1993). Bias, prevalence and kappa. Journal of Clinical Epidemiology 46.
CARRASCO, J. L. AND JOVER, L. (2003). Estimating the generalized concordance correlation coefficient through variance components. Biometrics 59.
EFRON, B. AND TIBSHIRANI, R. J. (1993). An Introduction to the Bootstrap. New York: Chapman & Hall.
FLEISS, J. L. (1975). Measuring agreement between two judges on the presence or absence of a trait. Biometrics 31.
FLEISS, J. L., LEVIN, B. AND PAIK, M. C. (2003). Statistical Methods for Rates and Proportions, 3rd edn. New York: Wiley.
KING, T. S. AND CHINCHILLI, V. M. (2001). A generalized concordance correlation coefficient for continuous and categorical data. Statistics in Medicine 20.
KRAEMER, H. C., PERIYAKOIL, V. S. AND NODA, A. (2002). Kappa coefficients in medical research. Statistics in Medicine 21.
LIN, L. I. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics 45. (Correction: 2000.)
LIN, L. I. AND CHINCHILLI, V. (1997). Rejoinder to the letter to the editor from Atkinson and Nevill. Biometrics 53.
LIN, L., HEDAYAT, A. S., SINHA, B. AND YANG, M. (2002). Statistical methods in assessing agreement: models, issues, and tools. Journal of the American Statistical Association 97.
SCOTT, W. A. (1955). Reliability of content analysis: the case of nominal scale coding. Public Opinion Quarterly 19.
SHROUT, P. E. AND FLEISS, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin 86.
TANNER, M. A. AND YOUNG, M. A. (1985). Modeling agreement among raters. Journal of the American Statistical Association 80.
WESTLUND, K. B. AND KURKLAND, L. T. (1953). Studies in multiple sclerosis in Winnipeg, Manitoba and New Orleans, Louisiana. American Journal of Hygiene 57.
ZWICK, R. (1988). Another look at interrater agreement. Psychological Bulletin 103.

[Received 13 July 2004; revised 29 September 2004; accepted for publication 1 October 2004]
