A General Approach to Evaluating Agreement between Two Observers or Methods of Measurement


Michael Haber
Department of Biostatistics, Rollins School of Public Health, Emory University, Atlanta, GA

and

Huiman X. Barnhart
Department of Biostatistics and Bioinformatics, Duke Clinical Research Institute, Duke University, Durham, NC

Correspondence author: Dr. Michael J. Haber, Department of Biostatistics, Rollins School of Public Health, Emory University, Atlanta GA 303, U.S.A. Tel: (404) e-mail: mhaber@sph.emory.edu

Revised 3//06

Abstract

We present a general approach to the definition and estimation of coefficients for evaluating agreement between two fixed methods of measurement or human observers. The measured variable is assumed to be continuous, but no other distributional assumptions are made. We introduce the term disagreement function for the function of the observations that is used to quantify the extent of disagreement between two measurements made on the same subject. The proposed agreement coefficients compare the disagreement between measurements made by different methods on the same subject to the corresponding disagreement between measurements made by the same method. We propose agreement coefficients for two practical situations involving two methods that have a measurement error: (a) comparison of a new method to a gold standard (or a reference method), and (b) comparison of two methods where neither method is considered a gold standard. We consider three disagreement functions based on the differences between two measurements: (a) the mean squared difference (MSD), (b) the mean absolute difference (MAD), and (c) the complement of the coverage probability (CP), where the CP is the probability that the absolute difference does not exceed a predetermined threshold. We then derive nonparametric estimates for the various agreement coefficients. Our approach is illustrated using data from a study comparing systolic blood pressure measurements by a human observer and an automatic monitor. The performance of the new estimates is assessed via stochastic simulations.

1. Introduction

Frequently one is interested in comparing two methods of measurement that were applied to each of N study subjects. In this paper, the term method may correspond to a measurement device or to a human observer. Denoting the corresponding measurements X and Y, we will say that the two methods are in agreement if they produce the same value on each subject. In reality, even under perfect agreement the values of X and Y on the same subject will usually differ because of measurement errors and other factors. Therefore, it is important to define and estimate coefficients that quantify the extent to which the two methods agree with each other.

We consider two practical scenarios for the comparison of two methods that have a measurement error. In the first scenario, one is interested in comparing the methods without considering either of them as a reference (or a "gold standard"). This may be of interest, for example, when due to logistic considerations one plans to use method X on some of the subjects and method Y on the remaining subjects. In this case it is important that the two methods produce similar values when applied to the same subject so that they can be used interchangeably. In the second scenario, method X is a reference method that has been used in the past and was found to produce reliable measurements, and method Y is a new method that may be less expensive or less invasive. In this case one may consider replacing X by Y, and must make sure that the two methods produce similar measurements on the same subject. We will refer to these two scenarios as assessing agreement without or with a reference method, respectively.

Our objective is to define coefficients of agreement for each of the two scenarios such that if the coefficient is close to one (or exceeds one) we conclude that the methods are in good agreement, while a value close to zero indicates poor agreement. Over the years, several coefficients of agreement have been proposed and used, including different versions of the intraclass correlation coefficient (ICC) and the concordance correlation coefficient (CCC). A comparative summary of these coefficients is presented in a recent paper by Haber and Barnhart (2006) [1]. In the present paper we propose a general approach that can be used in defining and estimating coefficients of agreement corresponding to different measures of disagreement. The new coefficients generalize the $\psi$ coefficient presented in our earlier paper.

In order to define a coefficient of agreement, we first have to decide how we quantify the agreement (or lack thereof) between the two methods. For this, we define the concept of a disagreement function $G(X, Y)$. This function has to satisfy:

a. $G(X, Y) \ge 0$
b. $G(X, Y) = G(Y, X)$
c. $G(X, Y)$ increases as the disagreement between X and Y (according to a specific criterion) increases.

Once we have decided which disagreement function to use, it is necessary to standardize it in order to obtain a coefficient for which values close to zero indicate poor agreement and values close to (or above) one indicate good agreement. In other words, we need to compare the observed value of $G(X, Y)$ to a reference value that represents either the best or the worst case scenario. The ICCs and the CCC take independence, or agreement by chance, as the worst case scenario. Therefore these coefficients compare the observed value of a disagreement function to the value expected under independence between the two methods [2, 3]. In our opinion [1], this approach is unjustified because (a) agreement and correlation are different concepts, and (b) two observations made on the same subject are always expected to be positively correlated because both are positively correlated with the true value of the measured variable on this subject.

Since we could not find an adequate definition for the worst case scenario, we decided to move to the opposite side of the agreement scale and define the best case scenario as the case where the disagreement between observations made by different methods is similar to the disagreement between observations made by the same method. In other words, replacing or interchanging one method by the other does not substantially increase the disagreement between measurements made on the same subject. Therefore, the coefficients of agreement we propose compare the disagreement between measurements made by different methods to the disagreement between replicated measurements made by the same method.

We denote by $G(X, X')$ the disagreement between two replicated measurements made by the method X, and define $G(Y, Y')$ in an analogous way for method Y. Using the above-mentioned principle, we define the coefficient of agreement in the case that neither method is considered a reference as:

$$\psi^N = \frac{[G(X, X') + G(Y, Y')]/2}{G(X, Y)}. \tag{1}$$

The numerator is the average disagreement between two replicated measurements made by the same method, and the denominator is the disagreement between X and Y. The coefficient of agreement for the case where method X is considered a reference method is defined as:

$$\psi^R = \frac{G(X, X')}{G(X, Y)}. \tag{2}$$

Here we compare the disagreement between measurements made by the two methods with the disagreement between two replicated measurements made by the reference method. Note that both $\psi^N$ and $\psi^R$ are nonnegative but they may attain values greater than one, indicating that the between-methods disagreement is lower than the within-methods disagreement.

The comparison of $G(X, Y)$ to $G(X, X')$ when X is considered a reference method has been used in the past to evaluate individual bioequivalence between a standard (reference) and a new formulation of a drug [4]. In that case, the focus was on comparing the bioavailabilities of the two formulations. The comparison of $G(X, Y)$ to the mean of $G(X, X')$ and $G(Y, Y')$, where $G(X, Y) = E(X - Y)^2$, was introduced in a recent paper by Barnhart et al. [5] to assess the agreement between two or more measurement methods when there is no reference method. They called the coefficients (1) and (2) with $G(X, Y) = E(X - Y)^2$ coefficients of individual agreement (CIA), as they were based on the idea used in assessing individual bioequivalence.

It is important to realize that in these coefficients the within-method disagreement is used as a reference to which we compare the between-methods disagreement. Hence, before using the $\psi$'s one must check that the corresponding within-method disagreements, i.e., $G(X, X')$ for $\psi^R$ and both $G(X, X')$ and $G(Y, Y')$ for $\psi^N$, are acceptable.

As with other coefficients, a common practical question is "how good is good", i.e., what values of the coefficients are large enough to be considered as indicating good agreement. To allow a better interpretation of the new coefficients we suggest looking at their reciprocals, $1/\psi^R$ and $1/\psi^N$. These quantities represent the relative increases in disagreement due to replacing X by Y or due to interchanging X and Y, respectively. In order to claim good agreement, we believe that the disagreement between the two methods should not exceed the disagreement between replicated observations from the same method by more than 25%, i.e., $\psi \ge 0.8$.

The disagreement functions can also be defined at the subject level. Denoting by $G_i(X, Y)$, $G_i(X, X')$, $G_i(Y, Y')$ the values of the disagreement functions for subject $i$, we can define the following subject-specific agreement coefficients:

$$\psi_i^N = \frac{[G_i(X, X') + G_i(Y, Y')]/2}{G_i(X, Y)}, \tag{3}$$

$$\psi_i^R = \frac{G_i(X, X')}{G_i(X, Y)}. \tag{4}$$

Note that $G_i(X, Y)$ may be zero. Since in this case there is good agreement, as measured by the disagreement function for this subject, we define the agreement coefficients (3) and (4) as one regardless of the values of $G_i(X, X')$ and $G_i(Y, Y')$.

Data: Since the new coefficients involve the evaluation of disagreement between measurements made with the same method on the same subject, we have to use data with replicated measurements. When we compare two methods (without a reference) we will assume that at least two measurements are made using each method on each subject. For the case of comparing a new method to a reference we only need replicated measurements on the reference method. The number of replications does not have to be fixed. We denote by $X_{i1}, \ldots, X_{iK_i}$ the $K_i$ measurements made on subject $i$ using method X. Similarly, we denote by $Y_{i1}, \ldots, Y_{iL_i}$ the $L_i$ measurements made on subject $i$ using method Y. We assume that $K_i, L_i \ge 2$ and that these are unmatched replications, i.e., if one permutes the order of the subscripts for one of the methods without changing the order for the other method, then there is no change in the resulting coefficients or their estimates. We do not make any assumptions about the distributions of the measurements.

In this work we will consider two disagreement functions related to agreement measures introduced by Lin et al. [6], namely the mean squared difference (MSD), defined as the expectation of the squared deviation between two measurements, and the complement of the coverage probability (CP), where the CP is the probability that two measurements are within a specific distance from each other. In addition, we consider a new disagreement function, the mean absolute difference (MAD), as the absolute difference is more meaningful than the squared difference.
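Equations (1) to (4), the zero-denominator convention, and the reciprocal interpretation can be sketched in a few lines. This is only an illustration: the function names are hypothetical and not taken from the paper, and the inputs are assumed to be already-computed values of the disagreement function.

```python
def psi_no_reference(g_xx, g_yy, g_xy):
    """Equations (1)/(3): average within-method disagreement over between-method disagreement."""
    if g_xy == 0:
        # Convention stated after equation (4) for subject-specific coefficients:
        # zero between-method disagreement is treated as a coefficient of one.
        return 1.0
    return ((g_xx + g_yy) / 2.0) / g_xy


def psi_reference(g_xx, g_xy):
    """Equations (2)/(4): disagreement within the reference method X over between-method disagreement."""
    if g_xy == 0:
        return 1.0
    return g_xx / g_xy


def relative_increase(psi):
    """Reciprocal 1/psi: between-method disagreement relative to within-method disagreement."""
    return 1.0 / psi


if __name__ == "__main__":
    # Hypothetical disagreement values, chosen only to make the arithmetic easy to follow.
    print(psi_no_reference(1.0, 3.0, 4.0))   # 0.5
    print(psi_reference(1.0, 4.0))           # 0.25
    print(relative_increase(0.8))            # 1.25, i.e. a 25% increase in disagreement
```

The last call shows why the 25% rule corresponds to a threshold of 0.8: a coefficient of 0.8 means the between-method disagreement is 1/0.8 = 1.25 times the within-method disagreement.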

2. The Mean Squared Deviation (MSD)

In this section we define $G(X, Y) = \mathrm{MSD}(X, Y) = E(X - Y)^2$. We first consider the disagreements for a particular subject, $i$:

$$G_i(X, Y) = E[(X_{ik} - Y_{il})^2],$$
$$G_i(X, X') = E[(X_{ik} - X_{ik'})^2] \quad \text{for } k < k',$$
$$G_i(Y, Y') = E[(Y_{il} - Y_{il'})^2] \quad \text{for } l < l'.$$

We then obtain the overall disagreement functions as

$$G(X, Y) = E[G_i(X, Y)], \quad G(X, X') = E[G_i(X, X')], \quad G(Y, Y') = E[G_i(Y, Y')],$$

where $E$ stands for the expectation over all study subjects. If we assume conditional independence of X and Y given the subject's characteristics, then the disagreement functions can also be written in terms of the conditional moments of X and Y for subject $i$:

$$G(X, Y) = E\{[E(X_{ik}) - E(Y_{il})]^2 + \mathrm{Var}(X_{ik}) + \mathrm{Var}(Y_{il})\},$$
$$G(X, X') = 2E[\mathrm{Var}(X_{ik})], \quad G(Y, Y') = 2E[\mathrm{Var}(Y_{il})].$$

The overall agreement coefficients (1) and (2) and the subject-specific coefficients (3) and (4) are now obtained by substituting these disagreement functions into the corresponding equations. The overall coefficients (1) and (2) with $G = \mathrm{MSD}$ are identical to the $\mathrm{CIA}^N$ and $\mathrm{CIA}^R$ coefficients in Barnhart et al. [5] when there is no reference method and when method X is considered as reference, respectively. In addition, $\psi^N_{\mathrm{MSD}}$ is the same as $\psi$ in Haber and Barnhart [1]. However, $\psi$ in the earlier paper compares $\mathrm{MSD}(X, Y)$ to its expected value when the two methods have equal conditional means, $E(X_{ik}) = E(Y_{il})$, while the current $\psi^N_{\mathrm{MSD}}$ compares $\mathrm{MSD}(X, Y)$ to the mean MSD of replicated measurements by the same method. These approaches lead to the same coefficient for $G = \mathrm{MSD}$, but they may result in different coefficients for other selections of the disagreement function.
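The conditional-moment form of the MSD disagreement functions follows from a short expansion. The derivation below is a sketch that assumes, in addition to the conditional independence of X and Y stated in the text, that replicates by the same method on subject $i$ are conditionally independent with a common mean and variance; the shorthand $\mu_{Xi}$, $\mu_{Yi}$, $\sigma^2_{Xi}$, $\sigma^2_{Yi}$ for the conditional moments is introduced here only for brevity.

```latex
\begin{align*}
E\big[(X_{ik} - Y_{il})^2 \mid i\big]
  &= \big[E(X_{ik} - Y_{il} \mid i)\big]^2 + \mathrm{Var}(X_{ik} - Y_{il} \mid i) \\
  &= (\mu_{Xi} - \mu_{Yi})^2 + \sigma^2_{Xi} + \sigma^2_{Yi}, \\[4pt]
E\big[(X_{ik} - X_{ik'})^2 \mid i\big]
  &= 0^2 + \mathrm{Var}(X_{ik} \mid i) + \mathrm{Var}(X_{ik'} \mid i) = 2\sigma^2_{Xi}, \\[4pt]
\text{so if } \mu_{Xi} = \mu_{Yi} \text{ for all } i:\quad
G(X, Y) &= E\big[\sigma^2_{Xi} + \sigma^2_{Yi}\big]
         = \tfrac{1}{2}\big[G(X, X') + G(Y, Y')\big].
\end{align*}
```

The last line is the reason the earlier $\psi$ and the current $\psi^N_{\mathrm{MSD}}$ coincide: with equal conditional means, the expected MSD between methods equals the average MSD between replicates of the same method.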

2.1 Estimation

We begin with the subject-specific coefficients. For subject $i$, we estimate

$$\hat{G}_i(X, Y) = \mathrm{Mean}_{k,l}\,(X_{ik} - Y_{il})^2,$$
$$\hat{G}_i(X, X') = \mathrm{Mean}_{k<k'}\,(X_{ik} - X_{ik'})^2 = 2\,\mathrm{MSW}_i(X),$$
$$\hat{G}_i(Y, Y') = \mathrm{Mean}_{l<l'}\,(Y_{il} - Y_{il'})^2 = 2\,\mathrm{MSW}_i(Y),$$

where the MSW are the within-subject mean squares:

$$\mathrm{MSW}_i(X) = \sum_k (X_{ik} - \bar{X}_i)^2/(K_i - 1), \quad \mathrm{MSW}_i(Y) = \sum_l (Y_{il} - \bar{Y}_i)^2/(L_i - 1).$$

We then substitute these estimates of the $G_i$'s into (3) or (4) to obtain the estimated subject-specific coefficients of agreement. Based on the note underneath equation (4), we set the estimated coefficients to one whenever $\hat{G}_i(X, Y) = 0$.

To estimate the overall coefficients of agreement we estimate the overall disagreement function as $\hat{G}(X, Y) = \mathrm{Mean}_i[\hat{G}_i(X, Y)]$. Likewise, $\hat{G}(X, X')$ and $\hat{G}(Y, Y')$ are estimated as the means over all subjects of the corresponding subject-specific disagreements. The overall agreement coefficient is estimated by substituting the estimated $G$'s into (1) or (2).

2.2 An example

To illustrate the coefficients introduced in this paper we use a data set from Bland and Altman [7]. Systolic blood pressure (SBP) was measured on 85 subjects by two experienced human observers using a sphygmomanometer and by a semi-automatic blood pressure monitor. Three replications were made in quick succession with each of the three methods on each subject. We first assessed the agreement between the two human observers. The agreement was excellent ($\hat\psi^N = 1.44$), hence we will focus here on the agreement between the first human observer, which we label X, and the monitor, which we label Y.

Our estimated $G$'s were $\hat{G}(X, X') =$ , $\hat{G}(Y, Y') =$ , $\hat{G}(X, Y) =$ . Hence, if we compare the two methods without considering either of them as a reference then the estimated agreement is $\hat\psi^N =$ . However, it could be more appropriate to consider the experienced human observer as a reference and the monitor as a new method that is tested against this reference. Then we estimate the agreement as $\hat\psi^R =$ . Either way, the agreement is poor. This is not surprising as the mean reading of the monitor is about 16 points higher than that of the human observer. These estimates are identical to those obtained by Barnhart et al. [5] using an ANOVA model.

We used the bootstrap method to obtain standard errors and confidence intervals (CI) for these coefficients. The 95% CIs based on the bootstrap percentiles for $\psi^N$ and $\psi^R$ were (0.110, 0.306) and (0.067, 0.07), respectively. We also calculated the CIs based on the normal approximation to the distribution of the estimates: (0.078, 0.78) and (0.037, 0.183), respectively. Thus, the endpoints of the percentile-based CIs exceed those of the normal CIs, which suggests that the distributions of the estimates are skewed. The CIs based on the normal approximation to the distribution of the logarithms of the estimates are (0.105, 0.303) and (0.061, 0.198), respectively, quite close to the percentile-based CIs.

We can learn more about the agreement between X and Y by plotting the subject-specific coefficients of agreement as functions of our best estimates of the subjects' true SBP. When we did not consider either method as a reference, we plotted the estimates of $\psi_i^N$ against $(\bar{X}_i + \bar{Y}_i)/2$, where $\bar{X}_i$ and $\bar{Y}_i$ are the means over the replicated observations (Figure 1a). When the human observer (X) was considered a reference, we plotted the estimates of $\psi_i^R$ against $\bar{X}_i$ (Figure 1b). Both coefficients tend to increase as the SBP increases. Thus, the agreement between the two methods is better for individuals with higher SBP than for those with lower blood pressure. The plots also confirm the skewed distribution of the estimates.
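The estimation steps of Section 2.1 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the two-subject data set are hypothetical, and the sketch assumes at least two replicates per method on every subject.

```python
from itertools import combinations, product
from statistics import mean, variance


def subject_msd(x, y):
    """Subject-level MSD disagreements from replicate lists x (method X) and y (method Y)."""
    g_xy = mean((u - v) ** 2 for u, v in product(x, y))        # all (k, l) pairs
    g_xx = mean((u - v) ** 2 for u, v in combinations(x, 2))   # pairs with k < k'
    g_yy = mean((u - v) ** 2 for u, v in combinations(y, 2))   # pairs with l < l'
    return g_xx, g_yy, g_xy


def overall_msd_coefficients(data):
    """data: list of (x_replicates, y_replicates), one entry per subject."""
    per_subject = [subject_msd(x, y) for x, y in data]
    g_xx = mean(s[0] for s in per_subject)
    g_yy = mean(s[1] for s in per_subject)
    g_xy = mean(s[2] for s in per_subject)
    return {"psi_N": ((g_xx + g_yy) / 2) / g_xy, "psi_R": g_xx / g_xy}


if __name__ == "__main__":
    # Hypothetical replicated readings for two subjects (not the SBP data).
    data = [([1, 3], [2, 4]), ([10, 10], [12, 12])]
    print(subject_msd(*data[0]))           # within-X, within-Y, between-method
    print(2 * variance(data[0][0]))        # equals the within-X value: 2 * MSW
    print(overall_msd_coefficients(data))
```

The second printed value illustrates the identity used in the text: the mean squared difference over all pairs of replicates equals twice the within-subject mean square.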

2.3 Simulations

We conducted a simulation study to investigate the bias and precision of the estimates of $\psi^N$ and $\psi^R$ for $G = \mathrm{MSD}$, using a simple latent class model to generate the replicated, conditionally independent measurements by the methods X and Y. The true value, $T$, was assumed normal with mean $\mu_T$ and standard deviation $\sigma_T$. Let $T_i = t_i$ denote the true value for subject $i$. We then generated $K$ replicated observations by method X from $(X_{ik} \mid t_i) \sim N(\mu_{X|t_i}, \sigma^2_{X|t_i})$ and $L$ replicated observations by method Y from $(Y_{il} \mid t_i) \sim N(\mu_{Y|t_i}, \sigma^2_{Y|t_i})$. For simplicity we assumed that the number of replications by each method is the same for all subjects, though the numbers of replications by X and Y may differ. We assumed further that the above conditional means and standard deviations are linear functions of $t$:

$$\mu_{X|t} = a + bt, \quad \mu_{Y|t} = c + dt, \quad \sigma_{X|t} = e + ft, \quad \sigma_{Y|t} = g + ht.$$

Then it is easy to show that the true values of the three $G$ functions for $G = \mathrm{MSD}$ are:

$$G(X, Y) = (a - c)^2 + e^2 + g^2 + 2[(a - c)(b - d) + ef + gh]\mu_T + [(b - d)^2 + f^2 + h^2](\mu_T^2 + \sigma_T^2),$$
$$G(X, X') = 2e^2 + 4ef\mu_T + 2f^2(\mu_T^2 + \sigma_T^2),$$
$$G(Y, Y') = 2g^2 + 4gh\mu_T + 2h^2(\mu_T^2 + \sigma_T^2).$$

The true values of $\psi^N$ and $\psi^R$ are now obtained from (1) and (2).

Our purpose was to simulate data that are similar to the SBP data used in the above example. We considered the mean (over the three replications) readings of the second human observer (whose observations were not used in our example) in the Bland and Altman [7] data as the true values ($T$) of SBP. The sample moments of these true values will be used as the moments of $T$; they are $\mu_T = 17.3$ and $\sigma_T =$ . We then obtained the coefficients in the equations defining the conditional moments of X and Y given $T$ by regressing the observed values of these conditional moments on the true values. The coefficients are: $a = 1.03$, $b = 1.01$, $c =$ , $d = 0.85$, $e = 1.91$, $f = 0.03$, $g = 3.6$, $h =$ . For this choice of the parameters in our simulation model, which will be referred to as case 1, the true values of the agreement coefficients are $\psi^N = 0.66$, $\psi^R =$ .

We also conducted simulations with larger true values of the agreement coefficients by changing the values of $c$ and $d$ while leaving the other simulation parameters unchanged. For case 2 we used $c = 13$, $d = 0.95$, which yield $\psi^N =$ , $\psi^R = 0.50$, and for case 3 we used $c = 5$, $d = 0.98$, which yield $\psi^N =$ , $\psi^R =$ . (By bringing $c$ closer to 0 and $d$ closer to 1 we increase the agreement between Y and the true value. This increases the agreement between X and Y because X agrees very well with the true value.) Thus, the three cases correspond to poor, fair and good agreement, respectively.

We used three sample sizes, $n = 50$, 100 and 200. For estimating $\psi^N$ we used two combinations of $(K, L)$, namely (3,3) and (2,2). For estimating $\psi^R$ we used four combinations of $(K, L)$, namely (3,3), (2,2), (3,1) and (2,1). This allows us to assess the importance of using replicated observations on the new method, Y, since $\psi^R$ can be estimated from data with replicated observations on the standard method, X, only.

Table 1 presents the bias and root mean square error (RMSE) of the estimates for all the combinations of $(n, K, L)$ for the three cases defined earlier. Each table entry is based on a set of 200 simulations. We see that the bias is usually positive but very small. The RMSE varies with the sample size and the number of replications. For $n = 50$, using fewer than three replications with either X or Y may result in imprecise estimates. For $n \ge 100$ the precision is acceptable even with the minimum number of replications ($K = L = 2$ for $\psi^N$ and $K = 2$, $L = 1$ for $\psi^R$).
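The simulation model and the closed-form true values for $G = \mathrm{MSD}$ can be sketched as below. The function names are hypothetical, and the parameter values in the demo are placeholders chosen for easy checking, not the values fitted to the SBP data.

```python
import random


def true_msd_values(params, mu_t, sigma_t):
    """True G(X,X'), G(Y,Y'), G(X,Y) for G = MSD under the linear latent model."""
    a, b, c, d, e, f, g, h = params
    m2 = mu_t ** 2 + sigma_t ** 2  # E(T^2)
    g_xy = ((a - c) ** 2 + e ** 2 + g ** 2
            + 2 * ((a - c) * (b - d) + e * f + g * h) * mu_t
            + ((b - d) ** 2 + f ** 2 + h ** 2) * m2)
    g_xx = 2 * e ** 2 + 4 * e * f * mu_t + 2 * f ** 2 * m2
    g_yy = 2 * g ** 2 + 4 * g * h * mu_t + 2 * h ** 2 * m2
    return g_xx, g_yy, g_xy


def simulate_subject(rng, params, mu_t, sigma_t, k, l):
    """Draw one subject's true value, then k replicates by X and l replicates by Y."""
    a, b, c, d, e, f, g, h = params
    t = rng.gauss(mu_t, sigma_t)
    # abs() guards against a negative standard deviation for extreme t (an added safeguard).
    x = [rng.gauss(a + b * t, abs(e + f * t)) for _ in range(k)]
    y = [rng.gauss(c + d * t, abs(g + h * t)) for _ in range(l)]
    return x, y


if __name__ == "__main__":
    params = (0, 1, 5, 1, 1, 0, 1, 0)  # hypothetical: Y reads 5 units above X, unit error SDs
    g_xx, g_yy, g_xy = true_msd_values(params, mu_t=100, sigma_t=20)
    print("psi_N =", ((g_xx + g_yy) / 2) / g_xy, "psi_R =", g_xx / g_xy)
    print(simulate_subject(random.Random(1), params, 100, 20, k=3, l=2))
```

Feeding many simulated subjects into a nonparametric estimator and comparing the result with these true values is one way to reproduce the bias and RMSE calculations described above.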

2.4 Effect of between-subjects variability on the agreement coefficients

The latent class model introduced earlier can also be used to investigate the behavior of the agreement coefficients in various situations. One issue that is commonly raised in connection with coefficients of agreement is their dependence on the between-subjects variability of the quantity being measured. Therefore we explore the dependence of the agreement coefficients on the variation of the true SBP in the population, i.e., on $\sigma_T$.

Let us first consider a special case of the latent class model where the conditional ("within-subject") variances of the measurements do not depend on the true value $t$, i.e., $f = h = 0$. In this case $G(X, X')$ and $G(Y, Y')$ do not depend on $\sigma_T$ and $G(X, Y)$ is a monotonically increasing function of $\sigma_T$. Therefore both $\psi^N$ and $\psi^R$ are decreasing functions of $\sigma_T$.

In the general latent class model presented in Section 2.3 all three $G$'s increase with $\sigma_T$. We used the values of the eight coefficients ($a, b, \ldots, h$) determining the conditional moments of X and Y given $t$ as estimated from the data (Section 2.3) to examine the dependence of the agreement coefficients on $\sigma_T$. For comparison, we also calculated the CCC for each case. The results are presented in Figure 2. We see that the new agreement coefficients decrease as the between-subjects variance increases. However, the rate of decrease is modest. On the other hand, the CCC increases at a very fast rate with $\sigma_T$. Also, for $\sigma_T \ge 20$ the CCC is considerably higher than the new $\psi$ coefficients. We repeated the calculations of the $\psi$'s with higher values of $f$ and $h$, to allow more variability of the within-subject variance as the subject's true value $t$ changes. However, the rate of decrease in the new agreement coefficients with $\sigma_T$ did not change. In our opinion, the decrease of the agreement coefficients as the between-subject variance increases looks reasonable, as it is more difficult for observers to demonstrate good agreement across a wider range of the subjects' true values.
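The special case $f = h = 0$ can be checked numerically. The sketch below uses the true-value formulas of Section 2.3 with hypothetical parameter values (not the fitted ones); the function name is also an assumption made for illustration.

```python
def psi_n_constant_variance(a, b, c, d, e, g, mu_t, sigma_t):
    """True psi^N for G = MSD when the within-subject SDs are constant (f = h = 0)."""
    within = e ** 2 + g ** 2  # [G(X,X') + G(Y,Y')]/2 = (2e^2 + 2g^2)/2
    between = ((a - c) ** 2 + e ** 2 + g ** 2
               + 2 * (a - c) * (b - d) * mu_t
               + (b - d) ** 2 * (mu_t ** 2 + sigma_t ** 2))
    return within / between


if __name__ == "__main__":
    # Hypothetical parameters: slopes b and d differ, so G(X,Y) grows with sigma_T.
    for sigma_t in (0, 10, 20, 30):
        value = psi_n_constant_variance(0, 1.0, 10, 0.9, 2, 3, mu_t=120, sigma_t=sigma_t)
        print(sigma_t, round(value, 4))
```

With these inputs the numerator stays fixed while the denominator gains the term $(b - d)^2\sigma_T^2$, so the printed coefficient falls as $\sigma_T$ grows. If $b = d$ the coefficient would not depend on $\sigma_T$ at all in this special case.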

3. The Mean Absolute Deviation (MAD)

In this section we consider the disagreement function $G(X, Y) = E|X - Y|$. The MAD is easier to interpret than the MSD, as it is more readily related to the actual observations. In general, one would expect the $\psi_{\mathrm{MAD}}$'s to be close to the square roots of the corresponding $\psi_{\mathrm{MSD}}$'s.

Estimation of the agreement coefficients based on the MAD is very similar to the estimation of the coefficients based on the MSD. We begin with the estimation of the subject-specific $G$'s:

$$\hat{G}_i(X, Y) = \mathrm{Mean}_{k,l}\,|X_{ik} - Y_{il}|,$$
$$\hat{G}_i(X, X') = \mathrm{Mean}_{k<k'}\,|X_{ik} - X_{ik'}|,$$
$$\hat{G}_i(Y, Y') = \mathrm{Mean}_{l<l'}\,|Y_{il} - Y_{il'}|.$$

We then calculate the sample means of the estimated $G_i$'s to obtain the overall estimates of the disagreement functions and of the agreement coefficients. We can also estimate the subject-specific coefficients of agreement.

3.1 SBP example

For the SBP data discussed in Section 2.2, the estimated coefficients with $G = \mathrm{MAD}$ are $\hat\psi^N =$ and $\hat\psi^R =$ . The bootstrap percentile-based CIs are (0.348, 0.517) and (0.8, 0.459), respectively, while the bootstrap CIs based on the normal approximation are (0.341, 0.51) and (0.78, 0.448), respectively. The differences between the two kinds of confidence intervals for the same parameter are smaller than they were for $G = \mathrm{MSD}$. In other words, the distribution of the $\hat\psi_{\mathrm{MAD}}$'s is more symmetric than that of the $\hat\psi_{\mathrm{MSD}}$'s. This is also evident from the plots of the subject-specific $\hat\psi_{\mathrm{MAD}}$'s (not included), which show that the distribution of these quantities is less skewed.
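The MAD-based estimate and a percentile bootstrap interval can be sketched as follows. The paper does not spell out its resampling scheme, so resampling whole subjects with replacement is an assumption here, as are the function names, the simple index rule for the percentiles, and the toy data.

```python
import random
from itertools import combinations, product
from statistics import mean


def subject_mad(x, y):
    """Subject-level MAD disagreements from replicate lists x and y."""
    g_xy = mean(abs(u - v) for u, v in product(x, y))
    g_xx = mean(abs(u - v) for u, v in combinations(x, 2))
    g_yy = mean(abs(u - v) for u, v in combinations(y, 2))
    return g_xx, g_yy, g_xy


def psi_n_mad(data):
    """Overall no-reference coefficient: subject-level disagreements averaged, then equation (1)."""
    per_subject = [subject_mad(x, y) for x, y in data]
    g_xx = mean(s[0] for s in per_subject)
    g_yy = mean(s[1] for s in per_subject)
    g_xy = mean(s[2] for s in per_subject)
    return ((g_xx + g_yy) / 2) / g_xy


def bootstrap_percentile_ci(data, stat, n_boot=1000, level=0.95, seed=0):
    """Percentile interval from resampling subjects with replacement (assumed scheme)."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot))
    lo = reps[int(((1 - level) / 2) * (n_boot - 1))]
    hi = reps[int(((1 + level) / 2) * (n_boot - 1))]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical data; every subject has nonzero between-method disagreement,
    # so no resample can produce a zero denominator.
    data = [([1, 3], [2, 4]), ([10, 10], [12, 12])]
    print(psi_n_mad(data))
    print(bootstrap_percentile_ci(data, psi_n_mad, n_boot=200))
```

With only two subjects the interval is of course meaningless; the example only shows the mechanics. The same `bootstrap_percentile_ci` helper would work with any other statistic that maps a list of subjects to a number.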

3.2 Simulations

We used the same simulation model and parameters as in Section 2.3. The true values of the MADs were calculated as means of folded normal variables. We now illustrate the calculation of $\mathrm{MAD}(X, Y)$. Under our simulation model the difference $D = X - Y$ has a normal distribution and $|D|$ has a folded normal distribution. Then $\mathrm{MAD}(X, Y)$ is the mean of this distribution:

$$\mathrm{MAD}(X, Y) = E|D| = \sqrt{2/\pi}\,\sigma_D \exp\!\left(-\frac{\mu_D^2}{2\sigma_D^2}\right) + \mu_D\left[1 - 2\Phi\!\left(-\frac{\mu_D}{\sigma_D}\right)\right],$$

where $\mu_D$ and $\sigma_D$ are the mean and standard deviation of $D$ and $\Phi$ is the standard normal distribution function [8]. The first two moments of $D$ were obtained from the first two moments of $(X, Y)$, which were calculated from the parameters of the simulation model as follows:

$$E(X) = a + b\mu_T, \quad E(Y) = c + d\mu_T,$$
$$\mathrm{Var}(X) = e^2 + 2ef\mu_T + f^2\mu_T^2 + (b^2 + f^2)\sigma_T^2,$$
$$\mathrm{Var}(Y) = g^2 + 2gh\mu_T + h^2\mu_T^2 + (d^2 + h^2)\sigma_T^2,$$
$$\mathrm{Cov}(X, Y) = bd\sigma_T^2.$$

$\mathrm{MAD}(X, X')$ and $\mathrm{MAD}(Y, Y')$ were calculated in a similar way.

Table 2 presents the bias and RMSE of the estimates of $\psi^N$ and $\psi^R$ for $G = \mathrm{MAD}$. The bias is usually negative, while the bias for the MSD-based estimates was usually positive. Comparing the RMSEs of the MSD- and MAD-based estimates, we see that for case 1, where the true agreement is low, the RMSEs of the MAD-based estimates are slightly higher than the corresponding RMSEs of the MSD-based estimates. On the other hand, for cases 2 and 3, where the true agreement is moderate or high, the MAD-based estimates have considerably smaller RMSEs than the MSD-based estimates.
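The folded normal calculation can be sketched with the standard library alone. The function names and the parameter values in the demo are hypothetical; the moments of $D$ use $\mu_D = E(X) - E(Y)$ and $\sigma_D^2 = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\,\mathrm{Cov}(X, Y)$, and the sketch follows the text in treating $D$ as normal.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def folded_normal_mean(mu, sigma):
    """E|D| for D ~ N(mu, sigma^2)."""
    return (math.sqrt(2.0 / math.pi) * sigma * math.exp(-mu ** 2 / (2.0 * sigma ** 2))
            + mu * (1.0 - 2.0 * std_normal_cdf(-mu / sigma)))


def true_mad_xy(params, mu_t, sigma_t):
    """MAD(X, Y) from the moments of (X, Y) under the linear latent model."""
    a, b, c, d, e, f, g, h = params
    mean_d = (a + b * mu_t) - (c + d * mu_t)
    var_x = e ** 2 + 2 * e * f * mu_t + f ** 2 * mu_t ** 2 + (b ** 2 + f ** 2) * sigma_t ** 2
    var_y = g ** 2 + 2 * g * h * mu_t + h ** 2 * mu_t ** 2 + (d ** 2 + h ** 2) * sigma_t ** 2
    cov_xy = b * d * sigma_t ** 2
    sd_d = math.sqrt(var_x + var_y - 2 * cov_xy)
    return folded_normal_mean(mean_d, sd_d)


if __name__ == "__main__":
    print(folded_normal_mean(0.0, 1.0))   # sqrt(2/pi), the half-normal mean
    # Hypothetical parameters: identical means, unit error SDs, so D ~ N(0, 2).
    print(true_mad_xy((0, 1, 0, 1, 1, 0, 1, 0), mu_t=100, sigma_t=20))
```

Two sanity checks are built into the formula: it is symmetric in the sign of $\mu_D$, and it approaches $|\mu_D|$ when $\mu_D$ is large relative to $\sigma_D$.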

4. Coverage Probabilities (CP)

The coverage probability for a given value $c > 0$ was defined by Lin et al. [6] as $\mathrm{CP}(c) = P(|X - Y| < c)$. We define the CP-based disagreement function as $G(X, Y, c) = P(|X - Y| \ge c) = 1 - \mathrm{CP}(c)$. As before, we will not make any assumptions on the distributions of X and Y, and we estimate the overall agreement coefficients from the subject-specific estimates. The subject-specific disagreement functions are estimated from the proportions of pairwise observations that are separated by $c$ units or more:

$$\hat{G}_i(X, Y, c) = [\#\text{ pairs } (X_{ik}, Y_{il}) \text{ such that } |X_{ik} - Y_{il}| \ge c]\,/\,(K_i L_i),$$
$$\hat{G}_i(X, X', c) = [\#\text{ pairs } (X_{ik}, X_{ik'}),\ k < k', \text{ such that } |X_{ik} - X_{ik'}| \ge c]\,/\,[K_i(K_i - 1)/2],$$

and $\hat{G}_i(Y, Y', c)$ is obtained in an analogous way. The overall disagreement functions and agreement coefficients are calculated from the estimates of the subject-specific disagreement functions in the same way as in Sections 2 and 3.

4.1 SBP example

For the SBP data we used $c = 10$, which means that differences of less than 10 points between two measurements on the same subject are considered acceptable. Our estimates were $\hat\psi^N =$ , with a bootstrap percentile-based CI (0.357, 0.547), and $\hat\psi^R =$ , with a bootstrap percentile-based CI (0.304, 0.51). The CIs based on the normal assumption were very close to the percentile-based CIs.

The main drawback of the CP-based agreement coefficients is their dependence on the choice of $c$. In the SBP example, using $c = 5$ rather than $c = 10$ increases the estimated $\psi^N$ and $\psi^R$ to and 0.615, respectively.
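The CP-based estimates are proportions of pairs, which the sketch below computes directly. Function names and readings are hypothetical, and at least two replicates per method are assumed for every subject.

```python
from itertools import combinations, product
from statistics import mean


def subject_cp_disagreement(x, y, c):
    """Proportions of pairs that differ by c units or more, within and between methods."""
    g_xy = mean(1.0 if abs(u - v) >= c else 0.0 for u, v in product(x, y))
    g_xx = mean(1.0 if abs(u - v) >= c else 0.0 for u, v in combinations(x, 2))
    g_yy = mean(1.0 if abs(u - v) >= c else 0.0 for u, v in combinations(y, 2))
    return g_xx, g_yy, g_xy


def overall_cp_coefficients(data, c):
    """Average the subject-level proportions, then apply equations (1) and (2)."""
    per_subject = [subject_cp_disagreement(x, y, c) for x, y in data]
    g_xx = mean(s[0] for s in per_subject)
    g_yy = mean(s[1] for s in per_subject)
    g_xy = mean(s[2] for s in per_subject)
    # Assumes g_xy > 0; an overall value of zero would need separate handling.
    return {"psi_N": ((g_xx + g_yy) / 2) / g_xy, "psi_R": g_xx / g_xy}


if __name__ == "__main__":
    # Hypothetical readings for two subjects, threshold c = 10.
    data = [([100, 112], [105, 125]), ([120, 121], [135, 136])]
    print(subject_cp_disagreement(*data[0], c=10))
    print(overall_cp_coefficients(data, c=10))
    print(overall_cp_coefficients(data, c=5))   # a different threshold gives different coefficients
```

Because each subject contributes only a handful of pairs, the subject-level proportions take few distinct values, and the result can shift noticeably with the threshold, which is the dependence on $c$ noted in the text.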

5. Discussion

We presented a general approach to defining agreement coefficients for the comparison of two fixed observers or methods of measurement. The approach is based on the concept of a disagreement function, which allows the user to specify her/his criterion for assessing the agreement between observations on the same subject. We used two approaches to standardize the value of the disagreement function in such a way that the ensuing coefficient will be close to or above one when the agreement is good. When neither of the measurement methods can be considered a gold standard or a reference, we compare the disagreement between the methods to the average disagreement between replicated readings from the same method. On the other hand, when one of the methods has been in use for a while and can be considered a gold standard (reference), then the between-methods disagreement is compared to the disagreement between replicated readings from the reference method.

In the absence of any parametric assumptions, the estimation of the new coefficients requires replicated observations by the same method on the same subject. (When one of the methods is considered a gold standard, replicated observations by the new method are not required.) Ideally, the methods or observers should be blind to their previous assessment(s) of the same subject, and the true value of the measured variable on a subject should not change between replicated evaluations. We realize that in some cases it may be difficult to obtain replicated observations that satisfy these conditions. The new coefficients are useful, for example, when the measurements are made by automatic devices on blood samples or x-ray slides obtained from each subject. When the methods are human observers it is important to make sure that each observer makes her/his measurements in a random order, so that she/he is unlikely to recall the previous measurement(s) on a given subject.

In most cases the disagreement function can be computed for each subject. Then one can estimate the agreement coefficient for each subject and plot the estimated coefficients against the subjects' estimated true values or against other subject-specific covariates. These plots may shed more light on the factors that affect agreement and help in identifying outlying observations. They may also be used as a supplement to the traditional Bland-Altman plots [9], which display the difference between measurements of two methods on the same subject as a function of the subject's estimated true value.

In this paper we did not make any assumptions regarding the distributions of the measurements. Thus, we presented nonparametric estimates only. We also avoided some of the other assumptions commonly made when evaluating observer agreement. For example, our approach allows the error variances to differ across subjects, which is quite a common phenomenon in reality.

It is interesting to compare the MSD-based coefficient $\psi^N$ with the CCC. Both coefficients have the MSD in the denominator. The new coefficient compares the MSD with its value under the assumption that the two methods are interchangeable, while the CCC uses the expected MSD under independence as a yardstick. One important difference between these coefficients is related to the fast increase in the CCC when the between-subject heterogeneity increases (Figure 2). The new coefficient, on the other hand, is less sensitive to the sample heterogeneity and, at least in the examples we explored, decreases when there is more heterogeneity. Intuitively one would expect agreement to decrease when the measured variable exhibits more variability, because the increased variability requires the methods to agree across a wider range of values of the measured variable. For further discussion of agreement and heterogeneity the reader is referred to Atkinson & Nevill [10].

We considered three disagreement functions, based on the MSD, MAD and CP. We saw that the CP-based coefficients depend on the threshold used to define good agreement, so we actually have to deal with a family of coefficients corresponding to a given range of threshold values. Comparing the MSD- and MAD-based approaches, we prefer the latter as the MAD is expressed in the same units as the measured variable. In our simulation studies the MAD-based estimates were more precise than the corresponding MSD-based estimates except when the true agreement was very small.

The approach introduced in this article can be generalized in various directions. The MSD-based coefficients were generalized to situations where several new methods are compared to a common gold standard or several methods are compared without a gold standard [5]. It will be important to develop multiple-methods coefficients based on other disagreement functions. We considered only continuous variables in this work, and we are currently exploring similar coefficients for binary and categorical outcome variables. Finally, throughout this work we assumed that the replications from the two methods are independent of each other (unmatched replications). We found many examples where this assumption does not hold (for example, repeated measurements are performed at a set of fixed time points or under fixed conditions). We plan to adapt our coefficients to incorporate matched replications and account for the variability associated with the factor that underlies the repeated measurements.

Acknowledgement

This research was supported by NIMH grant 1 R01 MH

References

1. Haber M, Barnhart HX. Coefficients of agreement for fixed observers. Statistical Methods in Medical Research 2006 (to appear).
2. Zegers FE. A family of chance-corrected association coefficients for metric scales. Psychometrika 1986; 51.
3. Lin L. A concordance correlation coefficient to evaluate reproducibility. Biometrics 1989; 45.
4. Schall R, Luus HG. On population and individual bioequivalence. Stat. Med. 1993; 12.
5. Barnhart HX, Kosinski AS, Haber M. Assessing individual agreement. (submitted).
6. Lin L, Hedayat AS, Sinha B, Yang M. Statistical methods in assessing agreement: models, issues and tools. J. Amer. Stat. Assoc. 2002; 97.
7. Bland JM, Altman DG. Measuring agreement in method comparison studies. Statistical Methods in Medical Research 1999; 8.
8. Read CB. Folded Distributions. Encyclopedia of Statistical Sciences (edited by S Kotz and NL Johnson), Vol 3. Wiley & Sons, New York.
9. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurements. Lancet 1986; i.
10. Atkinson G, Nevill A. Comment on the use of concordance correlation to assess the agreement between variables. Biometrics 1997; 53.

Table 1: Bias and root mean square error (RMSE) of estimates of $\psi^N$ and $\psi^R$ using the mean squared deviation (MSD) as the disagreement function. True values: Case 1, $\psi^N = 0.66$, $\psi^R =$ ; Case 2, $\psi^N = 0.670$, $\psi^R = 0.50$; Case 3, $\psi^N = 0.940$, $\psi^R =$ . Columns: $K$, $L$, then the bias and RMSE of $\hat\psi^N$ and $\hat\psi^R$ for each case.

Table 2: Bias and root mean square error (RMSE) of estimates of $\psi^N$ and $\psi^R$ using the mean absolute deviation (MAD) as the disagreement function. True values: Case 1, $\psi^N = 0.476$, $\psi^R =$ ; Case 2, $\psi^N = 0.804$, $\psi^R = 0.70$; Case 3, $\psi^N = 0.96$, $\psi^R =$ . Columns: $K$, $L$, then the bias and RMSE of $\hat\psi^N$ and $\hat\psi^R$ for each case.

Figure 1a: $\hat\psi_i^N$ based on MSD as a function of the mean of X and Y (horizontal axis: mean(X, Y); vertical axis: $\hat\psi^N$ (MSD)), with a least squares line.

Figure 1b: $\hat\psi_i^R$ based on MSD as a function of the mean of X (horizontal axis: mean(X); vertical axis: $\hat\psi^R$ (MSD)). Least squares line coefficients: $\alpha = -0.56$, $\beta = 0.049$.
