VARIABILITY OF KUDER-RICHARDSON FORMULA 20 RELIABILITY ESTIMATES. T. Anne Cleary, University of Wisconsin, and Robert L. Linn, Educational Testing Service

RESEARCH BULLETIN RB-68-7

This Bulletin is a draft for interoffice circulation. Corrections and suggestions for revision are solicited. The Bulletin should not be cited as a reference without the specific permission of the authors. It is automatically superseded upon formal publication of the material.

Educational Testing Service, Princeton, New Jersey, February 1968

Abstract

The standard error of a Kuder-Richardson Formula 20 reliability coefficient is derived and two approximations to it are presented. The values from the exact solutions and from the approximations are compared with empirical values.

There are a number of different methods of estimating the reliability of a test; the more commonly used are the parallel-form correlation and internal-consistency measures such as the Kuder-Richardson formula 20 reliability coefficient. Each of these reliability estimates is based on particular assumptions and has a different interpretation; the choice of a reliability estimate in a given situation should be dictated by the interpretation. Nevertheless, it is instructive to consider the standard errors of these indices, that is, the sampling standard deviations of the indices under Type I sampling fluctuation. Type I sampling was used by Lord (1955) to refer to sampling of examinees: the same test is administered to a large number of separate groups of examinees, each group being a random sample from a population of examinees.

Under Type I sampling the standard error of a parallel-test correlation is well known:

$$\mathrm{S.E.}(r_{xx}) = \ldots \rho_{xx} \ldots \qquad (1)$$

where $r_{xx}$ is the observed correlation between two parallel tests, $\rho_{xx}$ is the correlation between the parallel tests in the population, and $N$ is the number of persons in the sample.

Feldt (1965) has derived an approximation to the sampling distribution of $r_{20}$ but does not state the standard error explicitly. Lord (1955) gives an explicit formula for the standard error under Type II sampling (sampling of items) but not under Type I sampling (sampling of persons).

Reliability can be defined as

$$\rho = \frac{\sigma_T^2}{\sigma_X^2},$$

where $\sigma_T^2$ is the variance of the true scores and $\sigma_X^2$ is the variance of the observed scores. In deriving the KR-20 formula from the analysis-of-variance model (Hoyt, 1941; Feldt, 1965), the score of person $p$ ($p = 1, \ldots, N$) on item $i$ ($i = 1, \ldots, K$) is represented as

$$X_{pi} = M + A_i + t_p + e_{pi},$$

where $M$ = grand mean, $A_i$ = the score component due to the difficulty of item $i$, $t_p$ = the item true score for person $p$, and $e_{pi}$ = error for item $i$ and person $p$.

The item errors, $e_{pi}$, are assumed to be normally and independently distributed with zero means and common variance $\sigma_e^2$. The item true score, $t_p$, is assumed to have a normal distribution with variance $\sigma_t^2$. The test true score, $T_p = K t_p$, and test error score, $E_p = \sum_i e_{pi}$, are then normally distributed with variances

$$\sigma_T^2 = K^2 \sigma_t^2 \quad \text{and} \quad \sigma_E^2 = K \sigma_e^2.$$
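The model above can be illustrated with a small simulation. The following is a minimal sketch, not part of the original derivation: the function name `simulate_scores`, the parameter values, the seed, and the uniform item offsets used for $A_i$ are all assumptions chosen for illustration. It draws scores from $X_{pi} = M + A_i + t_p + e_{pi}$ and compares the sample variances of $T_p$ and $E_p$ with $K^2\sigma_t^2$ and $K\sigma_e^2$.

```python
import random
import statistics


def simulate_scores(n_persons, n_items, sigma_t, sigma_e, grand_mean=0.5, seed=0):
    """Draw X_pi = M + A_i + t_p + e_pi.

    Returns (scores, true_scores, error_scores), where true_scores holds
    T_p = K * t_p and error_scores holds E_p = sum_i e_pi.
    """
    rng = random.Random(seed)
    # Item difficulty components A_i: arbitrary fixed offsets (an assumption).
    item_effects = [rng.uniform(-0.2, 0.2) for _ in range(n_items)]
    scores, true_scores, error_scores = [], [], []
    for _ in range(n_persons):
        t_p = rng.gauss(0.0, sigma_t)                       # item true score
        errors = [rng.gauss(0.0, sigma_e) for _ in range(n_items)]
        scores.append([grand_mean + a + t_p + e
                       for a, e in zip(item_effects, errors)])
        true_scores.append(n_items * t_p)                   # T_p = K t_p
        error_scores.append(sum(errors))                    # E_p = sum_i e_pi
    return scores, true_scores, error_scores


# Illustrative (hypothetical) parameter values.
K, SIGMA_T, SIGMA_E = 10, 0.3, 0.9
_, T, E = simulate_scores(20000, K, SIGMA_T, SIGMA_E)

var_T = statistics.pvariance(T)
var_E = statistics.pvariance(E)
print("var(T):", var_T, "expected K^2 sigma_t^2:", K ** 2 * SIGMA_T ** 2)
print("var(E):", var_E, "expected K sigma_e^2:", K * SIGMA_E ** 2)
```

With a large number of simulated persons the two sample variances should fall close to the model values; the agreement is only approximate because it rests on random draws.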

Reliability is then defined in terms of item parameters as

$$\rho_{20} = \frac{K \sigma_t^2}{K \sigma_t^2 + \sigma_e^2}.$$

The variance of the true score component, $t_p$, is estimated by $(MS_p - MS_{ip})/K$; the variance of the item error score by $MS_{ip}$; and the reliability by

$$r_{20} = 1 - \frac{MS_{ip}}{MS_p},$$

where $MS_{ip}$ is the mean square for the items-by-persons interaction and $MS_p$ is the mean square for persons. The expected values of the mean squares are

$$E(MS_{ip}) = \sigma_e^2 \quad \text{and} \quad E(MS_p) = \sigma_e^2 + K \sigma_t^2.$$

The population covariance of $MS_{ip}$ and $MS_p$ is zero. The sampling distributions of the mean squares are known:

$$\frac{(N - 1)\, MS_p}{\sigma_e^2 + K \sigma_t^2}$$

is distributed as chi-square with $(N - 1)$ degrees of freedom and

$$\frac{(N - 1)(K - 1)\, MS_{ip}}{\sigma_e^2}$$

is distributed as chi-square with $(N - 1)(K - 1)$ degrees of freedom.
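The mean-square estimate of reliability can be sketched in code. The example below is an illustration under stated assumptions rather than the authors' procedure: the function names and the small 0/1 score matrix are hypothetical. It computes $r_{20} = 1 - MS_{ip}/MS_p$ from a persons-by-items matrix and, for dichotomous items, checks it against the familiar form $\frac{K}{K-1}\left(1 - \sum p_i q_i / s_X^2\right)$ with the total-score variance computed with divisor $N$.

```python
def anova_mean_squares(scores):
    """Return (MS_p, MS_ip) for an N x K persons-by-items score matrix."""
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    person_means = [sum(row) / k for row in scores]
    item_means = [sum(row[i] for row in scores) / n for i in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_interaction = ss_total - ss_persons - ss_items

    ms_p = ss_persons / (n - 1)
    ms_ip = ss_interaction / ((n - 1) * (k - 1))
    return ms_p, ms_ip


def r20_anova(scores):
    """Reliability estimate 1 - MS_ip / MS_p."""
    ms_p, ms_ip = anova_mean_squares(scores)
    return 1.0 - ms_ip / ms_p


def r20_classical(scores):
    """K/(K-1) * (1 - sum(p_i q_i) / var(total)) for 0/1 items, divisor N."""
    n = len(scores)
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    p = [sum(row[i] for row in scores) / n for i in range(k)]
    sum_pq = sum(p_i * (1.0 - p_i) for p_i in p)
    return k / (k - 1) * (1.0 - sum_pq / var_total)


# Hypothetical data: 6 persons by 5 dichotomous items.
scores = [
    [1, 1, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
print(r20_anova(scores), r20_classical(scores))
```

The two functions should agree up to floating-point rounding, since for 0/1 items the two expressions are algebraically the same quantity.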

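Since these two chi-square variates are independent, the ratio

$$\frac{1 - r_{20}}{1 - \rho_{20}}$$

is distributed as a central $F$ with $(N - 1)(K - 1)$ and $(N - 1)$ degrees of freedom. The variance of an $F$ variate with $\nu_1$ and $\nu_2$ degrees of freedom is $2\nu_2^2(\nu_1 + \nu_2 - 2) / [\nu_1(\nu_2 - 2)^2(\nu_2 - 4)]$, so the variance of $r_{20}$ can then be written

$$\sigma_{r_{20}}^2 = (1 - \rho_{20})^2 \, \frac{2(N - 1)\,[(N - 3) + (N - 1)(K - 1)]}{(K - 1)(N - 3)^2 (N - 5)},$$

and

$$\mathrm{S.E.}(r_{20}) = (1 - \rho_{20}) \sqrt{\frac{2(N - 1)\,[(N - 3) + (N - 1)(K - 1)]}{(K - 1)(N - 3)^2 (N - 5)}}. \qquad (2)$$

An approximation to this variance is obtained by considering the variance of a ratio of the two chi-square variates. Since the variance of a chi-square distribution is equal to twice the number of degrees of freedom, the sampling variances of the mean squares are:

$$\mathrm{Var}(MS_{ip}) = \frac{2 \sigma_e^4}{(N - 1)(K - 1)} \quad \text{and} \quad \mathrm{Var}(MS_p) = \frac{2 (\sigma_e^2 + K \sigma_t^2)^2}{N - 1}.$$

The variance of the ratio of two random variables, $X_1/X_2$, is approximately equal to (see Kendall & Stuart, 1958, p. 232):

$$\mathrm{Var}\!\left(\frac{X_1}{X_2}\right) \approx \left(\frac{E(X_1)}{E(X_2)}\right)^2 \left[ \frac{\mathrm{Var}(X_1)}{E(X_1)^2} + \frac{\mathrm{Var}(X_2)}{E(X_2)^2} - \frac{2\,\mathrm{cov}(X_1, X_2)}{E(X_1)\,E(X_2)} \right].$$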

Applying this result with $X_1 = MS_{ip}$, $X_2 = MS_p$, and zero covariance, the standard error of $r_{20}$ is then approximately

$$\mathrm{S.E.}(r_{20}) \approx (1 - \rho_{20}) \sqrt{\frac{2K}{(N - 1)(K - 1)}}. \qquad (3)$$

A still cruder approximation,

$$\mathrm{S.E.}(r_{20}) \approx (1 - \rho_{20}) \sqrt{\frac{2}{N - 1}}, \qquad (4)$$

is obtained by assuming that $K - 1$ is approximately equal to $K$. For tests of typical length, formula (4) will give results similar to those of formula (3).

Baker (1962) conducted an empirical study of the sampling distributions of some common test analysis statistics, including the KR-20 coefficient. Using a population of 747 answer sheets of an 80-item test, Baker drew 200 Type I samples, with replacement, for each of four different sample sizes (N = 15, 30, 60, 120). The resulting standard deviations of the observed sample KR-20 coefficients are reported in Table 1 along with the theoretical standard errors obtained by using formulas (2), (3), and (4). As can be seen, there is fairly close agreement among the sets of values, and the agreement improves as the sample size increases.

Insert Table 1 about here

It is of some interest to note that for a parallel-test correlation with the same population value ($\rho_{xx} = .906$) the theoretical standard errors for the four cases in Table 1 are .044, .031, .022, and .016 for an N of 15, 30, 60, and 120, respectively. These values are all higher than the corresponding standard errors of $r_{20}$ except for the smallest N (N = 15). A comparison of formulas (1) and (2) indicates that the standard error of $r_{20}$ will generally be smaller than the standard error of $r_{xx}$ for values of $\rho$, $N$, and $K$ that are apt to be encountered in practice.

It should be noted that the standard errors given above should not be used for confidence limits. The sampling distribution of the sample reliability coefficient is skewed and it is a biased estimator of the population value.
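The three standard-error expressions can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, and the Monte Carlo check is an added assumption-based device, not Baker's empirical procedure. It implements formula (2), $(1-\rho_{20})\sqrt{2(N-1)[(N-3)+(N-1)(K-1)]/[(K-1)(N-3)^2(N-5)]}$, formula (3), $(1-\rho_{20})\sqrt{2K/[(N-1)(K-1)]}$, and formula (4), $(1-\rho_{20})\sqrt{2/(N-1)}$, for the 80-item case with $\rho_{20} = .906$, and then simulates $r_{20}$ from the $F$ ratio of two independent chi-square variates.

```python
import math
import random
import statistics


def se_exact(rho, n, k):
    """Formula (2): exact standard error from the F distribution (needs n > 5)."""
    num = 2.0 * (n - 1) * ((n - 3) + (n - 1) * (k - 1))
    den = (k - 1) * (n - 3) ** 2 * (n - 5)
    return (1.0 - rho) * math.sqrt(num / den)


def se_ratio(rho, n, k):
    """Formula (3): approximation from the variance of a ratio."""
    return (1.0 - rho) * math.sqrt(2.0 * k / ((n - 1) * (k - 1)))


def se_crude(rho, n):
    """Formula (4): formula (3) with K - 1 replaced by K."""
    return (1.0 - rho) * math.sqrt(2.0 / (n - 1))


def simulate_sd(rho, n, k, reps=20000, seed=1):
    """Monte Carlo standard deviation of r20 = 1 - (1 - rho) * F."""
    rng = random.Random(seed)
    d1, d2 = (n - 1) * (k - 1), n - 1
    values = []
    for _ in range(reps):
        # A chi-square variate with d degrees of freedom is Gamma(d/2, scale 2).
        chi_ip = rng.gammavariate(d1 / 2.0, 2.0)
        chi_p = rng.gammavariate(d2 / 2.0, 2.0)
        f_ratio = (chi_ip / d1) / (chi_p / d2)
        values.append(1.0 - (1.0 - rho) * f_ratio)
    return statistics.pstdev(values)


RHO, K = 0.906, 80
for n in (15, 30, 60, 120):
    print(n,
          round(se_exact(RHO, n, K), 4),
          round(se_ratio(RHO, n, K), 4),
          round(se_crude(RHO, n), 4),
          round(simulate_sd(RHO, n, K), 4))
```

The simulated column depends on the seed and the number of replications, so it should be read as a rough check on formula (2) under the normal model rather than as a reproduction of the empirical results in Table 1.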

References

Baker, F. B. Empirical determination of sampling distribution of item discrimination indices and a reliability coefficient. Wisconsin: University of Wisconsin, Madison, 1962.

Feldt, L. S. The approximate sampling distribution of Kuder-Richardson reliability coefficient twenty. Psychometrika, 1965, 30.

Hoyt, C. Test reliability estimated by analysis of variance. Psychometrika, 1941, 6.

Kendall, M. G., & Stuart, A. Advanced theory of statistics, Vol. 1. London: Charles Griffin & Co., Ltd., 1958.

Lord, F. M. Sampling fluctuations resulting from the sampling of test items. Psychometrika, 1955, 20, 1-22.

Table 1

Empirical and Theoretical Standard Errors of Kuder-Richardson Formula 20 for an 80-Item Test with $\rho_{20}$ = .906

                                     Theoretical Results
Sample    Empirical
Size      Results (a)      Formula 2    Formula 3    Formula 4

...       ...              .028         ...          ...

(a) Empirical results are based on 200 samples from a finite population of 747 test scores (Baker, 1962).
