RESEARCH REPORT (RR). DENOTING THE BASE-FREE MEASURE OF CHANGE. Samuel Messick. Educational Testing Service, Princeton, New Jersey. December 1980



ABSTRACT

Bond criticized the base-free measure of change proposed by Tucker, Damarin, and Messick by pointing to an incorrect derivation, which is here viewed instead as a correct derivation entailing an inadequately specified tacit assumption. Bond's revision leads to estimates of the correlation between initial position and change that are negatively biased by correlated errors, whereas the original approach, with the tacit assumption properly denoted, leads to unbiased values.

Key words: base-free measure of change, difference scores, change scores, measuring change.

Bond (1979) pointed out a presumably incorrect derivation in the Tucker, Damarin, and Messick (1966) presentation of a base-free measure of change. As a consequence, Bond argued that properties attributed to $a$ (the coefficient for the regression of true scores from the second testing on true scores from the first testing) were instead properties of $b$, the corresponding regression coefficient for observed scores. At issue was the correlation between initial position ($x_1$) and change ($d = x_2 - x_1$), with Tucker et al. (1966) claiming that $\rho_{x_1 d}$ was positive when $a > 1$, was zero when $a = 1$, and was negative when $a < 1$, and Bond insisting that these cut-off points were determined by $b$ rather than $a$.

In reply, Tucker (1979) emphasized that the focus should have been on the correlation between true initial position ($t_1$) and true change ($\delta = t_2 - t_1$) all along, whereby the interpretation returns to a comparison of $a$ with unity since

$$\rho_{t_1 \delta} = (a - 1)\,\frac{\sigma_{t_1}}{\sigma_{\delta}}. \tag{1}$$

Equation (1) is algebraically equivalent to the formula provided by Zieve (1940) for the correlation between true initial position and the true difference score, which also appears as equation (62) in Tucker et al. (1966). Zieve's formula explicitly corrects $\rho_{x_1 d}$ for correlated errors as well as for attenuation.

The derivation error in question revolves around whether $\rho_{x_1 x_1}$ in the formula for $\rho_{x_1 d}$ represents the reliability of $x_1$, as Tucker et al. tacitly

assumed, or the self-correlation of $x_1$ (i.e., unity), as Bond properly asserted. In the absence of further specifications, Bond was correct. But once the nature of the Tucker et al. tacit assumption is explicitly denoted, we see that $a$ remains a proper benchmark regardless of whether the observed correlation or the true correlation is considered.

Using the Tucker et al. (1966) notation, which was also followed by Bond (1979), as well as standard relationships from classical test theory, the base-free measure of change ($\gamma$) is defined as a true independent change score:

$$\gamma = t_2 - a\,t_1, \tag{2}$$

$$a = \frac{\rho_{x_1 x_2}\,\sigma_{x_2}}{\rho_{x_1 x_1}\,\sigma_{x_1}} = \frac{b}{\rho_{x_1 x_1}}. \tag{3}$$

Formula (61) in Tucker et al. (1966) and formula (14) in Bond (1979) give the correlation between an observed difference score and any other variable $k$,

$$\rho_{kd} = \frac{\rho_{x_2 k}\,\sigma_{x_2} - \rho_{x_1 k}\,\sigma_{x_1}}{\sigma_d}. \tag{4}$$

Now, instead of setting $k = x_1$ as Tucker et al. did originally, leading Bond to question whether $\rho_{x_1 x_1}$ should not properly be unity and hence

$$\rho_{x_1 d} = (b - 1)\,\frac{\sigma_{x_1}}{\sigma_d}, \tag{5}$$

let $k = x_1'$, a parallel form of $x_1$. The use of parallel forms of the pretest, one to determine the difference score and one to correlate with the difference score, eliminates the negative bias in $\rho_{x_1 d}$ produced by correlated errors (Thorndike, 1966). Equation (4) then becomes, with inserts from equation (3),

$$\rho_{x_1' d} = (a - 1)\,\frac{\rho_{x_1 x_1}\,\sigma_{x_1}}{\sigma_d}. \tag{6}$$

Equation (6) is algebraically equivalent to equation (63) in Tucker et al. (1966), which gives the correlation between observed initial position and the observed difference score corrected for correlated errors. It is clear from equation (6) that $\rho_{x_1' d}$ is positive when $a > 1$, is zero when $a = 1$, and is negative when $a < 1$.

Furthermore, if both sides of equation (6) are divided by $(\rho_{x_1 x_1})^{1/2}(\rho_{dd'})^{1/2}$, where $\rho_{dd'}$ is the reliability of the observed difference score, then

$$\frac{\rho_{x_1' d}}{(\rho_{x_1 x_1})^{1/2}(\rho_{dd'})^{1/2}} = (a - 1)\,\frac{\sigma_{t_1}}{\sigma_{\delta}} = \rho_{t_1 \delta}, \tag{7}$$

which is consistent with Tucker's (1979) formulation. Thus, $\rho_{t_1 \delta}$ is $\rho_{x_1' d}$ corrected for attenuation in both $x_1'$ and $d$.

As previously noted, it is clear from equation (6) that $\rho_{x_1' d}$, the correlation between observed initial position and observed change uncontaminated by correlated errors, is positive when $a > 1$, is zero when $a = 1$, and is negative when $a < 1$. It may be helpful in explicating these relationships to consider $a$ as the product of two ratios:

$$a = \frac{\rho_{x_1 x_2}}{\rho_{x_1 x_1}} \cdot \frac{\sigma_{x_2}}{\sigma_{x_1}}. \tag{8}$$

The first ratio is typically less than unity and will usually become smaller and smaller as the time interval between the first and second testing increases. Therefore, $a > 1$ and $\rho_{x_1' d}$ is positive only when

$$\frac{\sigma_{x_2}}{\sigma_{x_1}} > \frac{\rho_{x_1 x_1}}{\rho_{x_1 x_2}}. \tag{9}$$

This condition of $\sigma_{x_2}$ becoming increasingly larger than $\sigma_{x_1}$ as the testing interval increases corresponds to a growth model known as the fan-spread hypothesis (Bryk & Weisberg, 1977). Further, $a = 1$ and $\rho_{x_1' d} = 0$ when

$$\frac{\sigma_{x_2}}{\sigma_{x_1}} = \frac{\rho_{x_1 x_1}}{\rho_{x_1 x_2}}. \tag{10}$$

This condition corresponds to a growth model known as the overlap hypothesis (Anderson, 1939), where $x_2$ consists of everything present at $x_1$ plus an independent increment uncorrelated with $x_1$. In the overlap model of growth, the squared correlation coefficient (corrected for attenuation) between test scores at two points in time is sometimes interpreted as indicating the "percent of the elements" in the second measurement that had been attained by the time of the first measurement (Anderson, 1939, p. 365). In this view of longitudinal data, the proportion of common elements or the amount of overlap in two measurements taken at different points in time is indexed not only by the squared unattenuated correlation between the two tests or by the ratio of the two test variances as in equation (10), but also by the ratio of the two test means (Bloom, 1964). Such reasoning enabled Bloom (1964), after reviewing empirical evidence that $\rho_{x_1 d}$ averaged about zero for measures of intelligence (at least through age seven), to move from statements about variance to statements about amount. That is, instead of stating that 50 percent of the variance of adult intelligence is accounted for by age four, he claimed that "about 50% of the development takes place between conception and age 4" (p. 88).

This is a strong conclusion, but the overlap model of growth undergirding it implies the very special condition of $a = 1$. Finally, $a < 1$ and $\rho_{x_1' d}$ is negative when

$$\frac{\sigma_{x_2}}{\sigma_{x_1}} < \frac{\rho_{x_1 x_1}}{\rho_{x_1 x_2}}, \tag{11}$$

which occurs so frequently in situations having test ceiling effects or in systems having natural boundary conditions, as in the measurement of psychophysiological responses, that its corresponding growth model has been called the law of initial values (Lacey & Lacey, 1962).

These differences in possible conceptions of the nature of growth occurring in a particular instance hinge on the values of $a$ or $\rho_{x_1' d}$, not, as Bond (1979) averred, on $b$ or $\rho_{x_1 d}$. These latter values incorporate correlated errors and hence are negatively biased. It can be seen from equation (3) that the difference between using $a$ or $b$ to specify analytically the conditions under which the correlation between initial position and change will be positive, negative, or zero becomes increasingly important to the extent that the initial measure is fallible. Since unreliability of measurement is a pervasive problem during periods of rapid growth, as in early childhood, the distinction between $a$ and $b$ is especially germane there, but it is also important in any investigation of the correlates of change using fallible measures (Tucker et al., 1966).
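The contrast between $a$ and $b$ as benchmarks can be illustrated numerically. The following is a minimal sketch, not part of the original report. It assumes normally distributed true scores with unit variance, an independent change component, and uncorrelated errors of equal variance on the pretest, its parallel form, and the posttest. The function names (`analytic_correlations`, `simulate`, `growth_model`) and the default parameter values are hypothetical choices made for illustration. The sketch evaluates equations (3), (5), and (6) from summary statistics and compares them with correlations from simulated data.

```python
import math
import random


def analytic_correlations(sd1, sd2, r12, rel1):
    """Evaluate equations (3), (5), and (6) from summary statistics."""
    b = r12 * sd2 / sd1   # observed-score regression of x2 on x1
    a = b / rel1          # true-score regression, equation (3)
    sd_d = math.sqrt(sd1**2 + sd2**2 - 2 * r12 * sd1 * sd2)  # SD of d = x2 - x1
    return {
        "a": a,
        "b": b,
        "r_x1_d": (b - 1) * sd1 / sd_d,          # equation (5): same pretest form
        "r_x1p_d": (a - 1) * rel1 * sd1 / sd_d,  # equation (6): parallel form
    }


def pearson(u, v):
    """Plain product-moment correlation of two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v))
    suu = sum((p - mean_u) ** 2 for p in u)
    svv = sum((q - mean_v) ** 2 for q in v)
    return suv / math.sqrt(suu * svv)


def simulate(a, n=50000, gamma_sd=0.75, err_sd=0.5, seed=0):
    """Simulate t2 = a*t1 + gamma with fallible observed scores (assumed model)."""
    rng = random.Random(seed)
    x1, x1_parallel, d = [], [], []
    for _ in range(n):
        t1 = rng.gauss(0.0, 1.0)                     # true initial position
        t2 = a * t1 + rng.gauss(0.0, gamma_sd)       # gamma is independent of t1
        obs1 = t1 + rng.gauss(0.0, err_sd)           # pretest used to form d
        obs1_parallel = t1 + rng.gauss(0.0, err_sd)  # parallel form of the pretest
        obs2 = t2 + rng.gauss(0.0, err_sd)           # posttest
        x1.append(obs1)
        x1_parallel.append(obs1_parallel)
        d.append(obs2 - obs1)
    return {"r_x1_d": pearson(x1, d), "r_x1p_d": pearson(x1_parallel, d)}


def growth_model(a, tol=0.05):
    """Label the growth model implied by a; the tolerance is an arbitrary choice."""
    if a > 1 + tol:
        return "fan-spread"
    if a < 1 - tol:
        return "law of initial values"
    return "overlap"


if __name__ == "__main__":
    for a in (0.5, 1.0, 1.2, 1.5):
        sim = simulate(a)
        print(a, growth_model(a), round(sim["r_x1_d"], 3), round(sim["r_x1p_d"], 3))
```

Under these assumptions the pretest reliability is 0.8, so $a = 1.2$ corresponds to $b = 0.96$. In that case the correlation of change with the same pretest form should come out negative while the correlation with the parallel form comes out positive, which is the sign disagreement at issue between equations (5) and (6).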

REFERENCES

Anderson, J. E. The limitation of infant and preschool tests in the measurement of intelligence. Journal of Psychology, 1939.
Bloom, B. S. Stability and change in human characteristics. New York: Wiley, 1964.
Bond, L. On the base-free measure of change proposed by Tucker, Damarin, and Messick. Psychometrika, 1979, 44.
Bryk, A. S., & Weisberg, H. I. Use of the nonequivalent control group design when subjects are growing. Psychological Bulletin, 1977, 84.
Lacey, J. I., & Lacey, B. C. The law of initial value in the longitudinal study of autonomic constitution: Reproducibility of autonomic responses and response patterns over a four year interval. In W. M. Wolf (Ed.), Rhythmic functions in the living system. Annals of the New York Academy of Sciences, 1962.
Thorndike, R. L. Intellectual status and intellectual growth. Journal of Educational Psychology, 1966, 22.
Tucker, L. R. Comment on a note on a base-free measure of change. Psychometrika, 1979, 44, 357.
Tucker, L. R., Damarin, F., & Messick, S. A base-free measure of change. Psychometrika, 1966.
Zieve, L. Note on the correlation of initial scores with gains. Journal of Educational Psychology, 1940.
