Meiser et al.: Latent Change in Discrete Data


Methods of Psychological Research Online 1998, Vol. 3, No. 2
Internet:

Latent Change in Discrete Data: Unidimensional, Multidimensional, and Mixture Distribution Rasch Models for the Analysis of Repeated Observations

Thorsten Meiser, Psychologisches Institut der Universität Bonn
Elsbeth Stern, Max-Planck-Institut für Bildungsforschung Berlin
Rolf Langeheine, Institut für die Pädagogik der Naturwissenschaften an der Universität Kiel

Abstract

A survey of unidimensional, multidimensional, and mixture distribution Rasch models is presented with a particular focus on model applications for the analysis of change in repeated measures designs. A mover-stayer mixed Rasch model is specified for modeling global change in one of two latent subpopulations and for modeling stability in the other latent subpopulation. The application of unidimensional, multidimensional, and mixture distribution Rasch models for the analysis of change is illustrated using data on the development of understanding and solving arithmetic word problems in elementary school children.

Keywords: Rasch model, measurement of change, finite mixture distributions

Zusammenfassung (English translation)

An overview of unidimensional, multidimensional, and mixture distribution Rasch models is given, with special attention to their use for analyzing change in repeated measures designs. A mover-stayer mixture distribution Rasch model is formulated that models global change in one of two latent subpopulations and stability in the other. The use of unidimensional, multidimensional, and mixture distribution Rasch models for analyzing change is illustrated with data on the development of understanding and solving arithmetic word problems in elementary school children.
Schlüsselwörter (English translation): Rasch model, measurement of change, finite mixture distributions

The class of psychometric models presented by Georg Rasch (1960/1980, 1968) has attracted considerable interest and has stimulated an impressive amount of research on statistical models in the social and behavioral sciences. The impact of Rasch's work on modern test theory is documented in several monographs which summarize current developments in the mathematical modeling of test data as well

as new perspectives in the application of test models to social science issues (e.g., Fischer & Molenaar, 1995; Langeheine & Rost, 1988; Rost & Langeheine, 1997; van der Linden & Hambleton, 1997).

The present article is devoted to the application of Rasch models to longitudinal test data which comprise repeated observations of the same items and the same sample of individuals at different occasions. For this purpose, the unidimensional Rasch model and some recent extensions, such as multidimensional Rasch models, mixture distribution Rasch models, and submodels thereof, are outlined in the next three sections with a particular focus on the analysis of change. With respect to the loglinear representation of unidimensional and multidimensional Rasch models, a model hierarchy is pointed out which facilitates testing the assumption of homogeneity of change across individuals. In the context of mixture distribution Rasch models, a mover-stayer model is presented which allows an a priori specification of different patterns of change for different latent subpopulations. Throughout the text, the models are presented in terms of polytomous item response models which include models for dichotomous items as special cases. With regard to polytomous Rasch models, two contradictory views concerning the ordering of threshold difficulties are described and a mediating view is suggested.

In the fourth section, the analysis of change by means of unidimensional, multidimensional, and mixture distribution Rasch models is illustrated using longitudinal data on the development of understanding and solving arithmetic word problems in elementary school children. The fourth section thereby extends a previous analysis of these data which was based on a latent class state-mastery model embedded in latent Markov chain models (Langeheine, Stern, & van de Pol, 1994).
1 Unidimensional Rasch Models

Unidimensional Rasch models for polytomous items with ordered response categories $x = 0, \ldots, m_i$ can be derived by the appropriate parameterization of the threshold probabilities. The threshold probability of item $i$, $i \in \{1, \ldots, I\}$, threshold $x$, $x \in \{1, \ldots, m_i\}$, and person $v$, $v \in \{1, \ldots, N\}$, is defined as the conditional probability of person $v$ responding with response category $x$ in item $i$, given that person $v$ responds with either category $x-1$ or $x$:

$$\gamma_{vix} = \frac{p(X_{vi} = x)}{p(X_{vi} = x-1) + p(X_{vi} = x)}. \tag{1}$$

For unidimensional Rasch models, the threshold probabilities $\gamma_{vix}$ are parameterized in terms of the logistic function, where the argument of the function is the difference between a person parameter $\theta_v$ and a threshold parameter $\beta_{ix}$:

$$\gamma_{vix} = \frac{\exp(\theta_v - \beta_{ix})}{1 + \exp(\theta_v - \beta_{ix})}. \tag{2}$$

The parameter $\theta_v$ reflects the latent ability or attitude of person $v$, whereas the parameter $\beta_{ix}$ reflects the difficulty of threshold $x$ of item $i$. From Equation (2), the probability of response category $x$ for item $i$ and individual $v$ can be derived by a recursive formula:

$$p(X_{vi} = x) = \frac{\gamma_{vix}}{1 - \gamma_{vix}}\, p(X_{vi} = x-1) = \ldots = \frac{\exp\left(x\theta_v - \sum_{s=1}^{x} \beta_{is}\right)}{\sum_{y=0}^{m_i} \exp\left(y\theta_v - \sum_{s=1}^{y} \beta_{is}\right)} \tag{3}$$

(Andrich, 1978; Masters, 1982; Rost, 1988) with $\sum_{s=1}^{0} \beta_{is} := 0$. The probability of response vector $X$ containing the responses to a given set of $I$ items, $X = (X_1, \ldots, X_I)$, results from multiplying the probabilities of the single item responses, that is, multiplying Equation (3) over $i$:

$$p\left(X_v = (x_1, \ldots, x_I)\right) = \frac{\exp\left(t\theta_v - \sum_{i=1}^{I} \sum_{s=1}^{x_i} \beta_{is}\right)}{\prod_{i=1}^{I} \sum_{y=0}^{m_i} \exp\left(y\theta_v - \sum_{s=1}^{y} \beta_{is}\right)}, \tag{4}$$

where $t = \sum_{i=1}^{I} x_i$ denotes the total score of the item responses. Equation (4) rests on the assumption of local independence, that is, stochastic independence of responses conditional on the person and threshold parameters.

Several special cases of the general unidimensional Rasch model for polytomous items, which is also called the partial credit model (PCM; Masters, 1982; Masters & Wright, 1997), can be specified by restrictions on the threshold parameters $\beta_{ix}$. Equality constraints on the differences between threshold parameters across items,

$$\beta_{ix} = \alpha_i + \psi_x \quad \text{with} \quad \sum_{s=1}^{m} \psi_s = 0, \tag{5}$$

result in the well-known rating scale model (RSM; Andrich, 1978). Equality constraints on the differences between adjacent threshold parameters within each item,

$$\beta_{ix} = \alpha_i + \left(x - (m_i + 1)/2\right) \delta_i, \tag{6}$$

result in the dispersion model (Andrich, 1982).

1.1 Interpretation of Threshold Parameters

Since the parameterization of the threshold probabilities in Equation (2) is equivalent to the unidimensional Rasch model for dichotomous items, the threshold parameters $\beta_{ix}$ have the same interpretation as the item parameters in the dichotomous Rasch model. In particular, the following relations hold:

• $\theta_v = \beta_{ix} \Leftrightarrow \gamma_{vix} = .5$,
• $\beta_{ix}$ is the turning point of the threshold characteristic curve, and
• $\theta_v = \beta_{ix} \Leftrightarrow p(X_{vi} = x-1) = p(X_{vi} = x)$.
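The category probabilities of Equation (3), the threshold probability of Equation (1), and the restrictions in Equations (5) and (6) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and all numerical values are hypothetical and do not come from the article.

```python
import math

def pcm_category_probs(theta, betas):
    """Category probabilities p(X = x), x = 0..m, for one item under the
    partial credit model (Equation (3)); betas[s-1] is threshold beta_{is}."""
    # Exponent of category x is x*theta - sum_{s<=x} beta_s (empty sum = 0).
    exponents = [0.0]
    for beta in betas:
        exponents.append(exponents[-1] + theta - beta)
    top = max(exponents)                       # stabilise the exponentials
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def threshold_prob(probs, x):
    """Threshold probability gamma_{vix} as defined in Equation (1)."""
    return probs[x] / (probs[x - 1] + probs[x])

def rsm_thresholds(alpha, psi):
    """Rating scale model, Equation (5): beta_{ix} = alpha_i + psi_x."""
    return [alpha + p for p in psi]

def dispersion_thresholds(alpha, delta, m):
    """Dispersion model, Equation (6):
    beta_{ix} = alpha_i + (x - (m + 1) / 2) * delta_i."""
    return [alpha + (x - (m + 1) / 2) * delta for x in range(1, m + 1)]

# Hypothetical values, chosen so that theta equals the second threshold.
theta = 0.4
betas = [-0.5, 0.4, 1.2]
probs = pcm_category_probs(theta, betas)
gamma_2 = threshold_prob(probs, 2)   # equals .5 because theta == betas[1]
```

With these values the person parameter coincides with the second threshold, so the sketch reproduces the relations listed above: the second threshold probability is .5 and categories 1 and 2 are equally likely.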
A controversial issue concerns the question whether the claim of ordered response categories implies a corresponding order on the threshold parameters within each item, that is: does the assumption that response category $x$ of item $i$ reflects a higher amount of latent ability or attitude than response category $x-1$, for $x = 1, \ldots, m_i$, necessarily imply the order relation $\beta_{i1} < \beta_{i2} < \ldots < \beta_{im_i}$? In recent publications, Masters and Wright (1997) and Andrich, de Jong, and Sheridan (1997) elaborated their contradictory views on this issue.

Masters and Wright (1997) pointed out that each threshold parameter refers to the comparison of a single pair of adjacent categories (see Equation (2)) and that the parameters can therefore have any order:

"Because each item parameter [i.e., threshold parameter in the present terminology] in the PCM is defined locally with respect to just two adjacent categories (rather than taking into account all categories simultaneously), the item parameters in the model can take any order." (Masters & Wright, 1997, p. 105)

An example of an achievement item with four different response categories, one for entirely incorrect responses, two for partial solutions, and one for the correct solution, is the math item

$$\frac{\sqrt{5 - 1}}{4} = \;?$$

The solution of the item requires three consecutive steps:

1. $5 - 1 = 4$,
2. $\sqrt{4} = 2$,
3. $2/4 = 0.5$.

If no step is taken successfully, the response is entirely incorrect (ignoring the possibility of guessing the solution) and scored as $x = 0$. Finishing only step 1 or finishing steps 1 and 2 leads to the interim results "4" or "2", respectively, scored as categories $x = 1$ and $x = 2$. Finishing all steps successfully leads to the correct solution "0.5", which is scored as category $x = 3$. Since threshold 2 (i.e., scoring 2 rather than 1) requires taking a square root and therefore may be regarded as more difficult than threshold 3 (i.e., scoring 3 rather than 2), which involves solving a simple fraction, the natural order of threshold parameters for this task would be $\beta_{i1} < \beta_{i3} < \beta_{i2}$. Thus, the order of threshold parameters does not correspond to the order of response categories $x$. However, this does not violate the basic assumption that response category $x = 3$ reflects more effort or ability than does category $x = 2$, unless the division required in step 3 is considered to be trivial.

In contrast to the view of Masters and Wright (1997), Andrich et al. (1997) emphasized that the effects of the threshold parameters are not confined to adjacent categories and that each category probability of item $i$ depends on the entire set of threshold parameters of that item (see Equation (3)).
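Both observations can be made concrete with a small numerical sketch in Python. The threshold values below are hypothetical and merely respect the order $\beta_{i1} < \beta_{i3} < \beta_{i2}$ discussed for the math item. Changing only the third threshold leaves the first threshold probability untouched (the local definition stressed by Masters and Wright), yet it changes the probability of category 1 (the global dependence stressed by Andrich et al.).

```python
import math

def category_probs(theta, betas):
    """Category probabilities of Equation (3) for a single item."""
    exponents = [0.0]
    for beta in betas:
        exponents.append(exponents[-1] + theta - beta)
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def threshold_prob(probs, x):
    """Threshold probability of Equation (1)."""
    return probs[x] / (probs[x - 1] + probs[x])

theta = 0.0                                          # hypothetical person parameter
base = category_probs(theta, [-1.0, 1.0, 0.0])       # beta_1 < beta_3 < beta_2
shifted = category_probs(theta, [-1.0, 1.0, 2.0])    # only beta_3 is changed

# Local view: the first threshold probability ignores beta_3.
same_gamma_1 = math.isclose(threshold_prob(base, 1), threshold_prob(shifted, 1))
# Global view: the probability of category 1 does depend on beta_3.
category_1_changes = not math.isclose(base[1], shifted[1])
```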
In particular, the authors regard the ascending order of threshold parameters as an implication of the hypothesis of ordered response categories and, as a consequence, as a criterion for model evaluation:

"The ordering of the thresholds which divide the latent unidimensional continuum into categories itself is a hypothesis about the data which is embedded in the model. Although formal statistical tests of fit could be constructed to test that the empirical ordering is not consistent with the intended ordering, reversed threshold estimates is sufficient evidence to conclude that the empirical ordering is not consistent with the intended ordering." (Andrich et al., 1997, p. 68)

The rationale behind this view is based on the presumed process of generating a response to an item in terms of integrating single responses to each of the thresholds (Andrich, 1978, 1985): considering the possible patterns of responses to the two thresholds of a trichotomous item, for instance, yields the sample space $\Omega = \{(0,0), (1,0), (0,1), (1,1)\}$, where $(0,0)$ means that neither of the thresholds is passed, $(1,0)$ means that only the first threshold is passed, etc. The requirement to integrate the threshold responses to a unique response on the item reduces the sample space of the item to $\Omega' = \{0 = (0,0),\ 1 = (1,0),\ 2 = (1,1)\}$, so that the pattern $(0,1)$ in $\Omega$ has to be dropped from the set of available response pairs. As a consequence, the assumption of ordered response categories implies a Guttman structure on the threshold responses within each item reflecting the order $\beta_{i1} < \beta_{i2} < \ldots < \beta_{im_i}$.

This brief review of the contradictory views reveals that the hypotheses about the response generating processes are crucial to the application of a psychometric

model and should be made explicit and underpinned by substantive theory. As a mediating view on the issue, we suggest that violations of the Guttman structure in the ordering of threshold parameters may be accepted for achievement items, if the cognitive processes involved in the single steps of a task are well specified and if an additional cognitive operation, resulting in a higher response category, indicates some degree of effort or ability on the latent continuum to be measured (cf. Masters & Wright, 1997). In contrast, applications of Rasch models to attitude items may well include the assumption of ascending threshold difficulties, because individuals have to select the appropriate category by taking into consideration the whole set of available categories at once (cf. Andrich et al., 1997). Hence, for attitude items the ordering of the response categories is a hypothesis which can be tested by means of the ordering of threshold difficulties.

1.2 Linear Logistic Test Models for Measuring Change

Fischer and Parzer (1991) and Fischer and Ponocny (1994) introduced polytomous Rasch models with linear constraints on the threshold parameters which allow measuring change and (quasi-)experimental treatment effects in repeated observations (see Fischer & Ponocny, 1995, for an overview). These linearly restricted polytomous Rasch models are generalizations of the linear logistic test model (LLTM) for the assessment of change with dichotomous items (Fischer, 1983, 1995; Fischer & Formann, 1982). As before, only the polytomous case is considered here, because it encompasses the application to dichotomous items as a special case.

If a set of $I$ items is assessed at two measurement occasions $T_1$ and $T_2$, the resulting set of $2I$ items can be divided into the subset of $I$ items $1, \ldots, I$ observed at $T_1$ and a further subset of $I$ virtually new items $I+1, \ldots, 2I$ observed at $T_2$.
In the linearly restricted PCM (Fischer & Ponocny, 1994), the thresholds of the first $I$ items are parameterized in terms of Equation (2), while the threshold parameters of the virtually new items $I+1, \ldots, 2I$ are decomposed into the initial threshold parameters and a set of $J$ treatment effects:

$$\beta_{(I+i)x} = \beta_{ix} - \sum_{j=1}^{J} q_j \eta_j. \tag{7}$$

In Equation (7), $q_j$ denotes the dosage of treatment $j$, which may differ between experimental groups, and $\eta_j$ denotes the effect of treatment $j$. By replacing the effect parameter $\eta_j$ with an item-specific effect parameter $\eta_{ji}$, the assumption of constant treatment effects across items is dropped. Analogous linear decompositions have been proposed for the parameters of the RSM (Fischer & Parzer, 1991). Methods of conditional maximum-likelihood (CML) estimation of the parameters in the linearly restricted PCM and RSM are presented by Fischer and Parzer (1991) and Fischer and Ponocny (1994, 1995).

If no experimental treatments are applied, development occurring in the interval from $T_1$ to $T_2$ can be measured by setting $J = 1$ and $q_1 = 1$. Then $\eta_1$ reflects the individuals' global amount of change on the latent continuum (cf. Langeheine, 1993; Rost & Spada, 1983; Spada & McGaw, 1985). Generalizations to more than two measurement occasions are straightforward.

The linear logistic test models for dichotomous and polytomous items maintain the assumption of unidimensionality across items. Another family of models, called linear logistic test models with relaxed assumptions (LLRA), abandons this condition (Fischer, 1983, 1995; Fischer & Formann, 1982). In LLRA models, each of the items $1, \ldots, I$ may measure a trait of its own, although all items are presumed to assess the same kind and amount of change. The relaxation is accomplished by replacing the difference of the person and threshold parameter, $\theta_v - \beta_{ix}$, by a

joint parameter which denotes the interaction of both, $\theta^*_{vix}$. For the virtual items $I+1, \ldots, 2I$, the joint term $\theta^*_{v(I+i)x}$ is decomposed into the initial parameter $\theta^*_{vix}$ plus a linear combination of change parameters as in the LLTM (see Equation (7)). While LLRA models are multidimensional in nature, they do not contain explicit hypotheses about the latent dimensions underlying the responses to any particular item. Alternative multidimensional Rasch models which allow specifying the latent structure of each threshold are presented in the next section.

2 Multidimensional Rasch Models

If it is assumed that the observed responses are affected by more than one latent trait, the threshold probabilities can be parameterized in terms of the logistic function where the argument involves the sum over several latent dimensions $d$, $d = 1, \ldots, D$:

$$\gamma_{vix} = \frac{\exp\left(\sum_{d=1}^{D} w_{ixd}(\theta_{vd} - \beta_{ixd})\right)}{1 + \exp\left(\sum_{d=1}^{D} w_{ixd}(\theta_{vd} - \beta_{ixd})\right)}. \tag{8}$$

The weights $w_{ixd} \in \{0, 1\}$ specify whether passing threshold $x$ of item $i$ involves latent trait $d$ or not (for details, see Meiser, 1996). The category probabilities can be derived from Equation (8) by the recursive formula used in Equation (3):

$$p(X_{vi} = x) = \frac{\exp\left(\sum_{d=1}^{D} \sum_{s=1}^{x} w_{isd}\theta_{vd} - \sum_{d=1}^{D} \sum_{s=1}^{x} w_{isd}\beta_{isd}\right)}{\sum_{y=0}^{m_i} \exp\left(\sum_{d=1}^{D} \sum_{s=1}^{y} w_{isd}\theta_{vd} - \sum_{d=1}^{D} \sum_{s=1}^{y} w_{isd}\beta_{isd}\right)} \tag{9}$$

with $\beta_{isd} = 0$ for $w_{isd} = 0$. As in the unidimensional case, the distribution of response vector $X$ results from multiplication over $i$:

$$p\left(X_v = (x_1, \ldots, x_I)\right) = \frac{\exp\left(\sum_{d=1}^{D} t_d\theta_{vd} - \sum_{d=1}^{D} \sum_{i=1}^{I} \sum_{s=1}^{x_i} w_{isd}\beta_{isd}\right)}{\prod_{i=1}^{I} \sum_{y=0}^{m_i} \exp\left(\sum_{d=1}^{D} \sum_{s=1}^{y} w_{isd}\theta_{vd} - \sum_{d=1}^{D} \sum_{s=1}^{y} w_{isd}\beta_{isd}\right)}, \tag{10}$$

where $t_d = \sum_{i=1}^{I} \sum_{s=1}^{x_i} w_{isd}$ denotes the total score related to latent dimension $d$, that is, the number of thresholds passed involving this dimension. Special cases can be derived from the general multidimensional Rasch model by appropriate specifications of the weights $w_{ixd}$.
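The role of the weights $w_{ixd}$ in Equation (9) and in the dimension-specific total scores $t_d$ can be illustrated with a short Python sketch. The function names, the weight matrix, and all parameter values are hypothetical; the sketch only mirrors the formulas above.

```python
import math

def md_category_probs(thetas, weights, betas):
    """Category probabilities of Equation (9) for one item.

    thetas[d]     : person parameter on dimension d
    weights[s][d] : w_{isd} in {0, 1} for threshold s + 1
    betas[s][d]   : beta_{isd}; it has no effect where the weight is 0
    """
    exponents = [0.0]
    for w_s, b_s in zip(weights, betas):
        step = sum(w * (th - b) for w, th, b in zip(w_s, thetas, b_s))
        exponents.append(exponents[-1] + step)
    expd = [math.exp(e) for e in exponents]
    total = sum(expd)
    return [e / total for e in expd]

def dimension_scores(responses, item_weights, n_dims):
    """Total scores t_d: number of passed thresholds involving dimension d."""
    t = [0] * n_dims
    for x, weights in zip(responses, item_weights):
        for w_s in weights[:x]:              # thresholds 1..x were passed
            for d, w in enumerate(w_s):
                t[d] += w
    return t

# Hypothetical item with two thresholds: threshold 1 involves dimension 0,
# threshold 2 involves dimension 1.
W = [[1, 0], [0, 1]]
probs = md_category_probs([0.5, -0.2], W, [[0.5, 0.0], [0.0, -0.2]])
scores = dimension_scores([2, 1], [W, W], n_dims=2)
```

In this example each person parameter equals the threshold parameter it is paired with, so all three categories are equally likely, and the responses (2, 1) to two such items pass two thresholds on the first dimension and one on the second.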
The so-called multidimensional partial credit model, in which each threshold $x$ of items $1$ to $I$ refers to a latent dimension of its own (Kelderman, 1993, 1996; Kelderman & Rijkes, 1994), results from

$$w_{ixd} = \begin{cases} 1 & \text{for } x = d, \\ 0 & \text{for } x \neq d. \end{cases} \tag{11}$$

By this specification, Equation (9) is reduced to:

$$p(X_{vi} = x) = \frac{\exp\left(\sum_{d=1}^{x} \theta_{vd} - \sum_{d=1}^{x} \beta_{id}\right)}{\sum_{y=0}^{m_i} \exp\left(\sum_{d=1}^{y} \theta_{vd} - \sum_{d=1}^{y} \beta_{id}\right)}. \tag{12}$$

Recently it has been shown that the multidimensional partial credit model for ordered response categories is equivalent to Rasch's traditional multidimensional model for categorical responses with category probabilities

$$p(X_{vi} = x) = \frac{\exp(\theta_{vx} - \beta_{ix})}{\sum_{y=0}^{m_i} \exp(\theta_{vy} - \beta_{iy})} \tag{13}$$

(cf. Andersen, 1995; Roskam, 1996). Actually, reparameterizing Model (12) in terms of

$$\theta^*_{vx} = \sum_{d=1}^{x} \theta_{vd} \quad \text{and} \quad \beta^*_{ix} = \sum_{d=1}^{x} \beta_{id} \tag{14}$$

yields Model (13) (Kelderman, 1997).

2.1 Loglinear Representation of Multidimensional Rasch Models

Loglinear representations of unidimensional Rasch models (e.g., Cressie & Holland, 1983; Kelderman, 1984; Thissen & Mooney, 1989; Tjur, 1982) and of multidimensional Rasch models (e.g., Kelderman & Rijkes, 1994; Meiser, 1996) provide a convenient framework for specifying and testing hypotheses about the latent space underlying the observed responses and about the structure of threshold parameters. The general multidimensional Rasch model of Equation (10) can be rewritten in terms of the loglinear model

$$\ln p\left(X = (x_1, \ldots, x_I)\right) = u - \sum_{d=1}^{D} \sum_{i=1}^{I} \sum_{s=1}^{x_i} w_{isd}\beta_{isd} + u_{(t_1, \ldots, t_D)}, \tag{15}$$

where $(t_1, \ldots, t_D)$ is the vector of total scores referring to the latent dimensions $1$ to $D$. To achieve identifiability, several constraints have to be imposed on the parameters of Equation (15) (cf. Kelderman & Rijkes, 1994; Meiser, 1996). A common set of constraints is $\beta_{isd} = 0$ for $w_{isd} = 0$, $\sum_{t_1} \cdots \sum_{t_D} u_{(t_1, \ldots, t_D)} = 0$, and the restriction of centered scales for the latent dimensions, that is, $\sum_{i=1}^{I} \sum_{s=1}^{m_i} \beta_{isd} = 0$ for $d = 1, \ldots, D$. Thereby, the model for the distribution of the total score vector $(t_1, \ldots, t_D)$ is saturated in Equation (15), and maximum-likelihood estimation of the parameters in the loglinear model yields CML estimates of the threshold parameters.
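The equivalence of Models (12) and (13) under the reparameterization in Equation (14) can be checked numerically. The following Python sketch uses hypothetical parameter values and hypothetical function names; it computes the category probabilities both ways and compares them.

```python
import math
from itertools import accumulate

def normalise(exponents):
    """Turn a list of exponents into probabilities."""
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def model_12(thetas, betas):
    """Equation (12): threshold d has its own theta_{vd} and beta_{id}."""
    steps = [t - b for t, b in zip(thetas, betas)]
    return normalise([0.0] + list(accumulate(steps)))

def model_13(theta_star, beta_star):
    """Equation (13) with category-specific parameters; category 0 is the
    reference category with both parameters fixed at 0."""
    return normalise([0.0] + [t - b for t, b in zip(theta_star, beta_star)])

# Hypothetical parameters for an item with three thresholds.
thetas = [0.3, -0.4, 1.1]
betas = [-0.2, 0.5, 0.9]

# Equation (14): cumulative sums give the starred parameters.
theta_star = list(accumulate(thetas))
beta_star = list(accumulate(betas))

p12 = model_12(thetas, betas)
p13 = model_13(theta_star, beta_star)
```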
The unidimensional Rasch model, the multidimensional partial credit model, and models with several unidimensional scales can be derived from Model (15) by appropriate selections of the design matrix in the nonstandard loglinear modeling approach (cf. Langeheine, 1983; Rindskopf, 1990, 1992).

2.2 Loglinear Rasch Models for the Analysis of Change

Consider again the case of two measurement occasions $T_1$ and $T_2$ with a set of items $1, \ldots, I$ observed at $T_1$ and with the corresponding set of virtually new items $I+1, \ldots, 2I$ observed at $T_2$. The above-mentioned special case of the linearly restricted PCM (Fischer & Ponocny, 1994), namely the unidimensional Rasch model of global change $\eta$ which is independent of the thresholds, the items, and the person parameters, results from Model (15) by the specifications $D = 1$, $w_{is1} = 1$ for all thresholds, and $\beta_{(I+i)s} = \beta_{is} - \eta$:

$$\ln p\left(X = (x_1, \ldots, x_I, x_{I+1}, \ldots, x_{2I})\right) = u - \sum_{i=1}^{I} \sum_{s=1}^{x_i} \beta_{is} - \sum_{i=1}^{I} \sum_{s=1}^{x_{(I+i)}} \beta_{is} + \sum_{i=1}^{I} \sum_{s=1}^{x_{(I+i)}} \eta + u_t \tag{16}$$

with $t = \sum_{i=1}^{I} x_i + \sum_{i=1}^{I} x_{(I+i)}$. The statistical comparison of Model (16) to the model of perfect stability, which results from the restriction $\eta = 0$, allows testing for the occurrence of change from $T_1$ to $T_2$. Model (16) can easily be extended to measuring global change in several latent dimensions or to designs with more than two measurement occasions (see Meiser, 1996).

While the unidimensional Rasch model for repeated observations is based on the assumption of homogeneity of change across individuals, the two-dimensional Rasch model with latent trait $d = 1$ affecting only responses at $T_1$, that is, responses to the items $1, \ldots, I$, and latent trait $d = 2$ affecting only responses at $T_2$, that is, responses to the virtually new items $I+1, \ldots, 2I$,

$$\ln p\left(X = (x_1, \ldots, x_I, x_{I+1}, \ldots, x_{2I})\right) = u - \sum_{i=1}^{I} \sum_{s=1}^{x_i} \beta_{is} - \sum_{i=1}^{I} \sum_{s=1}^{x_{(I+i)}} \beta_{is} + u_{(t_1, t_2)} \tag{17}$$

with $t_1 = \sum_{i=1}^{I} x_i$ and $t_2 = \sum_{i=1}^{I} x_{(I+i)}$, permits person-specific change (Meiser, 1996; see also Duncan, 1985a, 1985b). In Equation (17), the threshold parameters are specified to be invariant over time. This invariance mirrors the assumptions that latent change is confined to the individuals' total scores, which are the sufficient statistics of the latent person parameters $\theta_{vd}$, and that the structure of the items and response categories remains unchanged.

Note that the unidimensional Rasch model of global change is a submodel of the two-dimensional Rasch model of person-specific change: Equation (16) results from Equation (17) by the restrictions $u_{(t_1, t_2)} = u_t + \eta t_2$ with $t = t_1 + t_2$. Hence, the two models can be compared by means of the conditional log-likelihood ratio statistic $G^2$ in order to test for homogeneity of change. In Equation (17), each of the item sets $1, \ldots, I$ and $I+1, \ldots, 2I$ is specified to be unidimensional.
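The nesting of the global change model in the model of person-specific change can be traced with a small Python sketch for dichotomous items (one threshold per item). It compares the item-related terms of Equations (16) and (17) and shows that they differ by exactly the change parameter times the second-occasion score $t_2$. A generic likelihood-ratio statistic $G^2$ for observed and expected frequencies is included as well; for nested models the test statistic would be the difference of their $G^2$ values. All names and values are hypothetical, and the symbol `eta` stands for the global change parameter.

```python
import math

def kernel_global_change(x_t1, x_t2, betas, eta):
    """Item-related terms of Equation (16) for dichotomous items:
    thresholds at T2 are shifted to beta_i - eta."""
    part_t1 = -sum(b * x for b, x in zip(betas, x_t1))
    part_t2 = -sum((b - eta) * x for b, x in zip(betas, x_t2))
    return part_t1 + part_t2

def kernel_person_specific(x_t1, x_t2, betas):
    """Item-related terms of Equation (17): thresholds invariant over time."""
    part_t1 = -sum(b * x for b, x in zip(betas, x_t1))
    part_t2 = -sum(b * x for b, x in zip(betas, x_t2))
    return part_t1 + part_t2

def g_squared(observed, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum(obs * ln(obs / exp));
    cells with zero observed frequency contribute nothing."""
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

# Hypothetical responses to three items at T1 and T2.
x_t1, x_t2 = [1, 0, 1], [1, 1, 0]
betas, eta = [0.2, -0.3, 0.5], 0.8

t2 = sum(x_t2)
difference = (kernel_global_change(x_t1, x_t2, betas, eta)
              - kernel_person_specific(x_t1, x_t2, betas))
# difference equals eta * t2, the term absorbed by u_(t1, t2) in Equation (17)
```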
Therefore, multidimensionality comes into play only by admitting person-specific change between the two measurement occasions, both of which refer to a unidimensional Rasch model. This contrasts with the LLRA, which abandons the assumption of unidimensionality per measurement occasion as discussed above.

3 Mixture Distribution Rasch Models

The essential assumption of the Rasch model that the threshold parameters are homogeneous across individuals can be relaxed by use of finite mixture distribution models (Everitt & Hand, 1981; Rost & Erdfelder, 1996; Titterington, Smith, & Makov, 1985). In finite mixture distribution models, the probability of response vector $X$ is characterized by a set of component probability functions $p(x \mid c)$ which are conditional on latent subpopulations $c$, $c = 1, \ldots, C$, and by the distribution of the latent subpopulations:

$$p(x) = \sum_{c=1}^{C} \pi_c\, p(x \mid c), \tag{18}$$

where $\pi_c$ denotes the probability of subpopulation $c$. If the component distributions of a mixture multinomial distribution are specified in terms of the unidimensional Rasch model (see Equation (4)), the mixed

Rasch model results:

$$p\left(X_v = (x_1, \ldots, x_I)\right) = \sum_{c=1}^{C} \pi_c \left( \frac{\exp\left(t\theta_{v|c} - \sum_{i=1}^{I} \sum_{s=1}^{x_i} \beta_{is|c}\right)}{\prod_{i=1}^{I} \sum_{y=0}^{m_i} \exp\left(y\theta_{v|c} - \sum_{s=1}^{y} \beta_{is|c}\right)} \right) \tag{19}$$

(Rost, 1990, 1991; von Davier & Rost, 1995). In the mixed Rasch model, homogeneity of threshold parameters is specified within each of the latent subpopulations, while the parameters may differ between latent subpopulations. Parameter estimation for mixture distribution Rasch models via the EM algorithm is described by Rost (1990, 1991) and von Davier and Rost (1995).

3.1 A Mover-Stayer Mixed Rasch Model for Repeated Observations

Recently, mixed Rasch models were applied to longitudinal data in order to separate different patterns of change in an exploratory manner (Glück & Spiel, 1997; Meiser, Hein-Eggers, Rompe, & Rudinger, 1995; Meiser & Rudinger, 1997). Here we want to focus on a special case of mixed Rasch models for a more confirmatory analysis of longitudinal data, namely on a mover-stayer model encompassing a subpopulation $c = 1$ of global change and another subpopulation $c = 2$ in which no change occurs. In the case of two measurement occasions with items $i = 1, \ldots, I$ observed at $T_1$ and virtually new items $i = I+1, \ldots, 2I$ observed at $T_2$, the model of global change in subpopulation $c = 1$ and of no change in subpopulation $c = 2$ is specified by the restrictions

$$\beta_{(I+i)s|1} = \beta_{is|1} - \eta \quad \text{and} \quad \beta_{(I+i)s|2} = \beta_{is|2} \tag{20}$$

on the threshold parameters of the component distributions in Model (19). By the additional restriction

$$\beta_{is|1} = \beta_{is|2} = \beta_{is} \tag{21}$$

identical initial threshold parameters are specified for both subpopulations, so that differences between the latent subpopulations are confined to potential differences in the distribution of the person parameter and to the a priori specified difference in the pattern of change, that is, global change in subpopulation $c = 1$ versus no change in subpopulation $c = 2$.
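How restrictions (20) and (21) shape the mixture in Equation (18) can be sketched in Python for dichotomous items. The sketch makes a simplifying assumption that is not part of the model: each subpopulation is represented by a single fixed person parameter, whereas Model (19) allows person-specific parameters within classes. All function names and numerical values are hypothetical, and `eta` again denotes the global change parameter of the movers.

```python
import math
from itertools import product

def rasch_pattern_prob(x, theta, betas):
    """Equation (4) for dichotomous items: product of item probabilities."""
    p = 1.0
    for xi, b in zip(x, betas):
        p_correct = 1.0 / (1.0 + math.exp(-(theta - b)))
        p *= p_correct if xi == 1 else 1.0 - p_correct
    return p

def mover_stayer_prob(x_t1, x_t2, betas, eta, pi_mover,
                      theta_mover, theta_stayer):
    """Mixture of Equation (18) with restrictions (20) and (21)."""
    betas = list(betas)
    x = list(x_t1) + list(x_t2)
    mover_items = betas + [b - eta for b in betas]   # c = 1: global change
    stayer_items = betas + betas                     # c = 2: no change
    p_mover = rasch_pattern_prob(x, theta_mover, mover_items)
    p_stayer = rasch_pattern_prob(x, theta_stayer, stayer_items)
    return pi_mover * p_mover + (1.0 - pi_mover) * p_stayer

# Hypothetical setting with I = 2 items per occasion.
betas, eta = [-0.3, 0.6], 0.9
total = sum(
    mover_stayer_prob(pattern[:2], pattern[2:], betas, eta,
                      pi_mover=0.6, theta_mover=0.2, theta_stayer=-0.5)
    for pattern in product([0, 1], repeat=4)
)
# total sums the mixture probabilities over all 16 response vectors
```

Summing over all response vectors is a quick sanity check that the mixture defines a proper probability distribution; setting `eta` to zero with equal class parameters collapses the mixture to a single Rasch model of perfect stability.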
Formally, the mover-stayer mixed Rasch model with restriction (21) is related to the Saltus model (Wilson, 1989) for the analysis of discontinuous development with cross-sectional data. The generalization of the mover-stayer mixed Rasch model to more than two measurement occasions is straightforward.

4 Development of Understanding and Solving Arithmetic Word Problems in Elementary School Children

The data of the present analysis were gathered in the longitudinal study SCHOLASTIK by the Max-Planck-Institute of Psychological Research in Munich, Germany (Weinert & Helmke, 1997). In this study, a sample of 1,453 children from 54 elementary school classes was repeatedly presented with achievement measures on mathematics, science, and the mother tongue from first grade to fourth grade. In this paper we focus on only a small part of the collected data: certain arithmetic word problems presented in the second and third grade.

Solving particular arithmetic word problems can be considered a good indicator of mathematical understanding. While children can easily solve simple problems that describe the exchange of sets (e.g., "At the beginning, John had 5 marbles. Then he gave 2 of these marbles to Peter. How many marbles does John have now?"), word problems dealing with quantitative comparison (e.g., "John has 5 marbles. He has 3 marbles more than Peter has. How many marbles does Peter have?") or with certain kinds of combination (e.g., "Peter and John have 8 marbles altogether. Peter has 3 marbles. How many marbles does John have?") pose particular difficulties. Solving comparison and combination problems requires an advanced mathematical understanding which is based on abstract part-whole relations rather than on the counting function of numbers (Stern, 1993, 1998; Stern & Lehrndorfer, 1992). Handling part-whole relations allows flexibility in formulating equations and partitioning sets. Particularly complex word problems which require the inference of information can only be solved on the basis of part-whole representations. In this article we concentrate on three complex word problems dealing with the comparison and the combination of sets presented in the second and third grade of elementary school.

4.1 Items and Sample

The arithmetic word problems selected for the present analysis were:

1. Jack and Beth have 6 apples altogether. Jack has 2 apples. Ken and Ina have 9 apples altogether. Ken has 5 apples. How many apples do Beth and Ina have altogether?
2. John has 7 rabbits. He has 4 rabbits more than Tom. How many rabbits do John and Tom have altogether?
3. Joyce has 7 marbles. She has 2 marbles more than Tom has. Oliver has 3 marbles more than Tom. How many marbles does Oliver have?

Responses were scored 0 for incorrect responses and 1 for correct responses.
A sample of $N = 1030$ children participated in the two measurement occasions considered here, that is, in second and third grade. The empirical frequencies of the response vectors comprising the responses to the three arithmetic word problems at second and third grade are displayed in Table 1.

Since the kind of word problems used for the present analysis is only rarely presented during elementary school, children cannot simply retrieve complete solution strategies from memory; they have to develop them on their own. Therefore, solving each of the three word problems requires what is called far transfer: a deep reconstruction of the existing knowledge base in arithmetic is necessary. The mathematical requirements of the three problems go far beyond what is usually dealt with in the first two years of elementary school mathematics. Therefore, only second graders who have already developed outstanding mathematical competencies on their own can be expected to solve at least some of the problems at this age level. In the third grade, part-whole understanding of arithmetic equations is particularly emphasized in school, for instance by frequently presenting children with fill-in-the-blank problems. Children who benefitted from the instruction at school should be able to develop solution strategies for the three word problems and thereby improve

Table 1: Observed and Expected Frequencies of Response Vectors for the Three Arithmetic Word Items at Second and Third Grade

Columns: Grade 2 (It.1, It.2, It.3) | Grade 3 (It.1, It.2, It.3) | Frequencies (Obs., Exp. a)

a Expected frequencies according to the mover-stayer mixed Rasch model.

performance. There may be, however, children who do not benefit much from the learning opportunities in the third grade and therefore will not be able to construct advanced solution strategies for the word problems. These children will remain at their initial performance level.

In their previous analysis of the data,¹ Langeheine et al. (1994), using two-class state-mastery models, found that a mover-stayer model including two latent subpopulations, one with transitions from a state of low competency to a state of high competency and one without transitions between states of competency, was superior to a set of other models which did not allow for heterogeneity of change. Although this result fits nicely with the theoretical expectation of differential gains in mathematical reasoning performance during third grade, none of the state-mastery models considered in this earlier analysis showed a satisfactory goodness of fit at conventional levels of statistical significance (Langeheine et al., 1994, p. 285). The poor fit may be due to the rather restrictive assumption that there are only two levels of competency, which are reflected by the class of "nonmasters" and the class of "masters". In the present analysis, we drop this restrictive assumption by using Rasch models rather than two-class state-mastery models, thereby allowing for more than two values of the latent person variable.

4.2 Results

Conditional maximum-likelihood estimates for the parameters of loglinear Rasch models were obtained using the program LEM (Vermunt, 1997), which enables specifying the design matrices of nonstandard loglinear models. The criterion of statistical significance for model rejection was set at $\alpha = .05$.
A power analysis using the program GPOWER (Faul & Erdfelder, 1992; see also Erdfelder, Faul, & Buchner, 1996) with specifications α = .05, N = 1030, and medium effect size w = .3 (Cohen, 1988) revealed that the statistical power of detecting effects of at least medium size is larger than 1 − β = .99 for all of the goodness-of-fit tests and model comparisons reported in the following. Therefore, models and model restrictions which do not yield a significant misfit can be accepted with a small Type II error probability.

The most restrictive models of change discussed above are the unidimensional Rasch model of global change specified in Equation (16) and its submodel of perfect stability, which results from fixing the change parameter at zero. Note that both of these models are special cases of the LLTM as described earlier. The estimates of the threshold parameters² β_i and of the change parameter in the model of global change are listed in Table 2, as is the log-likelihood ratio test statistic G². As can be seen in the table, the model of global change yields a significant G² statistic, so it has to be rejected. Accordingly, the even more restrictive model of perfect stability must also be rejected for the present data, G² = 184.87, 55 df, p < .05.

Since the unidimensional Rasch model of global change does not fit the data, we turn to the two-dimensional Rasch model permitting person-specific change. The estimates of the threshold parameters and the log-likelihood ratio statistic of the two-dimensional model specified in Equation (17) are displayed in Table 2. In contrast to the unidimensional Rasch model of global change, the two-dimensional Rasch model shows an acceptable goodness of fit. As shown above, the unidimensional Rasch Model (16) is a submodel of the two-dimensional Model (17), so the two models can be compared statistically, thereby testing for homogeneity of change across individuals. The conditional test statistic of the model comparison is significant, G² = 17.69, 8 df, p < .05, indicating that the assumption of homogeneity of change does not hold for the present data.

¹ Although Langeheine et al. (1994, pp. 284 ff., Model X) selected the same items, the data set used in the previous analysis differs from the data of the present analysis: Langeheine et al. considered responses from N = 965 children who participated in three adjacent measurement occasions of the longitudinal study, whereas the present analysis is based on a sample of N = 1030 children with complete data for two measurement occasions.

² Since the responses to the arithmetic word problems were scored in dichotomous categories 0 and 1 and, as a consequence, the items have only one threshold, the index indicating the threshold is omitted.

Considering the result that individuals differ with respect to their course of development from second to third grade, it is interesting to see whether the observed heterogeneity of change can be traced back to two simple processes occurring in different latent subpopulations: global change in subpopulation c = 1 and perfect stability in subpopulation c = 2. This hypothesis is reflected by the mover-stayer mixed Rasch model, that is, by the mixed Rasch model specified in Equation (19) with the additional restrictions displayed in Equations (20) and (21). Since the program LEM allows the specification of user-defined design matrices for loglinear models in finite mixture distributions, the parameters of the mover-stayer mixed Rasch model can be estimated by parameterizing the component distributions in terms of the unidimensional Rasch model of global change (Equation (16)), where the threshold parameters are constrained to be equal across subpopulations (Equation (21)) and where the change parameter is restricted to zero for latent subpopulation c = 2 (Equation (20)). Note that the response vectors (0,0,0,0,0,0) and (1,1,1,1,1,1) do not contribute to the CML estimation of the threshold parameters and therefore cannot be assigned to latent subpopulations c = 1 and c = 2 (Rost, 1991).
Therefore, the loglinear parameters u_{0|c} and u_{6|c} are not identifiable unless additional restrictions are imposed, such as equality constraints on u_{0|c} and u_{6|c} across c or fixations at zero in one of the two subpopulations. In the present analysis, we fitted the cells (0,0,0,0,0,0) and (1,1,1,1,1,1) of the contingency table by estimating the parameters u_{0|2} and u_{6|2} in the subpopulation of stayers and at the same time restricting u_{0|1} and u_{6|1} to zero for the subpopulation of movers. These restrictions affect the distribution of subpopulations, that is, π_1 and π_2, but do not influence the estimates of the threshold and change parameters of the component Rasch models.

As shown in Table 2, the mover-stayer mixed Rasch model does not yield a significant misfit. Therefore, we can maintain the hypothesis that the existence of a mover and a stayer subpopulation accounts for the heterogeneity of change revealed in the previous steps of analysis. The estimated size of the mover subpopulation amounts to π̂_1 = 0.43 and that of the stayer subpopulation to π̂_2 = 0.57. The mean expected probability of a correct response is depicted in Figure 1 as a function of item, measurement occasion, and latent subpopulation. The expected frequencies of response vectors are displayed in Table 1. A relaxation of the model by permitting different β-parameters for the latent subpopulations, that is, by dropping the restriction displayed in Equation (21), does not result in a significant improvement in goodness of fit, G² = 2.46, 2 df, p > .05.

Since a statistical comparison of the two-dimensional Rasch model and the finite-mixture model with two subpopulations is not possible, we cannot directly test whether the more parsimonious mover-stayer model differs from the model of person-specific change. Therefore, the final model selection is based on the information criterion CAIC, which emphasizes the avoidance of overfitting a model (cf. Bozdogan, 1987).
The mover-stayer mixed Rasch model shows a CAIC value of 7440.27, which is slightly lower than that of the two-dimensional Rasch model with a CAIC value of 7442.36. Therefore, the more parsimonious mover-stayer model may be preferred over the model of person-specific change.
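The power statement at the beginning of this section can be approximated without GPOWER. The sketch below is not the GPOWER implementation: it assumes Cohen's noncentrality parameter N·w² for a chi-square test, writes the noncentral chi-square distribution as a Poisson mixture of central chi-square distributions, and uses a closed-form tail probability that holds only for even degrees of freedom. It is therefore illustrated for the 8-df and 2-df model comparisons reported above. All function names are hypothetical.

```python
import math


def chi2_sf_even(x, df):
    """Upper-tail probability of a central chi-square variable.

    Closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!, valid for even df only.
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def chi2_critical_even(alpha, df):
    """Critical value found by bisection on the (decreasing) tail probability."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def power_chi2_even(w, n, df, alpha=0.05, terms=400):
    """Power of a chi-square test for effect size w, sample size n, even df."""
    nc = n * w ** 2                      # noncentrality parameter N * w^2
    crit = chi2_critical_even(alpha, df)
    mu = nc / 2.0                        # mean of the Poisson mixing weights
    log_weight = -mu                     # log Poisson probability at j = 0
    power = 0.0
    for j in range(terms):
        power += math.exp(log_weight) * chi2_sf_even(crit, df + 2 * j)
        log_weight += math.log(mu) - math.log(j + 1)
    return power


if __name__ == "__main__":
    for df in (2, 8):
        p = power_chi2_even(w=0.3, n=1030, df=df)
        print(f"df = {df}: power exceeds .99 -> {p > 0.99}")
```

With N = 1030 and w = .3 the noncentrality parameter is 92.7, which is far above the critical values for these degrees of freedom, so the computed power lies above .99 in both cases.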

Table 2: Parameter Estimates of the Threshold and Change Parameters and Goodness of Fit for the Unidimensional Rasch Model of Global Change, the Two-Dimensional Rasch Model of Person-Specific Change, and the Mover-Stayer Mixed Rasch Model

                     Model of change
                     Global     Person-specific    Mover-stayer
β_1                  0.244      0.250              0.249
β_2                  0.310      0.317              0.317
β_3                  0.066      0.067              0.067
Change parameter     0.669      (none)             1.188
G²                   72.53*     54.84              64.79
df                   [not reproduced]

* p < .05. Minus signs of the estimates, if any, are not preserved in this text version.

Figure 1: Mean expected probabilities of correct responses to the arithmetic word items at second and third grade for the latent subpopulations of movers and stayers. [Figure not reproduced; panels: Movers, Stayers; horizontal axis: Item 1, Item 2, Item 3; vertical axis: Probability of a Correct Response; lines: Grade 2, Grade 3.]
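The model comparisons reported in the text can be rechecked from the G² values in Table 2. The following minimal sketch computes the conditional likelihood ratio statistic for the test of homogeneity of change and the p-values of the two comparisons with 8 and 2 degrees of freedom. The function name is hypothetical, and the tail-probability formula used here holds only for even degrees of freedom.

```python
import math


def chi2_sf_even(x, df):
    """Upper-tail probability of a central chi-square variable (even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# G^2 values taken from Table 2.
g2_global = 72.53       # unidimensional model of global change
g2_two_dim = 54.84      # two-dimensional model of person-specific change

# Global change is nested in the two-dimensional model, so the difference
# of the G^2 statistics tests homogeneity of change (8 df in the text).
g2_diff = g2_global - g2_two_dim
p_homogeneity = chi2_sf_even(g2_diff, 8)

# Relaxing the equality of thresholds across subpopulations (2 df in the text).
p_relaxation = chi2_sf_even(2.46, 2)

if __name__ == "__main__":
    print(f"G2 difference = {g2_diff:.2f}, p = {p_homogeneity:.4f}")
    print(f"Relaxation test: G2 = 2.46, p = {p_relaxation:.4f}")
```

The first p-value falls below .05 and the second lies well above it, which agrees with the decisions reported in the text: homogeneity of change is rejected, and freeing the thresholds across subpopulations does not improve the fit.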

5 Conclusions

The Rasch model and its extensions form a class of psychometric models which can be used for modeling various aspects of change, such as global change, person-specific change, and different patterns of change in different subpopulations. The loglinear representation of unidimensional and multidimensional Rasch models and the loglinear parameterization of component distributions in mixture distribution Rasch models allow specifying and testing hypotheses about change in a flexible and straightforward way.

In the present article, we pointed out that the unidimensional Rasch model of global change is a submodel of the multidimensional Rasch model of person-specific change and that a mover-stayer mixed Rasch model can be specified by appropriate restrictions on the threshold parameters of the mixed Rasch model. Thereby, we addressed two issues raised by Stelzl (1997) in her reply to Glück and Spiel (1997). First, Stelzl questioned the assumption of homogeneity of change underlying the LLTM and other item response models for measuring change in repeated measures designs. The above-mentioned hierarchical relation of the loglinear unidimensional Rasch model of global change and the loglinear multidimensional Rasch model of person-specific change allows explicitly testing this assumption by means of a conditional likelihood ratio test. Second, Stelzl emphasized the need to specify treatment or change parameters in mixed Rasch models if these models are to be used for analyzing change as suggested by Glück and Spiel (1997). In the context of the mover-stayer mixed Rasch model, we showed how to specify a change parameter for one subpopulation and to fix the parameter at zero for another subpopulation.

If a research design comprises several experimental or quasi-experimental groups, straightforward extensions of the mover-stayer model by the extraneous group variable are available.
Those extended mover-stayer models may include, for instance, equality constraints on the threshold and change parameters across groups while investigating differences in the sizes of the mover and stayer subpopulations, or they may impose equality constraints on the threshold parameters across groups while testing for differences in the change parameters of movers. Thus, fitting a mover-stayer mixed Rasch model may be the basis for further analyses to investigate which groups of individuals will be affected by certain interventions and which groups of individuals will not be affected or will be affected in a different way.

As illustrated by the above analysis of mathematical reasoning performance in elementary school children, the restrictive assumption of homogeneity of change may well be inappropriate for a given data set. Furthermore, we agree with Stelzl (1997) that the assumption may even be implausible as far as natural (i.e., not experimentally induced) change is concerned: in the context of solving arithmetic word problems, children are expected to differ both in their level of competency at one measurement occasion and in their growth in competency from one measurement occasion to another. The unidimensional Rasch model of global change reflects differences in initial competency, but precludes interindividual differences in growth. In contrast, the multidimensional Rasch model of person-specific change is a rather unrestrictive model, inasmuch as it incorporates differences in reasoning competency at a given occasion as well as unconstrained interindividual differences in the amount of change from one occasion to another. The mover-stayer mixed Rasch model falls in between these two extremes by allowing for interindividual differences in change in terms of two well-defined developmental processes: global change for one subpopulation and stability for another subpopulation.
For the present data, the mover-stayer model turned out to be superior to the alternative Rasch models considered, because it is not rejected and provides a parsimonious description of the data-generating processes. The two subpopulations separated by the mover-stayer mixed Rasch model differ with respect to the impact school instruction has on the development of mathematical competencies. In the third grade, some children gained from practicing problems that require part-whole modeling by developing solution strategies for complex word problems, while others did not. The subpopulation of stayers is composed of children who did not profit from elementary school instruction, either because they had already developed the competencies by themselves before they were taught at school, or because the cognitive or motivational preconditions for profiting from the instruction were not available. In the subpopulation of movers, the children extended their arithmetical competencies by exploiting what had been taught at school, thereby improving their performance in word problem solving.

References

[1] Andersen, E. B. (1995). Polytomous Rasch models and their estimation. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models. Foundations, recent developments, and applications. New York: Springer.
[2] Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43.
[3] Andrich, D. (1982). An extension of the Rasch model for ratings providing both location and dispersion parameters. Psychometrika, 47.
[4] Andrich, D. (1985). An elaboration of Guttman scaling with Rasch models for measurement. In N. B. Tuma (Ed.), Sociological Methodology 1985. San Francisco: Jossey-Bass.
[5] Andrich, D., de Jong, J. H. A. L., & Sheridan, B. E. (1997). Diagnostic opportunities with the Rasch model for ordered response categories. In J. Rost & R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences. Münster: Waxmann.
[6] Bozdogan, H. (1987). Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52.
[7] Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum.
[8] Cressie, N., & Holland, P. W. (1983). Characterizing the manifest probabilities of latent trait models. Psychometrika, 48.
[9] Duncan, O. D. (1985a). New light on the 16-fold table. American Journal of Sociology, 91.
[10] Duncan, O. D. (1985b). Some models of response uncertainty for panel analysis. Social Science Research, 14.
[11] Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis program. Behavior Research Methods, Instruments, & Computers, 28.
[12] Everitt, B. S., & Hand, D. J. (1981). Finite mixture distributions. London: Chapman and Hall.
[13] Faul, F., & Erdfelder, E. (1992). GPOWER: A priori, post-hoc, and compromise power analyses for MS-DOS. University of Bonn: Department of Psychology.
[14] Fischer, G. H. (1983). Logistic latent trait models with linear constraints. Psychometrika, 48.
[15] Fischer, G. H. (1995). Linear logistic models for change. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models. Foundations, recent developments, and applications. New York: Springer.
[16] Fischer, G. H., & Formann, A. K. (1982). Some applications of logistic latent trait models with linear constraints on the parameters. Applied Psychological Measurement, 6.

[17] Fischer, G. H., & Molenaar, I. W. (Eds.) (1995). Rasch models. Foundations, recent developments, and applications. New York: Springer.
[18] Fischer, G. H., & Parzer, P. (1991). An extension of the rating scale model with an application to the measurement of change. Psychometrika, 56.
[19] Fischer, G. H., & Ponocny, I. (1994). An extension of the partial credit model with an application to the measurement of change. Psychometrika, 59.
[20] Fischer, G. H., & Ponocny, I. (1995). Extended rating scale and partial credit models for assessing change. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models. Foundations, recent developments, and applications. New York: Springer.
[21] Glück, J., & Spiel, C. (1997). Item Response-Modelle für Meßwiederholungsdesigns: Anwendung und Grenzen verschiedener Ansätze [Item response models for repeated measures designs: Application and limitations of different approaches]. Methods of Psychological Research Online, 2. Internet:
[22] Kelderman, H. (1984). Loglinear Rasch model tests. Psychometrika, 49.
[23] Kelderman, H. (1993). Estimating and testing a multidimensional Rasch model for partial credit scoring of children's application of size concepts. In R. Steyer, K. F. Wender, & K. F. Widaman (Eds.), Psychometric methodology. Proceedings of the 7th European Meeting of the Psychometric Society in Trier. Stuttgart: Fischer.
[24] Kelderman, H. (1996). Multidimensional Rasch models for partial-credit scoring. Applied Psychological Measurement, 20.
[25] Kelderman, H. (1997). Loglinear multidimensional item response models for polytomously scored items. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory. New York: Springer.
[26] Kelderman, H., & Rijkes, C. P. M. (1994). Loglinear multidimensional IRT models for polytomously scored items. Psychometrika, 59.
[27] Langeheine, R. (1983). Nonstandard log-lineare Modelle [Nonstandard log-linear models]. Zeitschrift für Sozialpsychologie, 14.
[28] Langeheine, R. (1993). Diagnosing incremental learning: Some probabilistic models for measuring change and testing hypotheses about growth. Studies in Educational Evaluation, 19.
[29] Langeheine, R., & Rost, J. (1988). Latent trait and latent class models. New York: Plenum Press.
[30] Langeheine, R., Stern, E., & van de Pol, F. (1994). State mastery learning. Dynamic models for longitudinal data. Applied Psychological Measurement, 18.
[31] Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47.
[32] Masters, G. N., & Wright, B. D. (1997). The partial credit model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory. New York: Springer.
[33] Meiser, T. (1996). Loglinear Rasch models for the analysis of stability and change. Psychometrika, 61.
[34] Meiser, T., Hein-Eggers, M., Rompe, P., & Rudinger, G. (1995). Analyzing homogeneity and heterogeneity of change using Rasch and latent class models: A comparative and integrative approach. Applied Psychological Measurement, 19.
[35] Meiser, T., & Rudinger, G. (1997). Modeling stability and regularity of change: Latent structure analysis of longitudinal discrete data. In J. Rost & R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences. Münster: Waxmann.
[36] Rasch, G. (1968). An individualistic approach to item analysis. In P. F. Lazarsfeld & N. W. Henry (Eds.), Readings in mathematical social science. Cambridge: MIT Press.
