EVALUATION OF MATHEMATICAL MODELS FOR ORDERED POLYCHOTOMOUS RESPONSES. Fumiko Samejima*


In this paper, mathematical modeling is treated as distinct from curve fitting. Considerations of the psychological reality behind our data are emphasized, and criteria such as additivity in a model, its natural generalization to a continuous response model, satisfaction of the unique maximum condition and orderliness of the modal points of the operating characteristics of the ordered polychotomous responses are proposed. Strengths and weaknesses of mathematical models for ordered polychotomous responses that include the normal ogive model, the logistic model, the acceleration model and the family of ordered polychotomous models developed from Bock's nominal model are observed and discussed in terms of such criteria. It is concluded that it will be better to leave Bock's model as a nominal model, as he intended it to be, without expanding it to ordered polychotomous models.

1. Introduction

A widely used approach is for a researcher to apply a statistical method, or methods, to several different mathematical models, decide which model fits the data set in question best, and accept the best fitting model. Although this has been used as a standard procedure, such a mechanical application of a statistical method involves serious problems which may lead to wrong decisions. There is a high probability that a model which provides a wide variety of curve shapes will be selected regardless of the principle and assumptions behind the model, which may not agree with the psychological background of our data; the procedure then amounts to simple curve fitting, which is distinct from mathematical modeling. Thus it is important to identify a family of models that can be substantively justified before using such a statistical method.
Samejima (1972) proposed a general latent trait model for graded, or ordered polychotomous, responses, and within this framework distinguished the homogeneous case and the heterogeneous case. In the same paper, she also pointed out that a family of ordered polychotomous response models can be developed from Bock's nominal model (Bock, 1972), which belongs to the heterogeneous case, with the restriction that one of the two parameters in the model be arranged in ascending order of the polychotomous item score. Samejima (1979) did not pursue graded response models that could have been expanded from Bock's nominal model, however, because Bock's nominal model is based on choice behavior and the assumption intrinsic in the model does not fit typical ordered polychotomous response situations. Later, however, Masters (1982) and Muraki (1992) proposed the partial credit model and the generalized partial credit model, respectively, for ordered polychotomous responses, which are special cases of Bock's nominal model satisfying Samejima's condition. For brevity, those models for graded responses extended from Bock's nominal model will be called extended Bock models.

Thissen & Steinberg (1986) pointed out that many seemingly disparate models proposed for multiple responses may be considered as generalizations or special cases of each other, and that Samejima's (1969) graded and Bock's (1972) nominal proposals remain the only distinct approaches. In that paper, what Thissen & Steinberg meant by Samejima's graded response model is actually a subset of models in the homogeneous case, represented by the normal ogive and logistic models (Samejima, 1972). Thissen & Steinberg's naming, difference models, as opposed to divided-by-total models represented by Bock's nominal model, is still applicable, however, to Samejima's general model of ordered polychotomous responses, which includes the heterogeneous case as well as the homogeneous case.

Samejima (1995) proposed the acceleration model, which belongs to the heterogeneous case. In the present paper, strengths and weaknesses of mathematical models for ordered polychotomous responses that include the normal ogive model, the logistic model, the acceleration model, and the family of ordered polychotomous models developed from Bock's nominal model will be discussed in terms of criteria other than the goodness of fit of the curves, and some conclusions will be reached.

Key Words and Phrases: latent trait models, item response theory, ordered polychotomous responses, categorical data, mathematical modeling, graded response models, partial credit models, nominal models, continuous response models

* Department of Psychology, University of Tennessee, Knoxville, Tenn , same psychl.psych.utk.edu

2. Curve fittings and mathematical modeling

In this paper, the term curve fitting is used for an application of a statistical method, or methods, for evaluating discrepancies between the empirically obtained curves and the ones provided by a parametric model. The reasons why curve fitting should not be over-emphasized in model selection include:

(a) Goodness of fit of the curves to the set of data is a necessary condition for justifying the use of a specific model, but not a sufficient condition.
(b) Two or more mathematical models based on substantially different principles, and thus with parameters of substantially different meanings, may provide almost identical sets of curves regardless of their differences in philosophy.
(c) Poor fit of curves can arise not because the model is inappropriate, but because of deficiencies or limitations in the computer software adopted.

To illustrate (b), Figure 1 presents six operating characteristics following the acceleration model, and Figure 2 presents six operating characteristics following Masters' partial credit model, which belongs to the family of extended Bock models for ordered polychotomous responses and is based on principles substantially different from those behind the acceleration model. The two sets of operating characteristics shown in Figures 1 and 2 are practically identical, regardless of the differences in philosophy behind the two models. This implies that, if one set of curves fits some data set well, then the other set of curves will fit just as well, provided that software for the second model is written as well as that for the first model, a fact that illustrates statements (b) and (a).

Fig. 1 A set of operating characteristics of six steps in the acceleration model, with $\alpha_{x_g}$ = [...], $\beta_{x_g}$ = [...] and $\xi_{x_g}$ = [...] for $x_g$ = 1, 2, 3, 4, 5, respectively.

Fig. 2 A set of operating characteristics of six discrete responses following Bock's nominal model converted to a graded response model, or Masters' partial credit model, with $\alpha_{x_g}$ = 1, 2, 3, 4, 5, 6 and $\beta_{x_g}$ = 1.0, 2.0, 3.0, 3.5, 1.8, 1.0 for $x_g$ = 0, 1, 2, 3, 4, 5, respectively.

For the reasons described above, differences in goodness of fit provided by separate mathematical models should not be over-emphasized in model selection; otherwise, it is very possible that we falsely select a model which does not represent the nature of our data. Continued use of such a model in further research will eventually run into problems. It is inconceivable that a single mathematical model be appropriate for all sets of data with a variety of different psychological backgrounds; and yet a wrong model selection procedure may lead to such a conclusion as long as the goodness of fit of the curves is used as the sole criterion for model selection.

In mathematical modeling, therefore, what matters most is the fit of the principles behind a model, rather than of the curves it provides, to the psychological reality on which our data are based. If the fit of the curves is very poor for a model that seems to be appropriate in principle, then there will be room for reconsideration. If the fit of the curves provided by a model to our data is reasonably good, or fair, however, the model should not be discarded; criteria other than goodness of fit should be seriously considered in order to make the right decision in model selection.

3. Principles behind the models

Let $\theta$ be the latent trait, or ability, which represents a construct hypothesized behind certain human behavior, and is assumed to take on any real number. Let $g$ denote an item, which is the smallest manifest entity for measuring $\theta$. Let $X_g$ be a graded item response to item $g$, and $x_g$ (= 0, 1, ..., $m_g$) denote its realization. Note that these non-negative integers are arbitrary, and will never be used directly in ability estimation, unless sequences of item scores, or response patterns, are summarized into the test score and it is used as a substitute for ability, with a loss in the amount of test information (Samejima, 1969) and thus in accuracy of ability estimation. If the reader prefers, therefore, he/she can use, say, A, B, C, etc., instead of 0, 1, 2, etc.
The operating characteristic, $P_{x_g}(\theta)$, of the item score $x_g$ is the conditional probability, given $\theta$, with which the individual of ability $\theta$ obtains the item score $x_g$, that is,

$$P_{x_g}(\theta) = \text{prob.}[X_g = x_g \mid \theta].$$

This operating characteristic is assumed to be five times differentiable (Samejima, 1993a, 1993b) with respect to $\theta$. For convenience, hereafter, $x_g$ will be used both for a specific discrete response and for the event $X_g = x_g$, and a similar usage is applied to other symbols.

The fundamental framework of the general graded response model (Samejima, 1972) is given by

$$P_{x_g}(\theta) = \prod_{u \le x_g} M_u(\theta)\,[1 - M_{(x_g+1)}(\theta)], \tag{3.1}$$

where $M_{x_g}(\theta)$, called the processing function (Samejima, 1995) of the step $x_g$ (= 1, 2, ..., $m_g$), is the joint conditional probability with which the individual clears the step $x_g$, under the conditions that: (a) the individual's ability level is $\theta$, and (b) the steps up to $(x_g - 1)$ have already been cleared. The processing function is assumed to be non-decreasing in $\theta$. Let $(m_g + 1)$ be the hypothesized graded item score adjacent to and above $m_g$. Since, regardless of $\theta$, everyone can at least obtain the item score 0, and no one is able to obtain the item score $(m_g + 1)$, it is reasonable to set

$$M_{x_g}(\theta) = \begin{cases} 1 & \text{for } x_g = 0 \\ 0 & \text{for } x_g = m_g + 1 \end{cases}$$

for all $\theta$. Let $P^*_{x_g}(\theta)$ denote the cumulative operating characteristic (Samejima, 1995), which is the conditional probability with which the individual of ability $\theta$ clears at least the step $x_g$. Thus

$$P^*_{x_g}(\theta) = \text{prob.}[X_g \ge x_g \mid \theta] = \prod_{u \le x_g} M_u(\theta). \tag{3.2}$$

From (3.1) and (3.2) the operating characteristic $P_{x_g}(\theta)$ can be written as

$$P_{x_g}(\theta) = P^*_{x_g}(\theta) - P^*_{(x_g+1)}(\theta). \tag{3.3}$$

Thissen & Steinberg's naming (Thissen & Steinberg, 1986), difference models, comes from Eq. (3.3).

It should be noted that the general framework represented by (3.1) is not restricted to sequential processes. Take a Likert type categorical judgment, for example. When we select one of the four response categories, strongly disagree, disagree, agree and strongly agree, for a given statement in social attitude measurement, we usually do not compare our beliefs with each of the consecutive categories starting from the bottom. And yet selection of a specific response category implies such comparisons, and the $M_{x_g}(\theta)$'s for those $x_g$'s implicitly exist. It should also be noted that in sequential processes surpassing the step $x_g$ may not be explicit for all individuals. This is exemplified by the fact that some bright individuals seemingly skip lower steps in the sequence and go directly to higher steps.

The general model represented by (3.1), (3.2) and (3.3) leads to two separate cases, that is, the homogeneous case and the heterogeneous case.
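As a minimal numerical sketch of (3.1) through (3.3), the following Python fragment builds the operating characteristics from a set of processing-function values at one fixed ability level, and confirms that they agree with the differences of adjacent cumulative operating characteristics. The function names and the three processing-function values are hypothetical choices made only for illustration.

```python
import math

def operating_characteristics(processing):
    """Return [P_0, ..., P_m] from processing values [M_1, ..., M_m] at one theta.

    Follows (3.1), with the conventions M_0 = 1 and M_{m+1} = 0.
    """
    m_values = [1.0] + list(processing) + [0.0]   # M_0, M_1, ..., M_m, M_{m+1}
    probs = []
    cumulative = 1.0                               # running product, P*_x of (3.2)
    for x in range(len(processing) + 1):
        cumulative *= m_values[x]
        probs.append(cumulative * (1.0 - m_values[x + 1]))
    return probs

def cumulative_characteristics(processing):
    """Return [P*_0, ..., P*_{m+1}] as running products of the M_u, per (3.2)."""
    m_values = [1.0] + list(processing) + [0.0]
    return [math.prod(m_values[: x + 1]) for x in range(len(m_values))]

# Hypothetical processing-function values M_1, M_2, M_3 at a single ability level.
example_m = [0.9, 0.7, 0.4]
p = operating_characteristics(example_m)
p_star = cumulative_characteristics(example_m)
# (3.3): each operating characteristic is a difference of adjacent cumulative ones.
differences = [p_star[x] - p_star[x + 1] for x in range(len(p))]
```

Under these assumed values the four operating characteristics sum to one, and `p` and `differences` coincide up to rounding, which is the content of (3.3).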
Models in the homogeneous case are characterized by the identical shapes of the cumulative operating characteristics, $P^*_{x_g}(\theta)$'s, for $x_g$ = 1, 2, ..., $m_g$; these $m_g$ functions are positioned alongside the abscissa in accordance with the order of the item scores $x_g$. Note that the distances between two adjacent curves, $P^*_{x_g}(\theta)$ and $P^*_{(x_g+1)}(\theta)$ for $x_g$ = 1, 2, ..., $m_g - 1$, may be different for separate pairs. Thus for a model in the homogeneous case $P^*_{x_g}(\theta)$ can be expressed as

$$P^*_{x_g}(\theta) = \int_{-\infty}^{a_g(\theta - b_{x_g})} \psi(u)\,du, \tag{3.4}$$

where $\psi(u)$ is a specified density function, $a_g$ (> 0) is the discrimination parameter of item $g$, which is common to all responses to item $g$, and $b_{x_g}$ is a location parameter for the item score $x_g$ satisfying

$$-\infty = b_0 < b_1 < b_2 < \cdots < b_{m_g} < b_{m_g+1} = \infty. \tag{3.5}$$

Note that from (3.2), (3.4) and (3.5) it is obvious that $M_{x_g}(\theta)$ of any model which belongs to the homogeneous case assumes unity for $x_g = 0$ and zero for $x_g = m_g + 1$ for all $\theta$.

Two examples of this family are the normal ogive and logistic models (Samejima, 1969, 1972), whose cumulative operating characteristics are given by

$$P^*_{x_g}(\theta) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a_g(\theta - b_{x_g})} \exp\left[-\frac{u^2}{2}\right] du \tag{3.6}$$

and

$$P^*_{x_g}(\theta) = \frac{1}{1 + \exp[-D a_g(\theta - b_{x_g})]}, \tag{3.7}$$

respectively, where the $b_{x_g}$'s for $x_g$ = 0, 1, ..., $m_g$, $m_g + 1$ satisfy (3.5), and the scaling factor $D$ in (3.7) is usually set equal to 1.7 in order to make these cumulative operating characteristics close enough to those in the normal ogive model (see Birnbaum, 1968). Eqs. (3.6) and (3.7) are special cases of (3.4) where $\psi(u)$ is specified by the standard normal density function and the logistic density function that accommodates $D$, respectively.

It should be noted, however, that the cumulative operating characteristics, $P^*_{x_g}(\theta)$'s, in the homogeneous case do not have to be point-symmetric, that is, the relationship

$$P^*_{x_g}(b_{x_g} + \Delta\theta) = 1 - P^*_{x_g}(b_{x_g} - \Delta\theta),$$

where $\Delta\theta$ is any increment or decrement of $\theta$, does not have to hold, as it does in the normal ogive model and the logistic model. Some asymmetric examples have been shown elsewhere (see Samejima, 1972), and more general observations and discussion concerning asymmetric $P^*_{x_g}(\theta)$'s are made in a separate paper (Samejima, in preparation).

It has been observed (Samejima, 1972) that, in spite of the similarity between the two sets of $m_g$ cumulative operating characteristics in the normal ogive and logistic models, the philosophies behind their processing functions are characteristically different.
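The closeness of (3.6) and (3.7) under $D$ = 1.7, and the point symmetry that both models share, can be checked numerically. The sketch below uses only the Python standard library; the grid, the function names and the parameter values $a_g$ = 1, $b_{x_g}$ = 0 are assumptions made for illustration.

```python
import math

D = 1.7  # scaling factor quoted in the text for (3.7)

def normal_ogive_cum(theta, a, b):
    """Cumulative operating characteristic (3.6): standard normal CDF at a*(theta - b)."""
    return 0.5 * (1.0 + math.erf(a * (theta - b) / math.sqrt(2.0)))

def logistic_cum(theta, a, b):
    """Cumulative operating characteristic (3.7)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

# Evaluate both curves on a grid of ability levels from -6 to 6.
grid = [i / 100.0 for i in range(-600, 601)]

# Largest absolute gap between the two cumulative operating characteristics.
max_gap = max(
    abs(normal_ogive_cum(t, 1.0, 0.0) - logistic_cum(t, 1.0, 0.0)) for t in grid
)

# Point symmetry about theta = b: P*(b + d) should equal 1 - P*(b - d).
symmetry_gap = max(
    abs(logistic_cum(0.0 + d, 1.0, 0.0) - (1.0 - logistic_cum(0.0 - d, 1.0, 0.0)))
    for d in grid
)
```

On this grid the two curves should differ by less than about 0.01 everywhere, while `symmetry_gap` should be zero up to rounding error.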
Figures 3 and 4 illustrate processing functions as well as cumulative operating characteristics in the normal ogive and logistic models, respectively, with $m_g$ = 5 and the common item parameters $a_g$ = 1.0, $b_1$ = -3.5, $b_2$ = -3.0, $b_3$ = -2.0, $b_4$ = 0.0 and $b_5$ = 3.0.

Fig. 3 A set of processing functions and corresponding cumulative operating characteristics for $x_g$ = 1, 2, 3, 4, 5, following the normal ogive model with the parameters $a_g$ = 1.0 and $b_{x_g}$ = -3.5, -3.0, -2.0, 0.0, 3.0, respectively.

Fig. 4 A set of processing functions and corresponding cumulative operating characteristics for $x_g$ = 1, 2, 3, 4, 5, following the logistic model with the scaling factor $D$ = 1.7 and the parameters $a_g$ = 1.0 and $b_{x_g}$ = -3.5, -3.0, -2.0, 0.0, 3.0, respectively.

A characteristic difference between these two models lies in the lower asymptotes of their processing functions. In the normal ogive model, this asymptote equals zero for every $x_g$ (= 1, 2, ..., $m_g$), whereas in the logistic model it is given by

$$\lim_{\theta \to -\infty} M_{x_g}(\theta) = \exp[-D a_g(b_{x_g} - b_{x_g - 1})],$$

which assumes zero for $x_g$ = 1 and a positive value for $x_g$ = 2, 3, ..., $m_g$, and this value increases as the distance between the two adjacent difficulty parameters decreases. This difference will be discussed further in a later section.

By the heterogeneous case of the graded response model we mean a family of models in which not all $P^*_{x_g}(\theta)$'s for $x_g$ = 1, 2, ..., $m_g$ are identical in shape. Note that (3.3) applies to models in the heterogeneous case, as well as to those in the homogeneous case. The acceleration model (Samejima, 1995) is an example of this family of models. In this model $P_{x_g}(\theta)$ is given by

$$P_{x_g}(\theta) = \prod_{u \le x_g} [\Psi_u(\theta)]^{\xi_u}\,\left[1 - [\Psi_{(x_g+1)}(\theta)]^{\xi_{x_g+1}}\right], \tag{3.8}$$

where $\xi_{x_g}$ (> 0) is the step acceleration parameter, and $\Psi_{x_g}(\theta)$ in (3.8) may be specified by

$$\Psi_{x_g}(\theta) = \frac{1}{1 + \exp[-D\alpha_{x_g}(\theta - \beta_{x_g})]}, \tag{3.9}$$

where $D$ = 1.7, and $\alpha_{x_g}$ (> 0) and $\beta_{x_g}$ are the discrimination and location parameters, respectively. When (3.9) is adopted for $\Psi_{x_g}(\theta)$, the processing function becomes

$$M_{x_g}(\theta) = \left[\frac{1}{1 + \exp[-D\alpha_{x_g}(\theta - \beta_{x_g})]}\right]^{\xi_{x_g}}. \tag{3.10}$$

In the example of the acceleration model illustrated in Figure 1, (3.9) was used for $\Psi_{x_g}(\theta)$ and the parameter values are: $\alpha_{x_g}$ = [...], $\beta_{x_g}$ = [...], $\xi_{x_g}$ = [...], for $x_g$ = 1, 2, 3, 4 and 5, respectively.

The acceleration model has been proposed, basically, as a model for sequential cognitive processes, such as those in problem solving. In this specific application, it is assumed that there is more than one observable step in the entire cognitive process. Graded item scores, or partial credits, 1 through $m_g$, are assigned to the successful completions of these separate observable steps. It is also assumed that, within each step, there are $s_{x_g}$ ($\ge 1$) subprocesses $w_{x_g 1}, w_{x_g 2}, \ldots, w_{x_g s_{x_g}}$, which may or may not be observable. These subprocesses within a step contribute to successful completion of the step through their own subprocess acceleration parameters $\xi_{w_{x_g i}} \ge 0$ ($i$ = 1, 2, ..., $s_{x_g}$), through

$$\xi_{x_g} = \sum_{i=1}^{s_{x_g}} \xi_{w_{x_g i}}. \tag{3.11}$$

It is obvious from (3.11) that the processing function $M_{x_g}(\theta)$ can be expressed as the product of the $s_{x_g}$ subprocess processing functions within the step $x_g$. Let $M_{(w_{x_g} h)}(\theta)$ be the incomplete step processing function after the $h$-th subprocess within the step $x_g$ has been cleared, where $1 \le h \le s_{x_g}$. Thus

$$M_{(w_{x_g} h)}(\theta) = \left[\frac{1}{1 + \exp[-D\alpha_{x_g}(\theta - \beta_{x_g})]}\right]^{\sum_{i=1}^{h} \xi_{w_{x_g i}}}. \tag{3.12}$$

From (3.12), the first and second partial derivatives of $M_{(w_{x_g} h)}(\theta)$ with respect to $\theta$ can be written as

$$\frac{\partial}{\partial\theta} M_{(w_{x_g} h)}(\theta) = \sum_{i=1}^{h} \xi_{w_{x_g i}}\, D\alpha_{x_g}\, [\Psi_{x_g}(\theta)]^{\sum_{i=1}^{h} \xi_{w_{x_g i}}}\, [1 - \Psi_{x_g}(\theta)] \ge 0$$

and

$$\frac{\partial^2}{\partial\theta^2} M_{(w_{x_g} h)}(\theta) = \sum_{i=1}^{h} \xi_{w_{x_g i}}\, D^2\alpha_{x_g}^2\, [\Psi_{x_g}(\theta)]^{\sum_{i=1}^{h} \xi_{w_{x_g i}}}\, [1 - \Psi_{x_g}(\theta)] \left[\sum_{i=1}^{h} \xi_{w_{x_g i}} \{1 - \Psi_{x_g}(\theta)\} - \Psi_{x_g}(\theta)\right], \tag{3.13}$$

respectively, where $\Psi_{x_g}(\theta)$ is given by (3.9). Setting (3.13) equal to zero, $\theta_{hdmax}$, the value of $\theta$ at which $M_{(w_{x_g} h)}(\theta)$ is most discriminating, is obtained by

$$\theta_{hdmax} = \Psi_{x_g}^{-1}\left[\frac{\sum_{i=1}^{h} \xi_{w_{x_g i}}}{1 + \sum_{i=1}^{h} \xi_{w_{x_g i}}}\right]. \tag{3.14}$$

It is obvious from (3.9) and (3.14) that $\theta_{hdmax}$ increases as a greater number of subprocesses within the step have been cleared. Figure 5 illustrates the change of $\theta_{hdmax}$ for five hypothetical subprocesses within a step, for which $\alpha_{x_g}$ = 1.0 and $\beta_{x_g}$ = 0.0 in (3.9) and the $\xi_{w_{x_g i}}$'s are 0.25, 0.25, 0.25, 0.25 and 0.50 for $i$ = 1, 2, 3, 4, 5, respectively. The values of $\theta_{hdmax}$ for $h$ = 1, 2, 3, 4, 5 were obtained from (3.14) and turned out to be [...], [...], -0.169, [...] and 0.239, respectively, and the corresponding values of $M_{(w_{x_g} h)}(\theta)$ at $\theta = \theta_{hdmax}$ are 0.669, 0.577, 0.530, [...] and [...].

Thus if $\xi_{w_{x_g i}}$ assumes a large value, then the contribution of $w_{x_g i}$ to the successful completion of the step $x_g$ will be large, in the sense that it accelerates both the sum of the subprocess parameters toward $\xi_{x_g}$ and the value of $\theta_{hdmax}$; if it is small, then its contribution will also be small. Note that $\xi_{w_{x_g i}}$ can be zero, in which case the subprocess contributes nothing to accelerating the sum of the subprocess parameters or $\theta_{hdmax}$. To give an example, in proving the cosine law, one step is to use Pythagoras' Theorem. This step includes a subprocess of drawing a perpendicular line to a side from the opposite angle. It is considered that anyone who thinks of using Pythagoras' Theorem can draw such a line. This implies that $\xi_{w_{x_g i}}$ for this subprocess $i$ is zero, that is, no contribution of the subprocess $w_{x_g i}$ to the step acceleration parameter $\xi_{x_g}$ is provided.

Fig. 5 Change of $\theta_{hdmax}$ for hypothetical five subprocesses within a step, with the step discrimination and location parameters $\alpha_{x_g}$ = 1.0 and $\beta_{x_g}$ = 0.0 and the subprocess acceleration parameters $\xi_{w_{x_g i}}$ = 0.25, 0.25, 0.25, 0.25, 0.50 for $i$ = 1, 2, 3, 4, 5, respectively.

If two or more steps, or sequences of steps, are reversible in order, the steps are said to be parallel, as distinct from serial steps. The same logic applies to subprocesses within a step, that is, parallel subprocesses are those which are reversible in order and serial subprocesses are those which are irreversible in order within the step. It is assumed that for any number of parallel subprocesses the subprocess acceleration parameters are unchanged by reversals of their sequential orders, so that the step acceleration parameter given by (3.11) is unaffected.

In Bock's nominal model (Bock, 1972) the operating characteristic is given by

$$P_{k_g}(\theta) = \frac{\exp[\alpha_{k_g}\theta + \beta_{k_g}]}{\sum_{u \in K_g} \exp[\alpha_u\theta + \beta_u]}, \tag{3.15}$$

where $k_g$ denotes a nominal response to item $g$, $K_g$ is a specific subset of responses selected from the total answer space, as exemplified by the set of the correct answer and several distractors in a multiple-choice test item, and $\alpha_{k_g}$ (> 0) and $\beta_{k_g}$ are item response parameters. It is obvious from (3.15) that the operating characteristic $P_{k_g}(\theta)$ in Bock's nominal model depends on the specific subset of responses to item $g$, for the denominator of (3.15) is the sum total of the numerators of the operating characteristics of all $k_g \in K_g$, whereas the numerator stays unchanged regardless of the choice of the subset. Thissen and Steinberg (1986) called this family divided-by-total models, the naming stemming from (3.15).

It is obvious from (3.15) that invariance exists in the conditional ratio of the operating characteristics, given $\theta$, of any pair of responses $k_g$ and $h_g$, regardless of the choice of the subset $K_g$ from the answer space to which $k_g$ and $h_g$ belong. This conditional ratio is given by

$$\frac{P_{k_g}(\theta)}{P_{h_g}(\theta)} = \frac{\exp[\alpha_{k_g}\theta + \beta_{k_g}]}{\exp[\alpha_{h_g}\theta + \beta_{h_g}]} = \exp[(\alpha_{k_g} - \alpha_{h_g})\theta]\,\exp[\beta_{k_g} - \beta_{h_g}], \tag{3.16}$$

which depends solely on the parameters of the two responses $k_g$ and $h_g$, a principle similar to the one behind individual choice behavior (Luce, 1959).
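The invariance expressed by (3.16) can be illustrated with a small numerical sketch: the operating characteristics themselves change when the response set $K_g$ is reduced, but the ratio for a fixed pair of responses does not. The response labels, parameter values and function name below are hypothetical and chosen only for illustration.

```python
import math

def nominal_probs(theta, params):
    """Operating characteristics (3.15) for the responses listed in `params`.

    `params` maps a response label to its (alpha, beta) pair; its keys play
    the role of the response set K_g.
    """
    numerators = {k: math.exp(a * theta + b) for k, (a, b) in params.items()}
    total = sum(numerators.values())
    return {k: v / total for k, v in numerators.items()}

# Hypothetical parameters for four nominal responses.
full_set = {"k": (1.2, 0.3), "h": (0.5, -0.1), "r": (2.0, -1.0), "s": (0.1, 0.8)}
# A smaller response set containing only the pair of interest.
subset = {key: full_set[key] for key in ("k", "h")}

theta = 0.7
p_full = nominal_probs(theta, full_set)
p_sub = nominal_probs(theta, subset)

ratio_full = p_full["k"] / p_full["h"]
ratio_sub = p_sub["k"] / p_sub["h"]
# Right-hand side of (3.16) for the pair (k, h).
ratio_closed_form = math.exp((1.2 - 0.5) * theta) * math.exp(0.3 - (-0.1))
```

Here `p_sub["k"]` is larger than `p_full["k"]` because the denominator has fewer terms, yet `ratio_full`, `ratio_sub` and `ratio_closed_form` agree up to rounding.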
It has been shown (Samejima, 1972) that the model can also be considered as a graded response model in the heterogeneous case if $k_g$ is replaced by $x_g$ in (3.15) and the parameter $\alpha_{x_g}$ satisfies

$$\alpha_0 \le \alpha_1 \le \alpha_2 \le \cdots \le \alpha_{m_g}, \tag{3.17}$$

where a strict inequality should hold at least at one place. If this condition is satisfied, then the processing function is given by

$$M_{x_g}(\theta) = \frac{\sum_{u = x_g}^{m_g} \exp[\alpha_u\theta + \beta_u]}{\sum_{u = x_g - 1}^{m_g} \exp[\alpha_u\theta + \beta_u]}. \tag{3.18}$$

Samejima (1979) found it difficult to extend Bock's nominal model to an ordered polychotomous model, however. A big difference between Samejima's general graded response model represented by (3.1) and Bock's nominal model represented by (3.15), or between difference models and divided-by-total models, is that the borderlines or thresholds of adjacent item response categories are parameterized in the former, whereas the item responses themselves are parameterized in the latter. The implicit invariance assumption represented by (3.16) is acceptable only when $k_g$ and $h_g$ are solid discrete entities that can neither be more finely classified nor combined with another response or responses, although finer classification and combination are exactly what many typical ordered polychotomous response situations require.

Later, however, Masters (1982) proposed his partial credit model and Muraki (1992) proposed his generalized partial credit model, both of which are special cases of Bock's nominal model that are converted to ordered polychotomous models satisfying (3.17) with strict inequalities at all places. In Masters' partial credit model, $\alpha_{x_g}$ is given by

$$\alpha_{x_g} = x_g + 1 \quad \text{for } x_g = 0, 1, \ldots, m_g. \tag{3.19}$$

In the example illustrated in Figure 2, the values of $\beta_{x_g}$ are 1.0, 2.0, 3.0, 3.5, 1.8, 1.0 for $x_g$ = 0, 1, 2, 3, 4, 5, respectively. In Muraki's generalized partial credit model, the operating characteristic has an additional discrimination parameter $a_g$ (> 0), and $\alpha_{x_g} = (x_g + 1)a_g$ for $x_g$ = 0, 1, ..., $m_g$.

4. Typical ordered polychotomous responses

Typical ordered polychotomous responses are identified in: (1) categorical judgment, which was exemplified earlier, (2) rating scales, exemplified by letter grading (e.g., A, B, C, D and F) of academic performance, (3) partial credit given in accordance with an individual's level of closeness to a specific goal, which is exemplified by a cognitive process like problem solving, etc. Each situation has somewhat different characteristics, and selection of a model, or models, in each case should be made with its specific psychological background in mind.

It can be seen, however, that there are certain characteristics which are common among the above typical ordered polychotomous response situations.
First of all, those ordered polychotomous categories are more or less arbitrary. To give a couple of examples, for a required college course the letter grades, A, B, C, D and F, may be changed to Pass and Fail, setting the borderline between, say, B and C; also, for a statement in social attitude measurement, a dichotomous response format, a 5-point scale format, a 7-point scale format, etc., and even a continuous response format are used. Another example can be seen in cognitive assessment. With the advancement of computer technologies, it is quite possible to obtain more abundant information from the individual's performance in computerized experiments with constructed responses as research proceeds, which will result in an increase in the number of ordered categories.

The above examples indicate that there are two directions, that is,

1. finer recategorizations of responses, and
2. combinations of two or more adjacent categories.

It is noted that arbitrariness of ordered polychotomous response categories includes two different situations with respect to the thresholds between response categories.

(a) Fixed threshold situation. The example of redichotomizing the letter grades, A, B, C, D and F, into Pass and Fail belongs to this situation, and so does the case of reducing 5-point response categories to 2 categories in data analysis. The example of more precise observation of a cognitive process cited earlier also belongs to this situation.

(b) Flexible threshold situation. An example can be seen when B+ is added to the set of letter grades, A, B, C, D and F, and the protocols are regraded using the resulting 6 categories. It is likely, for example, that the threshold between B and C will be shifted in the negative direction. Similar shifts will occur in attitude measurement if the set of 5 response categories, strongly disagree, disagree, neutral, agree and strongly agree, is changed to a set of 4 response categories by deleting neutral, and data are collected again.

Additivity intrinsic in a model is defined as the characteristic of the model which provides operating characteristics that belong to the same model, that is, the mathematical form of the resulting operating characteristic(s) is the same as that of the original operating characteristic(s) for both more finely categorized responses and combined category responses, in either of the two threshold situations. If additivity does not hold for a model, then the operating characteristics provided by the model will be heavily affected by incidental factors such as the number of response categories adopted in the protocols.
Thus additivity is an important feature of models that can be adopted for typical ordered polychotomous responses, and will be discussed further in the subsequent section along with several other features.

5. Criteria for model evaluation

The first criterion in evaluating a model should be whether the principle behind the model and the set of accompanying assumptions agree with the psychological processes presumed to underlie the data. Unless this criterion is satisfied, mathematical modeling cannot be realized, and the research could end up with mere curve fitting, producing meaningless item parameters.

From the observations made in the preceding section, in typical ordered polychotomous situations, additivity intrinsic in a model is required, and this will legitimately be adopted as the second criterion for evaluating ordered polychotomous response models.

As the number of ordered polychotomous categories increases, the situation approaches a continuous response case as the limiting situation. It is desirable that such a natural generalization to a continuous response model holds, and this will be the third criterion, which is a natural extension of additivity.

In estimating the individual's ability level from his/her response pattern, it is desirable that its likelihood function has a unique modal point; otherwise, multiple maximum likelihood estimates of his/her ability will result. Samejima (1969, 1972) proposed a sufficient condition for a unique maximum such that

$$\frac{\partial}{\partial\theta} A_{x_g}(\theta) < 0 \tag{5.1}$$

is satisfied for all $\theta$ except, possibly, for an enumerable number of points, and

$$\lim_{\theta \to -\infty} A_{x_g}(\theta) \ge 0 \tag{5.2}$$

and

$$\lim_{\theta \to \infty} A_{x_g}(\theta) \le 0, \tag{5.3}$$

where $A_{x_g}(\theta)$ is called the basic function and given by

$$A_{x_g}(\theta) = \frac{\partial}{\partial\theta} \log P_{x_g}(\theta).$$

For brevity, the above set of joint conditions is called the unique maximum condition. It can be seen from (5.1) through (5.3) that, if a model satisfies the unique maximum condition, then the operating characteristic of the item score has a single local or terminal maximum. The condition also assures that the likelihood function of any response pattern consisting of such responses has one and only one local or terminal maximum. Thus satisfaction of the unique maximum condition will be the fourth legitimate criterion for evaluating ordered polychotomous models. The inequality given by (5.1) can be replaced by

$$I_{x_g}(\theta) = -\frac{\partial^2}{\partial\theta^2} \log P_{x_g}(\theta) > 0,$$

where $I_{x_g}(\theta)$ is called the item response information function (Samejima, 1972, 1973b).
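As an illustration of how conditions (5.1) through (5.3) might be inspected for a particular model, the sketch below evaluates the basic function of the logistic model (3.7) on a grid and checks that it is strictly decreasing for every item score. The threshold values, the grid and the function names are assumptions made for this example, and a grid check of this kind is only a numerical sanity check, not a proof.

```python
import math

D = 1.7

def cum(theta, a, b):
    """Logistic cumulative operating characteristic (3.7); b may be -inf or +inf."""
    if b == -math.inf:
        return 1.0
    if b == math.inf:
        return 0.0
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def basic_function(theta, a, b_low, b_high):
    """A_x(theta) = d/dtheta log P_x(theta), with P_x = P*_x - P*_{x+1} as in (3.3)."""
    p_low, p_high = cum(theta, a, b_low), cum(theta, a, b_high)
    # The derivative of a logistic curve P* is D * a * P* * (1 - P*).
    slope = D * a * (p_low * (1.0 - p_low) - p_high * (1.0 - p_high))
    return slope / (p_low - p_high)

a_g = 1.0
boundaries = [-math.inf, -1.0, 0.0, 1.5, math.inf]   # hypothetical b_0, ..., b_{m_g + 1}
grid = [i / 10.0 for i in range(-40, 41)]

# Condition (5.1) on the grid: A_x must decrease from each grid point to the next.
strictly_decreasing = True
for x in range(len(boundaries) - 1):
    values = [basic_function(t, a_g, boundaries[x], boundaries[x + 1]) for t in grid]
    if any(later >= earlier for earlier, later in zip(values, values[1:])):
        strictly_decreasing = False
```

For an intermediate item score the basic function is positive at low ability levels and negative at high ones, consistent with (5.2) and (5.3) and with a single interior maximum of the operating characteristic.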
In addition to this criterion, it is desirable, from the meaning of the item score, that within a single item the model provide modal points of the operating characteristics that are ordered in the ascending order of the item score. Note that these operating characteristics equal the likelihood functions when a test consists of a single item, and this orderliness leads to the orderliness of the maximum likelihood estimate of ability. Thus it will be the legitimate fifth criterion for model evaluation.

6. Model selection

In categorical judgment, since each item is expected to have a certain solid relationship with the latent trait $\theta$, it will be appropriate to assume that there is some invariance in the relationship between item $g$ and the latent trait $\theta$ regardless of the categorical thresholds. This can be realized by a common discrimination parameter for all item responses, which provides the same discrimination power when the item is dichotomously rescored by selecting one of the thresholds of two adjacent response categories. If the discrimination power of the item changes substantially with the selection of different thresholds for redichotomization, even though the wordings for separate categories are appropriately made, it will be doubtful that the item has a solid relationship with the underlying latent trait. The same logic applies to many rating scales as well. A model in the homogeneous case represented by (3.4), therefore, will be a decent choice.

The different characteristics of the processing functions in the normal ogive model and the logistic model, which were discussed earlier, can be interpreted as a difference in relative emphasis on the two joint conditions for the processing function. In certain situations it may be reasonable to assume that the processing function is close to zero at very low levels of $\theta$ regardless of $x_g$ and of the fact that the steps up to $(x_g - 1)$ have been cleared; the normal ogive model will be more appropriate than the logistic model in these cases.
In certain other situations, however, it may be more reasonable to assume that the fact that the individual has cleared the steps up to $(x_g - 1)$ entitles the processing function to be positive no matter how low the individual's ability level may be, and that this lower asymptote is higher if the difficulty levels of the current step and of the preceding one are closer; the logistic model will be more appropriate in such cases.

Distinct from the situations exemplified above, in many other situations, including cognitive processes such as problem solving, the homogeneity restriction on the shapes of the $P^*_{x_g}(\theta)$'s may not agree with the psychological reality; it may be more logical to assume heterogeneous relationships to $\theta$ of the separate steps which lead to the problem solution. Thus a model in the heterogeneous case will be a more legitimate choice.

Of the five criteria for model evaluation, the fit of the principles behind each model to the psychological background of the data is specific, whereas the other four criteria, that is, additivity intrinsic in a model, generalizability to a continuous response model, satisfaction of the unique maximum condition and orderliness of the modal points of the operating characteristics, are more general in the sense that they are appropriate in most typical ordered polychotomous response situations. These four criteria will be discussed, therefore, for models in the homogeneous case, the acceleration model, and extended Bock models.
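The contrast between the two homogeneous models drawn above can be made concrete with a short numerical sketch. It evaluates the processing function of step 2, obtained from (3.2) as the ratio of adjacent cumulative operating characteristics, at a very low ability level, using the parameter values quoted for Figures 3 and 4. The function names and the choice of $\theta$ = -20 as a "very low" level are assumptions for illustration.

```python
import math

D = 1.7

def logistic_cum(theta, a, b):
    """Cumulative operating characteristic of the logistic model (3.7)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def normal_cum(theta, a, b):
    """Cumulative operating characteristic of the normal ogive model (3.6)."""
    # erfc keeps precision in the far lower tail, where 1 + erf would round to zero.
    return 0.5 * math.erfc(-a * (theta - b) / math.sqrt(2.0))

def processing(cum, theta, a, b_prev, b_curr):
    """M_x(theta) = P*_x(theta) / P*_{x-1}(theta), which follows from (3.2)."""
    return cum(theta, a, b_curr) / cum(theta, a, b_prev)

a_g, b1, b2 = 1.0, -3.5, -3.0      # values quoted for Figures 3 and 4
theta_low = -20.0                   # a very low ability level

m2_logistic = processing(logistic_cum, theta_low, a_g, b1, b2)
m2_normal = processing(normal_cum, theta_low, a_g, b1, b2)

# Lower asymptote of the logistic processing function given in the text.
asymptote = math.exp(-D * a_g * (b2 - b1))
```

With these values the logistic processing function stays near `asymptote`, roughly 0.43, while the normal ogive processing function is already close to zero.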

6.1 Additivity

A strength of any model in the homogeneous case is that additivity always holds. Assuming that the item in question has a solid and straightforward relationship with the latent trait $\theta$, if $q\ (\ge 1)$ new ordered polychotomous response categories are added between $x_g$ and $(x_g + 1)$, then the identical shapes of the $P^*_{x_g}(\theta)$'s will be preserved, with or without shifts of the locations of the curves of the preexisting item scores along the abscissa, and the $P^*_{x_g}(\theta)$'s for the $q$ new graded response categories will also have identical shapes, with their location parameters found between $b_{x_g}$ and $b_{x_g+1}$. In the flexible threshold situation, for example, if B+ is added to the preexisting letter grades A, B, C, D and F, then the location parameters for these five letter grades will be more or less affected; that is, those for B and A will be substantially lowered and elevated, respectively, and the location parameter will be less and less affected as the letter grade departs from B. The location parameter for the new item score B+ will be found between those of B and A.

In the fixed threshold situation, if $r$ adjacent graded categories from $x_g$ to $x_g + r - 1$, for $r \le m_g - x_g + 1$, are combined, then from (3.3) and (3.4) the operating characteristic $P_{x_g}(\theta)$ will be changed to

$$P_{x_g}(\theta) = P^*_{x_g}(\theta) - P^*_{x_g+r}(\theta) = \int_{-\infty}^{a_g(\theta - b_{x_g})} \phi(u)\,du - \int_{-\infty}^{a_g(\theta - b_{x_g+r})} \phi(u)\,du ,$$

which obviously belongs to the same, original model, and any other operating characteristic is preserved, with the item score shifted by $(-r+1)$ from the original value when it is greater than $x_g$. Thus if we redichotomize the letter grades A, B, C, D and F into Pass and Fail, for example, setting the borderline between, say, B and C, the resulting operating characteristics of Pass and Fail still belong to the same model. This does not hold for all models in the heterogeneous case, however.
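The Pass/Fail example can be sketched in a few lines. This is a minimal illustration, assuming the logistic model in the homogeneous case with scaling constant D = 1.7; the discrimination and threshold values are hypothetical, and the function names are introduced here only for the example.

```python
import math

D = 1.7  # scaling constant for the logistic model (assumed value)


def cumulative(theta, a, b):
    """P*_x(theta): probability of reaching category x or higher."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def category_probs(theta, a, thresholds):
    """Operating characteristics P_x = P*_x - P*_{x+1}, with P*_0 = 1 and P*_{m+1} = 0."""
    stars = [1.0] + [cumulative(theta, a, b) for b in thresholds] + [0.0]
    return [stars[x] - stars[x + 1] for x in range(len(thresholds) + 1)]


grades = ["F", "D", "C", "B", "A"]      # item scores 0..4
a_g = 1.2                               # hypothetical common discrimination
thresholds = [-1.5, -0.5, 0.4, 1.3]     # hypothetical b for D, C, B, A


def pass_prob(theta):
    """Pass = B or A, with the borderline set between C and B."""
    probs = dict(zip(grades, category_probs(theta, a_g, thresholds)))
    return probs["B"] + probs["A"]


for theta in (-2.0, 0.0, 0.4, 3.0):
    combined = pass_prob(theta)
    dichotomous = cumulative(theta, a_g, thresholds[2])  # same a_g, location b_B
    print(f"theta={theta:+.1f}  combined={combined:.6f}  dichotomous={dichotomous:.6f}")
```

The combined Pass curve coincides with a dichotomous logistic curve that keeps the same discrimination parameter $a_g$ and takes the threshold of B as its location, which is the sense in which the combined response stays within the original model.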
In general, because of the more complicated mathematical forms stemming from heterogeneity, it becomes more difficult to develop a model which satisfies additivity.

In the acceleration model, since a subprocess $w_{x_g i}$ affects the operating characteristic $P_{x_g}(\theta)$ solely through $\xi_{w_{x_g i}}$ in the way shown in (3.11), it is obvious that the addition of $q\ (\ge 1)$ ordered polychotomous categories between $x_g$ and $(x_g + 1)$ will provide $q$ operating characteristics that belong to the same, original model; that is, the first feature of additivity is fulfilled. When $r\ (\ge 2)$ adjacent item scores are combined into one step, the second feature of additivity holds only if the $\Psi_{x_g}(\theta)$'s for these $r$ steps have identical $\alpha_{x_g}$'s and $\beta_{x_g}$'s; otherwise it does not hold. Robustness of the acceleration model has been observed (Samejima, 1995), however, in the latter situation. This means that a set of $\alpha_{x_g}$ and $\beta_{x_g}$ in (3.9) is likely to be discovered for the combined step which provides a $P_{x_g}(\theta)$ almost identical to the sum of the original $r$ operating characteristics in practical situations. An example has been shown (Samejima, 1995), in which the parameters $\alpha_{x_g}$, $\beta_{x_g}$ and $\xi_{x_g}$ were estimated as the solutions of

$$\frac{\log[(p_2)^{-1/\xi_{x_g}} - 1] - \log[(p_3)^{-1/\xi_{x_g}} - 1]}{\log[(p_1)^{-1/\xi_{x_g}} - 1] - \log[(p_2)^{-1/\xi_{x_g}} - 1]} = \frac{\theta_3 - \theta_2}{\theta_2 - \theta_1}, \qquad (6.1.1)$$

$$M_{x_g}(\beta_{x_g}) = \left[\frac{1}{2}\right]^{\xi_{x_g}} \qquad (6.1.2)$$

and

$$\alpha_{x_g} = \frac{2^{\xi_{x_g}+1}}{D\,\xi_{x_g}}\,\frac{\partial}{\partial\theta} M_{x_g}(\theta) \quad \text{at } \theta = \beta_{x_g}, \qquad (6.1.3)$$

where $M_{x_g}(\theta)$ is the product of the processing functions of all adjacent item score categories to be combined, $p_1$, $p_2$ and $p_3$ are three arbitrarily selected distinct probabilities arranged in ascending order, and $\theta_1$, $\theta_2$ and $\theta_3$ are the values of $\theta$ at which $M_{x_g}(\theta)$ equals $p_1$, $p_2$ and $p_3$, respectively. Thus the second feature of additivity practically holds in the acceleration model. The solutions of (6.1.1), (6.1.2) and (6.1.3) were also adopted in estimating the parameters in the acceleration model and used in Figure 1, with the processing functions in Masters' partial credit model, obtained by (3.18) and (3.19), as the $M_{x_g}(\theta)$'s.

In divided-by-total models represented by extended Bock models, the operating characteristic resulting from combining $r$ adjacent graded categories becomes

$$P_{x_g}(\theta) = \frac{\sum_{u \in x_g} \exp[\alpha_u \theta + \beta_u]}{\sum_{v} \exp[\alpha_v \theta + \beta_v]},$$

where the numerator is summed over the combined categories and the denominator over all response categories of item $g$. This does not belong to the original model. It is also impossible to divide a response into more finely categorized responses and preserve (3.15) for the resulting finer responses. Thus additivity will not hold in either direction. Lack of additivity will cause serious problems if Masters' partial credit model or Muraki's generalized partial credit model is used for typical ordered response situations. Usefulness of these models is limited, therefore, to situations in which all response categories have certain absolute meanings, with no room for arbitrariness.

6.2 Natural generalization to a continuous response model

For any model which belongs to the homogeneous case, from (3.3) and (3.4) the operating density characteristic, $H_{z_g}(\theta)$, for the continuous response $z_g$ will be obtained by

$$H_{z_g}(\theta) = \lim_{\Delta z_g \to 0} \frac{P^*_{z_g}(\theta) - P^*_{z_g + \Delta z_g}(\theta)}{\Delta z_g} = a_g\,\phi\bigl(a_g(\theta - b_{z_g})\bigr)\left[\frac{d b_{z_g}}{d z_g}\right],$$

where $b_{z_g}$ is the difficulty parameter for the continuous response $z_g$. Unlike $b_{x_g}$ in ordered polychotomous models, $b_{z_g}$ is a continuous, strictly increasing function of $z_g$. Examples of such models can be seen in the normal ogive and logistic models (Samejima, 1973a).

This characteristic is not shared by all models in the heterogeneous case. The acceleration model can be generalized to a continuous model, however. This can be seen by treating each subprocess in (3.11) as a step, and considering continuous subprocesses within each original step as the limiting situation in which $s_{x_g}$ tends to positive infinity and $\xi_{w_{x_g i}}$ approaches zero. In contrast, such a limiting situation cannot be considered for any extended Bock model.

6.3 Satisfaction of the unique maximum condition

In the homogeneous case, it has been shown (Samejima, 1972, 1973b) that many models, including the normal ogive and logistic models, satisfy the unique maximum condition, although it is not satisfied by the three-parameter normal ogive or logistic model. Note that, because additivity and generalizability to a continuous response model hold for any model in the homogeneous case, as was observed and discussed earlier, satisfaction of the unique maximum condition is carried over to any more finely classified response categories and any combined response categories, and also to continuous responses in its generalized continuous response model (see Samejima, 1972, 1973a).

In the heterogeneous case, it has been shown (Samejima, 1995) that the unique maximum condition is satisfied for the acceleration model when (3.10) is used for $M_{x_g}(\theta)$. Note that this satisfaction is also carried over to more finely categorized steps and to continuous responses in the generalized continuous response model. It has been shown (Samejima, 1972) that this condition is satisfied for Bock's nominal model.
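The contrast in Section 6.3 between the logistic graded model and the three-parameter logistic model can be illustrated numerically. The sketch below assumes the unique maximum condition in the form used in Samejima's earlier work, namely that the basic function $\partial \log P_{x_g}(\theta) / \partial\theta$ is strictly decreasing in $\theta$; the definition itself lies outside this section, so treat that reading as an assumption. Parameter values, the grid, and all function names are hypothetical.

```python
import math

D = 1.7  # scaling constant for the logistic model (assumed value)


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def graded_interior_prob(theta, a=1.0, b_lo=-0.5, b_hi=0.5):
    """P_x = P*_x - P*_{x+1} for an interior category of the logistic graded model."""
    return logistic(D * a * (theta - b_lo)) - logistic(D * a * (theta - b_hi))


def three_pl_prob(theta, a=1.0, b=0.0, c=0.2):
    """Three-parameter logistic probability of a correct answer."""
    return c + (1.0 - c) * logistic(D * a * (theta - b))


def basic_function(prob, theta, h=1e-5):
    """Central-difference approximation of d/dtheta log prob(theta)."""
    return (math.log(prob(theta + h)) - math.log(prob(theta - h))) / (2.0 * h)


def is_strictly_decreasing(values):
    return all(later < earlier for earlier, later in zip(values, values[1:]))


grid = [-4.0 + 0.1 * i for i in range(81)]  # theta from -4 to 4
graded_A = [basic_function(graded_interior_prob, t) for t in grid]
three_pl_A = [basic_function(three_pl_prob, t) for t in grid]

print("graded logistic basic function strictly decreasing:",
      is_strictly_decreasing(graded_A))
print("3PL basic function strictly decreasing:",
      is_strictly_decreasing(three_pl_A))
```

For the graded logistic category the basic function has the closed form $D a_g [1 - P^*_{x_g}(\theta) - P^*_{x_g+1}(\theta)]$, which decreases throughout; for the three-parameter model the basic function of the correct response first rises from near zero and then falls, so the strict monotonicity fails on this grid.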
This result for Bock's nominal model implies that the unique maximum condition is also satisfied in all extended Bock models, including Masters' partial credit model and Muraki's generalized partial credit model.

6.4 Orderliness of the modal points of the operating characteristics

It has been shown (Samejima, 1972) that, in the homogeneous case, those models which satisfy the unique maximum condition, such as the normal ogive model and the logistic model, provide a strict orderliness among the modal points of the $P_{x_g}(\theta)$'s in accordance with the item score $x_g$.

In the acceleration model, it has been shown (Samejima, 1995) that ordered modal points of the operating characteristics exist in usual situations. There are exceptions, however, in which the unidimensionality of the latent space is questionable, and an example is shown in the same paper. In usual situations, the modal points are found in the ascending order of the item score $x_g$, as illustrated in Figure 1.

In Bock's nominal model, it has been shown (Samejima, 1972) that ordered modal points of the operating characteristics exist if strict inequalities hold in all relationships of (3.17). This implies that in extended Bock models, including both Masters' partial credit model and Muraki's generalized partial credit model, the modal points are always strictly ordered.

Comparison of these models with respect to the criteria that are legitimate in typical ordered polychotomous situations can be summarized in tabular form.

Table 1. Summary of the principles behind the normal ogive and logistic models, the acceleration model and extended Bock models.

7. Discussion and conclusions

Several criteria for model evaluation have been proposed, and the strengths and weaknesses of mathematical models in the homogeneous case, and of the acceleration model and extended Bock models in the heterogeneous case, have been observed and discussed. It has been pointed out that the principles behind models in the homogeneous case fit the psychological realities of certain typical graded response situations such as categorical judgment and rating scales.

As was pointed out earlier, additivity and generalizability to a continuous model tend to be more difficult to satisfy for models in the heterogeneous case. Those models are in demand, however, because of their less restrictive nature in the $P^*_{x_g}(\theta)$'s.

The principles behind extended Bock models do not fit typical ordered polychotomous response situations, however. Bock proposed (3.15) as a nominal model, and wisely applied his model to multiple-choice test items, having the results disclose implicit orders among the distractors of each item. Discovery of implicit orders behind nominal responses is a big strength of Bock's model. Information coming from the distractors can be used for improving the multiple-choice test item, for example.
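The orderliness of the modal points claimed for Bock's nominal model in Section 6.4 can be illustrated with a small grid search. This is a sketch under stated assumptions: it takes (3.15) to be the usual divide-by-total form $\exp[\alpha_k\theta + \beta_k] / \sum_u \exp[\alpha_u\theta + \beta_u]$, and it assumes that (3.17), which is not reproduced in this section, orders the slope parameters $\alpha_k$. The parameter values and function names are hypothetical.

```python
import math


def nominal_probs(theta, alphas, betas):
    """Divide-by-total probabilities for one item at ability theta."""
    logits = [al * theta + be for al, be in zip(alphas, betas)]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def modal_point(k, alphas, betas, grid):
    """Grid value of theta at which category k is most probable."""
    return max(grid, key=lambda t: nominal_probs(t, alphas, betas)[k])


alphas = [-1.5, -0.5, 0.5, 1.5]   # strictly increasing slopes (assumed ordering)
betas = [0.0, 1.0, 1.0, 0.0]      # hypothetical intercepts
grid = [-6.0 + 0.01 * i for i in range(1201)]  # theta from -6 to 6

modes = [modal_point(k, alphas, betas, grid) for k in range(len(alphas))]
for k, mode in enumerate(modes):
    print(f"category {k}: modal point on grid at theta = {mode:+.2f}")
```

With strictly increasing slopes, the lowest category is monotone decreasing and the highest is monotone increasing, so their grid maxima sit at the two ends of the range, while the interior categories peak at interior points that follow the order of the slopes.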
Satisfaction of the unique maximum condition and orderliness of the modal points of the operating characteristics intrinsic in the model are also big strengths, for information coming from each and every nominal response can effectively be used in ability estimation. It may be wise, therefore, to let Bock's model stay as a nominal response model, as Bock himself intended when he proposed it, without expanding it to ordered polychotomous models.

It is obvious that the last two criteria in model evaluation are invariant for strictly monotone transformations of $\theta$. That is to say, if a model clears these criteria with $\theta$ as the ability scale, the same will be true with the transformed ability scale $\tau(\theta)$, as long as the transformation is strictly monotone. Homogeneity in a model is not invariant across strictly monotone transformations, however. Thus models in the homogeneous case must be adopted with a carefully selected, meaningful ability scale.

REFERENCES

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. Contributed chapters in Lord, F.M. and Novick, M.R., Statistical theories of mental test scores, Chapters 17-20. Reading, MA: Addison Wesley.
Bock, R.D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37.
Luce, R.D. (1959). Individual choice behavior. New York: Wiley.
Masters, G.N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47.
Muraki, E. (1992). A generalized partial credit model: application of an EM algorithm. Applied Psychological Measurement, 16.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, No. 17.
Samejima, F. (1972). A general model for free-response data. Psychometrika Monograph, No. 18.
Samejima, F. (1973a). Homogeneous case of the continuous response model. Psychometrika, 38.
Samejima, F. (1973b). A comment on Birnbaum's three-parameter logistic model in the latent trait theory. Psychometrika, 38.
Samejima, F. (1979). A new family of models for the multiple-choice item. University of Tennessee, Knoxville, TN: Office of Naval Research Report.
Samejima, F. (1993a). An approximation for the bias function of the maximum likelihood estimate of a latent variable for the general case where the item responses are discrete. Psychometrika, 58.
Samejima, F. (1993b). The bias function of the maximum likelihood estimate of ability for the dichotomous response level. Psychometrika, 58.
Samejima, F. (1995). Acceleration model in the heterogeneous case of the general graded response model. Psychometrika, 60 (to be published in the December issue).
Samejima, F. (in preparation). Virtues of asymmetric item characteristic curves.
Thissen, D. & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51.

(Received October, 1995)


More information

Inequalities Relating Addition and Replacement Type Finite Sample Breakdown Points

Inequalities Relating Addition and Replacement Type Finite Sample Breakdown Points Inequalities Relating Addition and Replacement Type Finite Sample Breadown Points Robert Serfling Department of Mathematical Sciences University of Texas at Dallas Richardson, Texas 75083-0688, USA Email:

More information

CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH. Awanis Ku Ishak, PhD SBM

CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH. Awanis Ku Ishak, PhD SBM CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH Awanis Ku Ishak, PhD SBM Sampling The process of selecting a number of individuals for a study in such a way that the individuals represent the larger

More information

A Simulation Study to Compare CAT Strategies for Cognitive Diagnosis

A Simulation Study to Compare CAT Strategies for Cognitive Diagnosis A Simulation Study to Compare CAT Strategies for Cognitive Diagnosis Xueli Xu Department of Statistics,University of Illinois Hua-Hua Chang Department of Educational Psychology,University of Texas Jeff

More information

Introducing Proof 1. hsn.uk.net. Contents

Introducing Proof 1. hsn.uk.net. Contents Contents 1 1 Introduction 1 What is proof? 1 Statements, Definitions and Euler Diagrams 1 Statements 1 Definitions Our first proof Euler diagrams 4 3 Logical Connectives 5 Negation 6 Conjunction 7 Disjunction

More information

HYPOTHESIS TESTING. Hypothesis Testing

HYPOTHESIS TESTING. Hypothesis Testing MBA 605 Business Analytics Don Conant, PhD. HYPOTHESIS TESTING Hypothesis testing involves making inferences about the nature of the population on the basis of observations of a sample drawn from the population.

More information

The Difficulty of Test Items That Measure More Than One Ability

The Difficulty of Test Items That Measure More Than One Ability The Difficulty of Test Items That Measure More Than One Ability Mark D. Reckase The American College Testing Program Many test items require more than one ability to obtain a correct response. This article

More information

GRAPHIC REALIZATIONS OF SEQUENCES. Under the direction of Dr. John S. Caughman

GRAPHIC REALIZATIONS OF SEQUENCES. Under the direction of Dr. John S. Caughman GRAPHIC REALIZATIONS OF SEQUENCES JOSEPH RICHARDS Under the direction of Dr. John S. Caughman A Math 501 Project Submitted in partial fulfillment of the requirements for the degree of Master of Science

More information

Contents. 3 Evaluating Manifest Monotonicity Using Bayes Factors Introduction... 44

Contents. 3 Evaluating Manifest Monotonicity Using Bayes Factors Introduction... 44 Contents 1 Introduction 4 1.1 Measuring Latent Attributes................. 4 1.2 Assumptions in Item Response Theory............ 6 1.2.1 Local Independence.................. 6 1.2.2 Unidimensionality...................

More information

A NEW SET THEORY FOR ANALYSIS

A NEW SET THEORY FOR ANALYSIS Article A NEW SET THEORY FOR ANALYSIS Juan Pablo Ramírez 0000-0002-4912-2952 Abstract: We present the real number system as a generalization of the natural numbers. First, we prove the co-finite topology,

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

Lesson 7: Item response theory models (part 2)

Lesson 7: Item response theory models (part 2) Lesson 7: Item response theory models (part 2) Patrícia Martinková Department of Statistical Modelling Institute of Computer Science, Czech Academy of Sciences Institute for Research and Development of

More information

The Growth of Functions. A Practical Introduction with as Little Theory as possible

The Growth of Functions. A Practical Introduction with as Little Theory as possible The Growth of Functions A Practical Introduction with as Little Theory as possible Complexity of Algorithms (1) Before we talk about the growth of functions and the concept of order, let s discuss why

More information

Dimensionality Assessment: Additional Methods

Dimensionality Assessment: Additional Methods Dimensionality Assessment: Additional Methods In Chapter 3 we use a nonlinear factor analytic model for assessing dimensionality. In this appendix two additional approaches are presented. The first strategy

More information

Paul Barrett

Paul Barrett Paul Barrett email: p.barrett@liv.ac.uk http://www.liv.ac.uk/~pbarrett/paulhome.htm Affiliations: The The State Hospital, Carstairs Dept. of of Clinical Psychology, Univ. Of Of Liverpool 20th 20th November,

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 24 in Relation to Measurement Error for Mixed Format Tests Jae-Chun Ban Won-Chan Lee February 2007 The authors are

More information

Time: 1 hour 30 minutes

Time: 1 hour 30 minutes Paper Reference(s) 6663/0 Edexcel GCE Core Mathematics C Gold Level G5 Time: hour 30 minutes Materials required for examination Mathematical Formulae (Green) Items included with question papers Nil Candidates

More information

Graph Theorizing Peg Solitaire. D. Paul Hoilman East Tennessee State University

Graph Theorizing Peg Solitaire. D. Paul Hoilman East Tennessee State University Graph Theorizing Peg Solitaire D. Paul Hoilman East Tennessee State University December 7, 00 Contents INTRODUCTION SIMPLE SOLVING CONCEPTS 5 IMPROVED SOLVING 7 4 RELATED GAMES 5 5 PROGENATION OF SOLVABLE

More information

