A Unified Approach to Linear Equating for the Non-Equivalent Groups Design


Research Report RR-03-31

Alina A. von Davier and Nan Kong
Educational Testing Service, Princeton, NJ

Research & Development
November 2003

Research Reports provide preliminary and limited dissemination of ETS research prior to publication. They are available without charge from:

Research Publications Office
Mail Stop 7-R
Educational Testing Service
Princeton, NJ 08541

Abstract

This paper describes a new, unified framework for linear equating in a Non-Equivalent-groups Anchor Test (NEAT) design. We focus on three methods for linear equating in the NEAT design (Tucker, Levine observed-score, and chain) and develop a common parameterization that allows us to show that each particular equating method is a special case of the linear equating function in the NEAT design. We use a new concept, the Method Function, to distinguish among the linear equating functions in general and among the three equating methods in particular. This approach leads to a general formula for the standard error of equating for all equating functions in the NEAT design. We also present a new tool, the standard error of equating difference, to investigate whether the observed difference between equating functions is statistically significant.

Key words: Test equating, Non-Equivalent-groups Anchor Test (NEAT) design, Tucker equating, Levine observed-score equating, chain linear equating, standard error of equating, delta method

Acknowledgments

The authors would like to thank Paul Holland, Neil Dorans, Hariharan Swaminathan, Dan Eignor, Shelby Haberman, Skip Livingston, and Krishna Tateneni for helpful comments and suggestions during the development of this project. We are also thankful to Bruce Kaplan and Ted Blew, who were very supportive in developing the software and carrying out the resampling procedure. We would also like to thank Kim Fryer, Elizabeth Brophy, and Diane Rein for their help in editing the manuscript. The Educational Testing Service Research Allocation supported our work.

Any opinions expressed in this paper are those of the authors and not necessarily of Educational Testing Service.

Test equating methods are statistical tools used to produce exchangeable scores across different test forms. In particular, observed-score equating methods, as opposed to true-score equating methods, transform the raw scores of a new test, X, to the raw-score scale of an old test, Y. Any test equating process consists of a data collection design and one or more equating methods. This paper focuses on the Non-Equivalent-groups Anchor Test (NEAT) design and the linear equating function.

The NEAT design is a data collection design widely used in practice. It involves two populations of test-takers (usually different test administrations), P and Q, and a sample of examinees from each. The sample from P takes test X, the sample from Q takes test Y, and both samples take an anchor test, V, which is used to link X and Y. For the NEAT design, several observed-score equating methods are commonly used. Here we focus only on observed-score linear equating methods for the NEAT design.

In this paper, we take a new, mathematical approach to three linear observed-score equating methods: Tucker equating for the NEAT design with an anchor (T), Levine observed-score equating (L), and chain linear equating (CL). We emphasize their common framework and their similarities, and we define the methods carefully later. In this way, we introduce a unified approach to linear equating in the NEAT design, and we show that each of these equating methods is a special case of the linear equating function. This approach allows us to establish new theoretical results on otherwise well-known equating methods, creating a conceptual shift in the analysis of the observed-score linear equating methods in the NEAT design: These are not merely disparate methods, each with its own framework; they share the same parameter space and have numerous similarities.
One of the consequences is that we can develop one general formula for the standard error of equating (SEE) that is applicable to most of the available observed-score linear equating functions for the NEAT design. We also introduce a new, practical tool, the standard error of equating difference (SEED), for investigating whether the differences between the (linear) equating methods are statistically significant. More precisely, in this paper we investigate linear equating in the NEAT design from several points of view:

1. We put the three methods on a common footing by developing the same parameterization for the (equating) functions. We also make use of the concept of a Method Function in the framework of linear equating to show that there is only one definition of the linear observed-score equating function, which has different special cases. (A related concept, the Design Function, was first introduced in von Davier, Holland, & Thayer, 2004, in a different context, to model the data collection design.) This approach leads to a general formula for the SEE for linear equating in the NEAT design in general and for the three equating functions in particular.

2. We generalize the SEED (von Davier et al., 2004) to any pair of (linear) equating functions that share the same set of parameters. The SEED is a new tool to investigate whether the difference between the equating functions is statistically significant.

3. We use real and resampled data from two national administrations of a high-volume testing program to illustrate the SEE (computed via the new general formula) and the SEED.

We do not make any distributional assumptions about the variables involved in this theoretical exposition.

Linear Equating Function for the NEAT Design

This section sets up the basic notation. We assume there are two tests to be equated, X and Y, and a target population, T, on which this is to be done (Braun & Holland, 1982; Kolen & Brennan, 1995). In this paper, we use the standard notation of $\mu$ and $\sigma^2$ for the means and the variances. We also use the symbol $\pi$ to denote the parameters in general. The subscripts usually indicate the variable and the population. We use $\sigma$ with appropriate subscripts to denote the covariances; for example, $\sigma_{X,V;P}$ denotes the covariance of X and V in P, while $\Sigma$ denotes the covariance matrix of $\pi$.

Many observed-score equating methods are based on the linear equating function. Usually, the rationale behind linear equating on the target population, T, is to set standardized deviation scores (z-scores) on the two forms to be equal, such that

\[ \frac{x - \mu_{XT}}{\sigma_{XT}} = \frac{y - \mu_{YT}}{\sigma_{YT}}, \]

where $\mu_{YT}$, $\sigma^2_{YT}$, $\mu_{XT}$, and $\sigma^2_{XT}$ are the means and the variances of X and Y in T. Solving for y in the above equation results in the formula for the linear equating function,

\[ \mathrm{Lin}_{XY;T}(x) = \mu_{YT} + \sigma_{YT}\,\bigl((x - \mu_{XT})/\sigma_{XT}\bigr). \tag{1} \]

In the NEAT design, there is also another rationale and, implicitly, another definition of a linear equating function, namely the chain linear equating function. The chain linear equating function is obtained by chaining together the two linear linking functions (i.e., by using the mathematical composition of the two linear functions), from X to V on P and from V to Y on Q, that is, $\mathrm{Lin}_{XV;P}(x)$ and $\mathrm{Lin}_{VY;Q}(v)$. This results in

\[
\begin{aligned}
\mathrm{CL}_{XY}(x) &= \mathrm{Lin}_{VY;Q}\bigl(\mathrm{Lin}_{XV;P}(x)\bigr) \\
&= \mu_{YQ} + (\sigma_{YQ}/\sigma_{VQ})\bigl(\bigl(\mu_{VP} + (\sigma_{VP}/\sigma_{XP})(x - \mu_{XP})\bigr) - \mu_{VQ}\bigr) \\
&= \mu_{YQ} + (\sigma_{YQ}/\sigma_{VQ})(\mu_{VP} - \mu_{VQ}) + (\sigma_{YQ}/\sigma_{VQ})(\sigma_{VP}/\sigma_{XP})(x - \mu_{XP}),
\end{aligned}
\tag{2}
\]

and (2) is the usual form for the chain linear equating function. Moreover, the final equating function does not depend on the target population, T. As shown in von Davier, Holland, and Thayer (in press), (2) can be rewritten as (1) under appropriate assumptions. This will be discussed in more detail later.

Usually, X and Y are the operational tests given to two samples from the two test administrations P and Q, respectively, and V is the anchor test given to both samples from P and Q. The anchor test score, V, can be either a part of both X and Y (called the internal anchor) or a separate score (the external anchor).

In this study we assume that the target population, T, for the NEAT design is a mixture of P and Q and is denoted by

\[ T = wP + (1 - w)Q, \tag{3} \]

(see Braun & Holland, 1982, or Kolen & Brennan, 1995, for details on the concept of a target population in the NEAT design). The target population in (3) is determined by a weight w. When w = 1, then T = P, and when w = 0, then T = Q. Other choices of w may be used as well. Typically, w is the ratio of the sample size of the group from P to the sum of the sample sizes of the two groups.

In the NEAT design, X and Y are each observed on only one of P and Q, never on both. Thus, X and Y are not both observed on T, regardless of the choice of w. For this reason, assumptions must be made in order to overcome this lack of complete information in the NEAT design. The three equating methods used in the NEAT design that concern us here, Tucker, Levine, and chain linear equating, make different assumptions about the distributions of X and Y in the populations where they are not observed. We identify these assumptions in the next section.

Tucker, Levine, and Chain Linear Equating Methods

In this section we briefly describe the methods we use and their assumptions, which are described in more detail elsewhere (Kolen & Brennan, 1995; Angoff, 1984; von Davier et al., in press). Here we provide only the information that is necessary to explain our new approach, which is given in more detail later in this paper.

This section is structured as follows: First, we present the Tucker and Levine methods together, stressing the similarities between them. Although the assumptions that underlie the two methods are different, the computational forms are similar. We do not give computational details on Tucker and Levine because they are well documented (Kolen & Brennan, 1995). Then, we describe the chain linear equating method, following the development given in von Davier et al. (in press). In the next section, we develop a common parameterization for the three functions that allows us to compare the equating functions as well as their standard errors (SEE).
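For reference, the linear equating function (1), the chain linear function (2), and the sample-size weight w for the target population in (3) can be sketched in a few lines of Python. This is only an illustrative sketch: the function and argument names are hypothetical, and standard deviations (not variances) are passed in.

```python
def lin_equate(x, mu_x, sd_x, mu_y, sd_y):
    """Linear equating as in (1): match the z-scores of X and Y."""
    return mu_y + sd_y * (x - mu_x) / sd_x


def chain_linear(x, mu_xp, sd_xp, mu_vp, sd_vp, mu_yq, sd_yq, mu_vq, sd_vq):
    """Chain linear equating as in (2): link X to V on P, then V to Y on Q."""
    v = lin_equate(x, mu_xp, sd_xp, mu_vp, sd_vp)     # Lin_{XV;P}(x)
    return lin_equate(v, mu_vq, sd_vq, mu_yq, sd_yq)  # Lin_{VY;Q}(v)


def target_weight(n_p, n_q):
    """Typical choice of w: share of the P sample in the combined sample."""
    return n_p / (n_p + n_q)
```

With made-up moments, `chain_linear(60, 50, 10, 20, 5, 40, 8, 18, 4)` first maps x = 60 to v = 25 on P and then maps v = 25 to a Y-score on Q, which agrees with the expanded last line of (2).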
Tucker Equating Method: Assumptions

T1: The linear regressions of X on V and of Y on V are the same in the two populations.
T2: The conditional variances of X given V and of Y given V are the same in the two populations.

Levine Observed-score Equating Method: Assumptions

L1: X, Y, and V all measure the same thing, or, stated differently, the true scores of the tests ($T_X$ and $T_Y$) and of the anchor ($T_V$) in the two populations are perfectly correlated.
L2: The regressions of $T_X$ on $T_V$ and of $T_Y$ on $T_V$ are linear and the same in the two populations.
L3: The measurement error variances for X and for Y are the same in the two populations.

From the two sets of assumptions and from (1), the formulas for the parameters of X and Y on T for Tucker and Levine follow. They are similar in form for the two equating methods:

\[ \mu_{XT} = \mu_{XP} - (1 - w)\,\beta_P\,(\mu_{VP} - \mu_{VQ}), \tag{4} \]

\[ \mu_{YT} = \mu_{YQ} + w\,\beta_Q\,(\mu_{VP} - \mu_{VQ}), \tag{5} \]

\[ \sigma^2_{XT} = \sigma^2_{XP} - (1 - w)\,\beta_P^2\,(\sigma^2_{VP} - \sigma^2_{VQ}) + w(1 - w)\,\beta_P^2\,(\mu_{VP} - \mu_{VQ})^2, \tag{6} \]

\[ \sigma^2_{YT} = \sigma^2_{YQ} + w\,\beta_Q^2\,(\sigma^2_{VP} - \sigma^2_{VQ}) + w(1 - w)\,\beta_Q^2\,(\mu_{VP} - \mu_{VQ})^2 \tag{7} \]

(see Kolen & Brennan, 1995, for the derivations). The four $\beta$-parameters, which distinguish the two equating methods, Tucker and Levine, have the following formulas.

For the Tucker method:

\[ \beta_P = \alpha_P = \frac{\sigma_{X,V;P}}{\sigma^2_{VP}} \quad\text{and}\quad \beta_Q = \alpha_Q = \frac{\sigma_{Y,V;Q}}{\sigma^2_{VQ}}, \tag{8} \]

where $\sigma_{X,V;P}$ denotes the covariance of X and V in P and $\sigma_{Y,V;Q}$ denotes the covariance of Y and V in Q.

For the Levine observed-score equating function for a NEAT design with an external anchor:

\[ \beta_P = \gamma_P = \frac{\sigma^2_{XP} + \sigma_{X,V;P}}{\sigma^2_{VP} + \sigma_{X,V;P}} \quad\text{and}\quad \beta_Q = \gamma_Q = \frac{\sigma^2_{YQ} + \sigma_{Y,V;Q}}{\sigma^2_{VQ} + \sigma_{Y,V;Q}}, \tag{9} \]

which are the formulas for the Levine function derived under the assumptions L1-L3 and the additional assumption of a congeneric model, for which the error variances are proportional to the effective test lengths (see Kolen & Brennan, 1995, p. 117).

For the Levine function for a NEAT design with an internal anchor:

\[ \beta_P = \gamma_P = \frac{\sigma^2_{XP}}{\sigma_{X,V;P}} \quad\text{and}\quad \beta_Q = \gamma_Q = \frac{\sigma^2_{YQ}}{\sigma_{Y,V;Q}}, \tag{10} \]

which are also derived under the additional assumption of a congeneric model (see Kolen & Brennan, 1995, p. 116).

Chain Linear Equating Method: Assumptions

C1: The (linear) linking function from X to V is the same in the two populations, P and Q.
C2: The (linear) linking function from V to Y is the same in the two populations, Q and P.

We follow the notation and approach to chain linear equating given in von Davier et al. (in press, Appendix A). We do not give any computational detail in this paper; instead we refer to that work and quote only those formulas from it that are necessary for our exposition here.

As shown in von Davier et al. (in press, Appendix A), from C1 and C2 it follows that on a target population, T, as defined in (3), we have

\[ \mu_{XT} = \mu_{XP} + (\sigma_{XP}/\sigma_{VP})(\mu_{VT} - \mu_{VP}), \tag{11} \]

\[ \sigma_{XT} = (\sigma_{VT}/\sigma_{VP})\,\sigma_{XP}, \tag{12} \]

\[ \mu_{YT} = \mu_{YQ} + (\sigma_{YQ}/\sigma_{VQ})(\mu_{VT} - \mu_{VQ}), \text{ and} \tag{13} \]

\[ \sigma_{YT} = (\sigma_{VT}/\sigma_{VQ})\,\sigma_{YQ}. \tag{14} \]
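The formulas (4)-(14) can be collected into a small Python sketch that maps the ten observed moments to the four moments of X and Y on T. Several things here are assumptions made only for illustration: the function names, the ordering of the input tuple (mean and variance of X, mean and variance of V, covariance of X and V in P, followed by the same quantities for Y and V in Q), the decision to return standard deviations rather than variances, and the use of the standard mixture formulas for the mean and variance of V on T in the chain case, which the text does not spell out. The internal-anchor Levine case (10) is omitted for brevity.

```python
import math


def lin_equate(x, mu_x, sd_x, mu_y, sd_y):
    """Linear equating function (1)."""
    return mu_y + sd_y * (x - mu_x) / sd_x


def synthetic_moments(moments, w, b_p, b_q):
    """Equations (4)-(7) for given slope parameters b_p and b_q."""
    (mu_xp, var_xp, mu_vp, var_vp, _cov_xvp,
     mu_yq, var_yq, mu_vq, var_vq, _cov_yvq) = moments
    d_mu = mu_vp - mu_vq      # anchor mean difference between P and Q
    d_var = var_vp - var_vq   # anchor variance difference between P and Q
    mu_xt = mu_xp - (1 - w) * b_p * d_mu
    mu_yt = mu_yq + w * b_q * d_mu
    var_xt = var_xp - (1 - w) * b_p ** 2 * d_var + w * (1 - w) * b_p ** 2 * d_mu ** 2
    var_yt = var_yq + w * b_q ** 2 * d_var + w * (1 - w) * b_q ** 2 * d_mu ** 2
    return mu_xt, math.sqrt(var_xt), mu_yt, math.sqrt(var_yt)


def tucker_moments(moments, w):
    """Slopes from (8): covariance with the anchor over anchor variance."""
    b_p = moments[4] / moments[3]
    b_q = moments[9] / moments[8]
    return synthetic_moments(moments, w, b_p, b_q)


def levine_external_moments(moments, w):
    """Slopes from (9), external anchor."""
    b_p = (moments[1] + moments[4]) / (moments[3] + moments[4])
    b_q = (moments[6] + moments[9]) / (moments[8] + moments[9])
    return synthetic_moments(moments, w, b_p, b_q)


def chain_moments(moments, w):
    """Equations (11)-(14); the covariances are not used."""
    (mu_xp, var_xp, mu_vp, var_vp, _cov_xvp,
     mu_yq, var_yq, mu_vq, var_vq, _cov_yvq) = moments
    # Assumption: V on T = wP + (1 - w)Q has the usual mixture mean and variance.
    mu_vt = w * mu_vp + (1 - w) * mu_vq
    var_vt = w * var_vp + (1 - w) * var_vq + w * (1 - w) * (mu_vp - mu_vq) ** 2
    sd_vt = math.sqrt(var_vt)
    sd_xp, sd_vp = math.sqrt(var_xp), math.sqrt(var_vp)
    sd_yq, sd_vq = math.sqrt(var_yq), math.sqrt(var_vq)
    mu_xt = mu_xp + (sd_xp / sd_vp) * (mu_vt - mu_vp)
    sd_xt = (sd_vt / sd_vp) * sd_xp
    mu_yt = mu_yq + (sd_yq / sd_vq) * (mu_vt - mu_vq)
    sd_yt = (sd_vt / sd_vq) * sd_yq
    return mu_xt, sd_xt, mu_yt, sd_yt
```

A call such as `lin_equate(x, *chain_moments(moments, w))` then evaluates (1) with the chain moments. For any w in [0, 1] it returns the same value as the chained form (2), because the moments of V on T cancel.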

Von Davier et al. (in press) show that, under the assumptions C1 and C2 made by chain equating, $\mathrm{CL}_{XY}(x)$ defined in (2) is, in fact, $\mathrm{Lin}_{XY;T}(x)$, as defined in (1). More precisely, that work shows that applying (11)-(14) to the chain linear function from (2) results in (1). The target population, T, cancels out of the composed function, $\mathrm{Lin}_{VY;Q}(\mathrm{Lin}_{XV;P}(x))$. This provides a direct argument that chain linear equating is the linear observed-score equating on T with $\mu_{XT}$, $\mu_{YT}$, $\sigma_{XT}$, and $\sigma_{YT}$ given by the expressions in (11)-(14).

Identifying the Parameters of the Tucker, Levine, and Chain Linear Equating Functions

In this section, we introduce a common parameterization for the linear equating functions described above. We show that this approach leads to a unified framework for all the linear equating functions in the NEAT design.

Consider the linear equating function (1) that equates X to Y on the target population, T, in the form of (3). This equating function depends on $\mu_{XT}$, $\sigma_{XT}$, $\mu_{YT}$, and $\sigma_{YT}$, which are parameters on the population T. We can express this dependence of the equating function on the target population parameters by using the notation

\[ \mathrm{Lin}_{XY;T}(x;\, \mu_{XT}, \sigma_{XT}, \mu_{YT}, \sigma_{YT}) = \text{a generic linear equating function}. \tag{15} \]

In (4)-(14), we observe that the four parameters on T depend on the 10 means, variances, and covariances in the two populations, P and Q. Denote by $\pi$ the column vector of the 10 parameters from the two bivariate distributions, that is,

\[ \pi = (\mu_{XP}, \sigma^2_{XP}, \mu_{VP}, \sigma^2_{VP}, \sigma_{X,V;P}, \mu_{YQ}, \sigma^2_{YQ}, \mu_{VQ}, \sigma^2_{VQ}, \sigma_{Y,V;Q})^{t}. \tag{16} \]

We use a new concept, a function that maps the 10 parameters from the two populations, P and Q, into the four parameters on the population T. To preserve the similarities to von Davier et al. (2004) and to emphasize the similarities across the equating methods, we will call this function the Method Function (MF).

\[ \mathrm{MF}(\pi) = (\mu_{XT}, \sigma_{XT}, \mu_{YT}, \sigma_{YT})^{t}. \tag{17} \]

Now, we rewrite (15) as

\[ \mathrm{Lin}_{XY;T}(x;\, \mathrm{MF}(\pi)) = \text{a linear equating function obtained through a specific MF}, \tag{18} \]

with $\pi$ defined in (16).

The previous section showed that all three linear equating functions, Tucker, Levine, and chain linear, can be expressed as (1). Thus, they can also be expressed as (18), in which the Method Function differs according to which equating method is used. For the Tucker method, the Method Function is described by the formulas (4)-(8). For the Levine method, the Method Function is given by (4)-(7) and (9) for an external anchor, and by (4)-(7) and (10) for an internal anchor. For the chain linear method, the Method Function is described by (11)-(14). Each Method Function is given in detail in the appendix in Table A1.

From (2) as well as from (11)-(14), we observe that the covariance between X and V on P and the covariance of Y and V on Q do not appear in the formulas of the chain linear equating function. Hence, the chain linear equating function depends on only eight parameters, while the Tucker and Levine functions depend on ten parameters. However, by using (15), (17), and (18), we can express the three linear equating functions as sharing the same parameter space. Note that the chain linear function depends on the covariances between the tests and the anchor only implicitly, and only if the two bivariate distributions of the tests and the anchor are presmoothed before the equating function is computed, using, for example, log-linear models (see von Davier et al., 2004; Holland & Thayer, 2000).

Equating functions are estimated by substituting estimates of the population parameters in (18), that is,

\[ \widehat{\mathrm{Lin}}_{XY;T}(x;\, \mathrm{MF}(\pi)) = \mathrm{Lin}_{XY;T}(x;\, \mathrm{MF}(\hat{\pi})), \tag{19} \]

where $\hat{\pi}$ denotes a sample estimate of $\pi$.

The uncertainty in $\widehat{\mathrm{Lin}}_{XY;T}(x;\, \mathrm{MF}(\pi))$ derives from the uncertainty in the estimate of $\pi$. Because the samples are independently drawn from populations P and Q, the covariances

between each of the five parameters estimated from the population P and the five parameters estimated from the population Q are zero. Hence, the covariance matrix $\Sigma$ of the parameter $\pi$ for the three equating functions, Tucker, Levine, and chain linear, is

\[ \Sigma = \begin{pmatrix} \Sigma_P & 0 \\ 0 & \Sigma_Q \end{pmatrix}, \tag{20} \]

where $\Sigma_P$ denotes the covariance matrix of the five parameters obtained from the population P and $\Sigma_Q$ denotes the covariance matrix of the five parameters obtained from the population Q. Also note that the Braun and Holland linear equating method for the NEAT design (Braun & Holland, 1982; Kolen & Brennan, 1995, p. 146) shares the same parameter vector $\pi$ and has the same covariance matrix of $\pi$, as in (20).

In this section, we introduced a common parameterization that can be used for most of the available observed-score linear equating functions in the NEAT design. We showed that one can write down the Method Function formulas for each of the three methods analyzed here, and we think that one could easily write the appropriate Method Function for any other observed-score linear equating function. However, the investigation of additional equating functions is beyond the scope of this study.

Standard Error of Equating

In this section, we show that using a common parameterization for all linear equating functions in a NEAT design leads to a general formula for the SEE. The delta method, a general method for approximating standard errors that is based on the Taylor expansion (Rao, 1965; Kendall & Stuart, 1977), is widely used for computing standard errors. Kolen (1985) and Hanson, Zeng, and Kolen (1993) used the delta method to compute the SEE for the Tucker method and the Levine method, respectively. Although we also use the delta method for computing the SEE, our approach differs from Kolen (1985) and Hanson et al. (1993) in the following sense: We provide a unified approach that, through the MF, includes not only the Tucker and the Levine methods, but also chain linear equating and other linear observed-score equating functions such as the Braun and Holland

linear equating method (Braun & Holland, 1982; Kolen & Brennan, 1995). In order to emphasize this unity, we focus on the matrix form of the SEEs, (21) below, rather than on the sum form, as did Kolen (1985) and Hanson et al. (1993). The approach presented here has similarities with the approach developed in von Davier et al. (2004).

Delta Method Applied to Linear Equating

We use the delta method to calculate the asymptotic variance, $\mathrm{Var}(\mathrm{Lin}_{XY;T}(x;\, \mathrm{MF}(\pi)))$, whose square root is the SEE.

From the delta method (Theorem A1 in the appendix), it follows that the asymptotic variance of a smooth function, f, that depends on the parameter vector, $\pi$, is

\[ \mathrm{Var}(f(\pi)) = J_f(\pi)\, \Sigma(\pi)\, J_f(\pi)^{t}, \tag{21} \]

where $J_f(\pi)$ is the Jacobian (the matrix or vector of the first derivatives of the function f with respect to the components of $\pi$) computed at the estimated values of $\pi$ (see also von Davier et al., 2004; von Davier, 2001).

Let the parameter $\pi$ from (16) be the parameter vector described in Theorem A1 and let f be a linear equating function, $\mathrm{Lin}_{XY;T}(x;\, \mathrm{MF}(\pi))$, given in (1). The Method Function can refer to any of the Tucker, Levine, and chain linear functions. According to matrix differentiation theory and the differentiability of compositions of functions, the Jacobian of $\mathrm{Lin}_{XY;T}(x;\, \mathrm{MF}(\pi))$ is

\[ J_f = J_{\mathrm{Lin}}\, J_{\mathrm{MF}}, \]

where $J_{\mathrm{Lin}}$ is the vector of the first derivatives of the function from (1) with respect to $(\mu_{XT}, \sigma_{XT}, \mu_{YT}, \sigma_{YT})$, and $J_{\mathrm{MF}}$ is the matrix of the first derivatives of $(\mu_{XT}, \sigma_{XT}, \mu_{YT}, \sigma_{YT})$ with respect to the components of $\pi$ from (16).

In the previous section we showed that the (10 by 10) covariance matrix $\Sigma$ is the same for the Tucker, Levine, and chain linear functions. Moreover, the Jacobian $J_{\mathrm{Lin}}$ will also have the same form for all observed-score linear equating functions (the Jacobian of the linear function, for any of the three equating functions, is a 4-dimensional row vector). The Jacobian $J_{\mathrm{MF}}$ is a 4 by 10 matrix and will have a different form for each of the equating methods.
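As a rough numerical illustration of the delta-method variance in (21), the sketch below approximates the Jacobian of a scalar function of the parameter vector by central finite differences and then forms the quadratic form with a supplied covariance matrix. The finite-difference Jacobian is a stand-in for the analytic derivatives referred to in the text, and both function names and the step-size rule are assumptions made for this example. The covariance matrix must be supplied by the caller.

```python
import math


def numerical_gradient(f, pi, rel_step=1e-6):
    """Central-difference approximation to the row vector J_f at pi."""
    grad = []
    for i, value in enumerate(pi):
        h = rel_step * max(1.0, abs(value))
        up, down = list(pi), list(pi)
        up[i] = value + h
        down[i] = value - h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def delta_method_se(f, pi_hat, sigma):
    """Square root of J_f * Sigma * J_f^t, evaluated at the estimate pi_hat."""
    j = numerical_gradient(f, pi_hat)
    n = len(j)
    variance = sum(j[r] * sigma[r][c] * j[c] for r in range(n) for c in range(n))
    return math.sqrt(variance)
```

To use it for an equating function at a fixed score x, one would pass a function of the ten parameters that first applies a Method Function and then the linear function (1). Because the composite is differentiated directly, the product of the two Jacobians never has to be formed explicitly.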

Now, by using (21), the SEE of a linear equating function, $\mathrm{Lin}_{XY;T}(x;\, \mathrm{MF}(\pi))$, can be expressed as

\[ \widehat{\mathrm{SEE}}(x) = \sqrt{\hat{J}_{\mathrm{Lin}}\, \hat{J}_{\mathrm{MF}}\, \Sigma\, \hat{J}_{\mathrm{MF}}^{t}\, \hat{J}_{\mathrm{Lin}}^{t}}, \tag{22} \]

with $\Sigma$ from (20). Equation (22) is the computational formula for the SEE for the Tucker, Levine, and chain linear methods, that is, the formula that might be implemented in a computer program. Having only one formula for the SEE for all linear equating methods is a clear computational advantage. Note that this formula does not require any distributional assumption on the variables involved.

The entries of $\Sigma$ in (22) can be obtained from Kolen. The derivatives $J_f = J_{\mathrm{Lin}} J_{\mathrm{MF}}$ for the Tucker equating function are given in Kolen (1985). The derivatives $J_f = J_{\mathrm{Lin}} J_{\mathrm{MF}}$ for the Levine function for a NEAT design are given in Hanson et al. (1993). The derivatives $J_f = J_{\mathrm{Lin}} J_{\mathrm{MF}}$ for chain linear equating, given in Table A2, were computed by us.

We use the notations $\mathrm{SEE}_{T}$, $\mathrm{SEE}_{L}$, and $\mathrm{SEE}_{CL}$ to refer to the SEE for the Tucker, Levine, and chain linear methods, respectively.

SEED for Linear Equating Functions

In this section, we state a new result that is analogous to (21) and that allows us to compute a standard error for the difference between two linear equating functions. This standard error can be used to inform discussion about the final form of an equating function.

The SEED was first introduced in von Davier et al. (2004) for the kernel method of test equating. This paper applies the same concept to the linear equating functions. The main differences between the SEED in von Davier et al. (2004) and the SEED here are that the parameters of the equating functions and the equating functions themselves differ. In the kernel method of test equating, the parameters are the score probabilities of the tests to be equated (and, in chain equipercentile equating, also the score probabilities of the anchor test); in the case of linear equating, the parameters are the means, the variances, and the covariances of the tests to be equated and of the anchor test in the two populations, P and Q.

Consider two equating functions, $\mathrm{Lin}(x;\, \mathrm{MF}_1(\pi))$ and $\mathrm{Lin}(x;\, \mathrm{MF}_2(\pi))$, which have the form given in (1) and depend on the same parameter vector from (16) (i.e., the assumptions on the functions required by the delta method are met). We are interested in

\[ \mathrm{Var}\bigl(\mathrm{Lin}(x;\, \mathrm{MF}_1(\pi)) - \mathrm{Lin}(x;\, \mathrm{MF}_2(\pi))\bigr). \tag{23} \]

Theorem 1. If $\mathrm{Lin}(x;\, \mathrm{MF}_1(\pi))$ and $\mathrm{Lin}(x;\, \mathrm{MF}_2(\pi))$ are two equating functions that have the form given in (1) and depend on the same parameter vector, $\pi$, from (16), then

\[ \mathrm{Var}\bigl(\mathrm{Lin}(x;\, \mathrm{MF}_1(\pi)) - \mathrm{Lin}(x;\, \mathrm{MF}_2(\pi))\bigr) = \hat{J}_{\mathrm{Lin}} (\hat{J}_{\mathrm{MF}_1} - \hat{J}_{\mathrm{MF}_2})\, \Sigma\, (\hat{J}_{\mathrm{MF}_1} - \hat{J}_{\mathrm{MF}_2})^{t}\, \hat{J}_{\mathrm{Lin}}^{t}, \tag{24} \]

where $J_{\mathrm{Lin}}$ is the 4-dimensional row vector of the first derivatives of the function from (1) with respect to the parameters on T, $(\mu_{XT}, \sigma_{XT}, \mu_{YT}, \sigma_{YT})$; $J_{\mathrm{MF}}$ is the 4 by 10 matrix of the first derivatives of the four components of the Method Function, $(\mu_{XT}, \sigma_{XT}, \mu_{YT}, \sigma_{YT})$, with respect to the components of $\pi$; and $\Sigma$ is the variance-covariance matrix of $\pi$, given in (20).

The proof follows from the delta method (Theorem A1), applied to the difference of two smooth functions that depend on the same parameters (see also von Davier et al., 2004, chapter 5).

Hence, the SEED is

\[ \mathrm{SEED} = \sqrt{\mathrm{Var}\bigl(\mathrm{Lin}(x;\, \mathrm{MF}_1(\pi)) - \mathrm{Lin}(x;\, \mathrm{MF}_2(\pi))\bigr)}. \tag{25} \]

Corollary 1. The SEEDs for any pair of the three equating functions, Tucker, Levine, and chain linear, are:

\[ \mathrm{SEED}_{T,L} = \sqrt{\hat{J}_{\mathrm{Lin}} (\hat{J}_{T} - \hat{J}_{L})\, \Sigma\, (\hat{J}_{T} - \hat{J}_{L})^{t}\, \hat{J}_{\mathrm{Lin}}^{t}}, \tag{26} \]

\[ \mathrm{SEED}_{CL,L} = \sqrt{\hat{J}_{\mathrm{Lin}} (\hat{J}_{CL} - \hat{J}_{L})\, \Sigma\, (\hat{J}_{CL} - \hat{J}_{L})^{t}\, \hat{J}_{\mathrm{Lin}}^{t}}, \tag{27} \]

\[ \mathrm{SEED}_{T,CL} = \sqrt{\hat{J}_{\mathrm{Lin}} (\hat{J}_{T} - \hat{J}_{CL})\, \Sigma\, (\hat{J}_{T} - \hat{J}_{CL})^{t}\, \hat{J}_{\mathrm{Lin}}^{t}}, \tag{28} \]

with $\Sigma$ from (20).

The proof follows from Theorem 1. The entries of $J_{\mathrm{Lin}} J_{T}$ are given in Kolen (1995). The entries of $J_{\mathrm{Lin}} J_{L}$ are given in Hanson et al. (1993), and the entries of $J_{\mathrm{Lin}} J_{CL}$ are given in Table A2.

In conclusion, the SEED is a measure of the uncertainty in the difference between two equating functions that is due to the estimation of the parameters (the means, variances, and covariances in the two samples). It also reflects the differences in the two Method Functions. We propose the following practical rule: If the difference between two linear equating functions is no larger than the noise level in the data, then this difference would be smaller than twice the SEED in either direction (see also von Davier et al., 2004).

Study 1

Here we illustrate how the general formula for the SEE and a new tool, the SEED, can be applied for the Tucker, Levine, and chain linear methods, using an example that involves data from two national administrations of a high-volume testing program. The two testing administrations were in the fall of 2001 (P) and in the winter of 2000 (Q). We consider this example to be an informative one, in the sense that it departs from the ideal conditions described in von Davier (2003), under which the equating methods give the same results. Moreover, as seen later, the difference between the three equating functions of interest is about half a score point or more, which is a difference that matters for the program from which the data come. (A difference in equating results that is large enough to make a difference in the reported scores is called a difference that matters.)

The data, which were collected following a NEAT design with an external anchor, consisted of the raw sample frequencies of rounded formula scores for two parallel, 78-item tests and a 35-item external anchor test given to two samples from a national population of examinees. (The rounded formula scores are scores in which the right-minus-a-quarter-wrong formula scores are rounded to integers.) In this study, the negative scores were rounded to zero.
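The scoring rule just described can be illustrated with a short Python sketch. The function name is hypothetical, and the treatment of exact halves (rounded up here) is an assumption, since the text does not say how ties were rounded.

```python
import math


def rounded_formula_score(n_right, n_wrong):
    """Right minus a quarter wrong, rounded to an integer, floored at zero."""
    raw = n_right - n_wrong / 4.0
    rounded = math.floor(raw + 0.5)  # assumption: halves are rounded up
    return max(0, rounded)           # negative scores are set to zero
```

For example, 10 right and 4 wrong gives a raw formula score of 9, while 0 right and 8 wrong gives a raw formula score of -2, which is reported as 0.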

The data are sample frequencies for two bivariate distributions. We denote the two sets of sample frequencies by $n_{jl}$ = number of examinees with $X = x_j$ and $V = v_l$, and $m_{kl}$ = number of examinees with $Y = y_k$ and $V = v_l$. In this example, $x_1 = 0, x_2 = 1, \ldots, x_{79} = 78$; the same is true for y. For v, we have $v_1 = 0, v_2 = 1, \ldots, v_{36} = 35$. The two sample sizes are given by N = 10,634 and M = 11,31. The sample correlation of X and V in P was 0.88, and the sample correlation of Y and V in Q was

Table 1
Summary Statistics for the Observed Distributions of X, Y, V in P, and V in Q

        X       Y       V (P)   V (Q)
Mean
SD

From Table 1 we see that the mean of the anchor test V is (±0.08) in population P and (±0.08) in Q, where 0.08 is the standard error of the mean. Thus Q is a less proficient population than P, as measured by V. In terms of effect sizes, the difference between these two means (.66) is approximately 3% of the average standard deviation of 8.7. For this type of testing program, a mean difference of this magnitude indicates a fairly large difference between the two populations.

Before chain linear equating was in use, ETS researchers were guided by the following rules when they had to choose between Tucker and Levine equating:

If the standardized mean difference of the anchor scores in the two samples is smaller than 0.5, then choose the Tucker method, and
If the ratio of the variances of the anchor in the two samples is between 0.80 and 1.5, then use the Tucker method (Kirk, 1971; Wichert, 1967).

We could not find any rationale for these rules, especially for the cut-off values. Kolen and Brennan (1995), however, suggest choosing Levine when it is known that populations differ substantially and if there is also reason to believe that the forms are quite similar, and choosing Tucker if the forms are suspected to differ, with the observation that if the populations [and

forms] are too dissimilar, then any equating is suspect, and with the note that this ad hoc reasoning is by no means definitive. Hence, based on this information, one would have chosen the Levine equating function for this example in particular, since the test forms are very carefully constructed to be parallel in this assessment program.

We used the formulas (1) and (4)-(10) to compute the Tucker and Levine functions. We used (2) to compute the chain linear equating function. The equating functions, the SEEs, and the SEEDs are discrete functions of x. The three functions, shown in Figure 1, give noticeably different results. The differences between the Tucker and Levine functions and between the Tucker and chain functions are more than half a raw score point over the whole score range, which is a difference that matters. The difference between the Levine function and the chain function is less than the size of a difference that matters over the whole score range.

Figure 1. The Tucker, Levine, and chain linear functions, plotted as equating differences (T-L, T-C, L-C) against X-score. Study 1. NEAT design with an external anchor.

The three SEEs are given in Figure 2. The shape of the SEEs is the usual one for linear equating functions, with lower values around the means and higher values in the extreme score ranges. The SEE for the Levine function seems to be larger than the SEE for the chain linear function and for the Tucker function, which has the smallest values over almost the whole score range (see Figure 2). The SEEs for the three functions are very close to each other, though, and therefore one could not choose a method based solely on these SEE values.

Figure 2. $\mathrm{SEE}_{T}$, $\mathrm{SEE}_{L}$, and $\mathrm{SEE}_{C}$, plotted against X-score. Study 1. NEAT design with an external anchor.

We then used (24)-(28) to compute the SEED for each pair of linear equating functions (Figure 3). From the results plotted in Figure 3, we might conclude that the accuracy of the difference between the Levine and chain linear functions is very high in the middle of the score range, but relatively low for the lower and upper score ranges. In contrast, the accuracy of the difference between the Tucker and chain functions and between the Tucker and Levine functions varies less across the score range, being relatively high in the middle of the score range. Since the accuracy of estimating the parameter $\pi$ is the same for the three equating functions and the vector of the first derivatives of the linear function is also the same for the three equating functions, this plot reflects the differences in the pairs of the Method Functions (more exactly, the differences in the first derivatives of these functions); see also Corollary 1.

Figure 3. The standard error of equating differences (SEED) for the three pairs of equating functions (T-L, T-C, L-C), plotted against X-score. Study 1. NEAT design with an external anchor.

Figures 4-6 plot the difference between two linear equating functions together with the corresponding ±2SEED band. In these three cases, the differences between the three functions (about half of a raw score point or more; see also Figure 1) are statistically significant relative to the SEEDs. It appears that the Levine and the chain functions agree only at the very low end of the score range. As mentioned before, the SEEDs reflect the uncertainty in these differences that is due to the estimation of the parameters (the means, the variances, and the covariances in the two samples) as well as to the differences in the Method Functions.
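The comparison of a difference with its ±2SEED band can be sketched as follows. The first function evaluates the matrix expression in (24) from Jacobians that are assumed to be already available as nested lists, and the second flags the score points at which a difference falls outside the band. All names are hypothetical, and the multiplier of 2 follows the practical rule stated earlier.

```python
import math


def seed_from_jacobians(j_lin, j_mf1, j_mf2, sigma):
    """sqrt(J_Lin (J_MF1 - J_MF2) Sigma (J_MF1 - J_MF2)^t J_Lin^t), as in (24)."""
    n_par = len(sigma)
    # Row vector J_Lin * (J_MF1 - J_MF2), one entry per component of pi.
    row = [
        sum(j_lin[k] * (j_mf1[k][c] - j_mf2[k][c]) for k in range(len(j_lin)))
        for c in range(n_par)
    ]
    variance = sum(row[r] * sigma[r][c] * row[c]
                   for r in range(n_par) for c in range(n_par))
    return math.sqrt(variance)


def outside_band(differences, seeds, multiplier=2.0):
    """True where |difference| exceeds multiplier * SEED at that score point."""
    return [abs(d) > multiplier * s for d, s in zip(differences, seeds)]
```

Applied over the score range, `outside_band` returns one flag per score point; a run of `True` values corresponds to a region where the plotted difference lies outside the band.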

Figure 4. The difference between Tucker and Levine together with a band of ±SEED(T,L). Study 1. NEAT design with an external anchor.
(Axes: difference versus X score; series T-L and the upper and lower SEED(T,L) band limits.)

Figure 5. The difference between Tucker and chain linear together with a band of ±SEED(T,C). Study 1. NEAT design with an external anchor.
(Axes: difference versus X score; series T-C and the upper and lower SEED(T,C) band limits.)

Figure 6. The difference between Levine and chain linear functions together with a band of ±SEED(L,C). Study 1. NEAT design with an external anchor.
(Axes: difference versus X score; series L-C and the upper and lower SEED(L,C) band limits.)

In conclusion, we observe that the differences between the three equating functions are statistically significant. In other words, with the help of the SEED, we can distinguish between noise and real differences between the analyzed functions.

Study 2

The SEEDs are asymptotic results, so it is of interest to investigate how they vary with sample size. Sample sizes of 10,000 are relatively large, and therefore the estimation of the parameters is relatively accurate. As a consequence, the ±SEED band will be very narrow. Study 2 examined the following research questions:

What is going to happen to the SEED when the sample sizes get smaller?

For which N will the ±SEED band be about half of a raw score point (a difference that matters)?

For which N will the ±SEED band encompass the difference between the equating functions? More precisely, for which N will the SEED not be able to detect that the equating functions differ statistically?

We resampled seven samples of sizes 5,000; 2,500; 1,700; 800; 400; 200; and 100 for each group of students from P (those who took (X, V)) and Q (those who took (Y, V)), respectively. These samples were independent random samples, drawn without replacement from the original N = 10,634 for (X, V) and M = 11,31 for (Y, V). Sorted, uniformly distributed random numbers between 0 and 1 (including 0 and 1) were generated in Microsoft Excel using the RAND function. The steps we used for sampling are as follows for each sample size within each population:

1) Assign a random number from the uniform distribution between 0 and 1 to each case (person) in the group. There are N cases in the first group.
2) Sort these N random numbers.
3) The first N_S cases (where N_S is the size of the new sample and N_S is less than or equal to N) are chosen to be included in the new sample.

We repeated the same procedure for the second group. The summary statistics for X, Y, and V in P and Q in the new samples are given in Tables 2a and 2b.
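The three sampling steps can be sketched in Python as follows. This is a minimal illustration rather than the procedure's original implementation (which used Microsoft Excel): the function name `draw_subsample`, the random seed, and the use of NumPy are assumptions made here.

```python
import numpy as np

def draw_subsample(n_cases, n_sub, rng):
    """Return the indices of n_sub cases drawn without replacement."""
    if not 0 < n_sub <= n_cases:
        raise ValueError("n_sub must satisfy 0 < n_sub <= n_cases")
    keys = rng.random(n_cases)      # step 1: one uniform number per case
    order = np.argsort(keys)        # step 2: sort cases by their number
    return order[:n_sub]            # step 3: keep the first n_sub cases

rng = np.random.default_rng(2003)   # arbitrary seed, for reproducibility
sizes = [5000, 2500, 1700, 800, 400, 200, 100]
n_first_group = 10634               # N for the (X, V) group

# One independent subsample of indices per target sample size.
samples_p = {n: draw_subsample(n_first_group, n, rng) for n in sizes}
print({n: len(idx) for n, idx in samples_p.items()})
```

Sorting random keys and keeping the first N_S cases is equivalent to simple random sampling without replacement, which is why each case can appear at most once in a subsample. The same function would be called again with the size of the second group.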

Table 2a
Summary Statistics for the Distributions of X, Y, V in P and V in Q, for the Samples With Different Sample Sizes, N_S and M_S
(Columns: N = 10,634 and M = 11,31; N_S = M_S = 5,000; N_S = M_S = 2,500; N_S = M_S = 1,700. Rows: summary statistics of X and V in P and of Y and V in Q.)

Moreover, we took care to preserve the same sign as that for the differences of the means in the two samples. We also took care to approximately preserve the same effect sizes (with respect to the difference in ability in the two populations as measured by the anchor) across the samples. For example, we resampled a second set of samples of size 100 in order to preserve the same sign for the differences of the means in the two samples. It is important to note that, although the resampling was carefully carried out, with smaller samples the parameter estimates will fluctuate around the values in the original samples. This sampling variability will also have an effect on the computation of the equating functions and of their differences.

Table 2b
Summary Statistics for the Distributions of X, Y, V in P and V in Q, for the Samples With Different Sample Sizes, N_S and M_S
(Columns: N_S = M_S = 800; N_S = M_S = 400; N_S = M_S = 200; N_S = M_S = 100. Rows: summary statistics of X and V in P and of Y and V in Q.)

Figures 7 to 13 plot only the differences between the Tucker and the chain linear equating functions together with the ±SEED(T,C) band. Study 2 focuses on the SEED's behavior for small and medium sample sizes, and for this purpose it does not matter on which functions we focus. The results for the differences between the Tucker and Levine functions are similar to the results for Tucker and chain linear, while the results for the differences between chain linear and Levine are as in Figure 6. Each figure illustrates one sample size, with N_S = M_S = 100, 200, 400, 800, 1,700, 2,500, and 5,000, respectively. We notice that when the sample sizes are small, the uncertainty related to computing the equating functions is large relative to the difference in the two functions (compare the original sample in Figures 3 and 5 with N_S = M_S = 100 in Figure 7). Hence, with a sample size of 100 available, we would conclude that the differences in the two equating functions are not statistically significant. Moreover, the ±SEED(T,C), in absolute value, is larger than a difference that matters.
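A rough way to anticipate how the band widens with smaller samples is the usual root-N scaling of asymptotic standard errors. The sketch below assumes that both groups shrink by the same factor and that the population parameters stay fixed; the function name `rescale_seed` and the reference SEED value are hypothetical and are not taken from the report.

```python
import math

def rescale_seed(seed_ref, n_ref, n_new):
    """First-order rescaling of a standard error from size n_ref to n_new."""
    if n_ref <= 0 or n_new <= 0:
        raise ValueError("sample sizes must be positive")
    return seed_ref * math.sqrt(n_ref / n_new)

seed_ref = 0.05   # hypothetical SEED at one score point for n_ref cases
n_ref = 10000
for n in (5000, 2500, 1700, 800, 400, 200, 100):
    print(f"N = {n:>5}: approximate SEED = {rescale_seed(seed_ref, n_ref, n):.3f}")
```

Such a projection is only a planning aid: in an actual resampling study, the parameter estimates themselves fluctuate, so the observed SEED in a small sample need not match the rescaled value.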

Figure 7. The difference between Tucker and chain linear functions together with a band of ±SEED(T,C). Study 2, N = 100. NEAT design with an external anchor.

For a sample size of 200, the differences between the Tucker and chain functions are statistically significant, and the ±SEED(T,C) band is about the size of a difference that matters (see Figure 8). However, at the lower and upper ends of the score range, the difference between the two equating functions lies inside the ±SEED(T,C) band. One reason is that the accuracy is lower at the extremes of the score range.

Figure 8. The difference between Tucker and chain linear functions together with a band of ±SEED(T,C). Study 2, N = 200. NEAT design with an external anchor.

For a sample size of 400, the differences between the Tucker and chain functions are statistically significant over most of the score range, and the SEED(T,C) is about the size of a difference that matters (see Figure 9).

Figure 9. The difference between Tucker and chain linear functions together with a band of ±SEED(T,C). Study 2, N = 400. NEAT design with an external anchor.

For all the larger samples, the differences between the Tucker and chain functions are statistically significant over the whole score range (see Figures 10 to 13).

Figure 10. The difference between Tucker and chain linear functions together with a band of ±SEED(T,C). Study 2, N = 800. NEAT design with an external anchor.

Figure 11. The difference between Tucker and chain linear functions together with a band of ±SEED(T,C). Study 2, N = 1,700. NEAT design with an external anchor.

Figure 12. The difference between Tucker and chain linear functions together with a band of ±SEED(T,C). Study 2, N = 2,500. NEAT design with an external anchor.

Figure 13. The difference between Tucker and chain linear functions together with a band of ±SEED(T,C). Study 2, N = 5,000. NEAT design with an external anchor.

With a sample size of 200 available, we conclude that the two equating functions differ significantly over most of the score range. For larger sample sizes, we notice that the accuracy increases (i.e., the SEED(T,C) in absolute value decreases), and one can conclude that the differences between the two functions are statistically significant. It follows that for this data set, a sample size of 200 seems to be enough for the SEED to detect that the two equating methods, Tucker and chain, differ statistically. The level of accuracy is slightly decreased for this small sample size. More studies are necessary to investigate the SEED's behavior in small and medium samples. Given that in most practical equating situations the sample sizes are much larger, the SEED probably will detect whether the differences between the equating methods are significant.

In von Davier (2003), several idealized conditions are described under which the three methods give the same results. However, in practical applications, each of these conditions holds only more or less. In a real-life situation, when plots like those in Figures 4 to 6 indicate that the differences between the methods are statistically significant, which of the methods should one choose? From a score-reporting point of view, it does matter which method one chooses in this example, because the differences between the results from the Tucker method and the others do have an impact on the final results. (These differences are larger than half a raw score point for most of the raw-score range of X.)

From Study 1, we can conclude that Tucker is far away from the other two equating methods and that the chain linear function lies between Tucker and Levine: $\mathrm{Lin}_{XY;T}(x; MF_{T}(\pi)) < \mathrm{Lin}_{XY;T}(x; MF_{CL}(\pi)) < \mathrm{Lin}_{XY;T}(x; MF_{L}(\pi))$. Moreover, all observed differences are statistically significant. We cannot make the decision about the final equating function using the SEED alone if each of the equating methods relies on a different set of assumptions. We also cannot resolve the choice between the methods by directly checking their assumptions (T1 to T2, L1 to L3, C1 to C3) against the data, since these assumptions are not directly testable. In a practical situation, one will also investigate the issues related to the possible nonlinearity of the appropriate equating function. In addition, one should investigate the SEE for each equating function. The equating results with higher accuracy (smaller SEEs) should prevail. However, in Study 1, the differences in the SEEs were very small, and therefore it would be difficult to use them for making the decision.

As mentioned before, for this example, where the two populations seem to be dissimilar, using the rules and discussion previously presented, one would choose the Levine equating function (when choosing between the Tucker and Levine methods), since the test forms are usually very carefully constructed in this assessment program. Hence, the final decision would appear to be between the Levine and the chain functions. At this point, one's belief in the plausibility of each set of assumptions appears to be the sole basis left for making this important judgment (von Davier et al., 2004, p. 194). Further research in this area is necessary. The advantages of the SEED are outlined in the next section.

Discussion

This paper takes a new perspective on linear equating. It introduces a unified approach to linear equating in the NEAT design by developing a common parameterization that allows one to emphasize the similarities between different methods. Based on this common parameterization, we claim that there is only one definition of observed-score linear equating in the NEAT design, given in (1), which might take different forms under different assumptions. We use a new concept, the Method Function, to distinguish among the possible forms that a linear equating function in a NEAT design might take (in particular, among the three equating methods investigated here: Tucker, Levine, and chain linear equating). With this approach, the SEE formula and concept also become unified, covering all of the particular equating functions. The new approach to linear equating provides a better understanding of equating in general as well as of the SEE. This view is provided here for the first time (to our knowledge). The new formula for the SEE makes a computer program more efficient.

We also present a new tool, the standard error of equating difference (SEED), to investigate whether the observed difference in equating functions is statistically significant. Although the SEED is an asymptotic result, it seems to be stable enough to detect the differences in a sample size of 200 for the data investigated here. Additional studies might be necessary to describe the behavior of the SEED for small and medium sample sizes with different data. The SEED provides an additional measure to consider when making decisions about the final equating function, especially for medium sample sizes.

It is important to know whether the observed differences between two equating functions are statistically significant or reflect only random error. This issue was extensively investigated in empirical studies, and as Harris and Crouse (1993, p. 19) conclude:

Perhaps the most common process followed in conducting an equating study is to apply a series of equating methods to a particular situation. Usually all that can be concluded from such a comparison is whether the methods appear to be providing similar or dissimilar results, and even that cannot be determined with any accuracy, because one generally does not have a baseline by which to judge if the differences between results are simply the result of random error, or something else.

The SEED is exactly the answer to the second part of Harris and Crouse's remark: The SEED can tell whether the observed differences are the result of random error or not. While it does not solve the problem of how to decide between different equating functions, it is a step forward in providing more insight and information that one can use when making this decision. Harris and Crouse (1993) reviewed all criteria and methods that researchers had developed for improving this decisional process up to that time. Three other methods can be considered:

1. Investigating how sensitive each of the equating functions is to the population invariance assumption (see Dorans & Holland, 2000; von Davier et al., 2003). The method introduced in von Davier et al. (2003), though promising, needs additional research.
2. Carrying out a score equity analysis as proposed in Dorans (2003). This is also an approach to the study of population invariance, but it focuses on different issues: specifying the number of subpopulations that should be investigated, checking whether the subpopulation score distributions are similar, computing the standardized difference between the means in the important subpopulations, and using the Dorans and Holland (2000) measure to investigate the population invariance of the equating function.
3. Comparing the first several moments of the distribution obtained through equating with those of the distribution of the old form (the targeted distribution; see von Davier et al., 2004, chapter 4).

It is also worth noting that an approach similar to the one outlined here (with general formulas for the SEE and SEED) is being developed to investigate the differences between linear and nonlinear equating functions in the framework of the kernel method of test equating (see von Davier et al., 2004).

A similar SEED formula is not feasible for classical equipercentile equating (which uses linear interpolation as a continuization procedure) because the resulting equating function is not continuously differentiable at the ends of the linear segments (and therefore the delta method cannot be applied). A bootstrap SEED might be conceived for this situation, which might be a very interesting issue for further research.
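The bootstrap idea mentioned above could be sketched as follows. This is a hypothetical illustration, not a procedure from the report: the function names, the synthetic data, the equal synthetic-population weight `w`, and the use of two linear functions (textbook forms of chain and Tucker equating) as stand-ins for any pair of equating functions are all assumptions made here.

```python
import numpy as np

def chain_linear(x, xp, vp, yq, vq):
    """Chain linear equating: X to V on P, then V to Y on Q."""
    v = vp.mean() + vp.std(ddof=1) / xp.std(ddof=1) * (x - xp.mean())
    return yq.mean() + yq.std(ddof=1) / vq.std(ddof=1) * (v - vq.mean())

def tucker_linear(x, xp, vp, yq, vq, w=0.5):
    """Tucker linear equating; w is the assumed weight of P in the target."""
    g1 = np.cov(xp, vp)[0, 1] / vp.var(ddof=1)   # slope of X on V in P
    g2 = np.cov(yq, vq)[0, 1] / vq.var(ddof=1)   # slope of Y on V in Q
    dmu = vp.mean() - vq.mean()
    dvar = vp.var(ddof=1) - vq.var(ddof=1)
    mu_x = xp.mean() - (1 - w) * g1 * dmu
    mu_y = yq.mean() + w * g2 * dmu
    var_x = xp.var(ddof=1) - (1 - w) * g1**2 * dvar + w * (1 - w) * (g1 * dmu)**2
    var_y = yq.var(ddof=1) + w * g2**2 * dvar + w * (1 - w) * (g2 * dmu)**2
    return mu_y + np.sqrt(var_y / var_x) * (x - mu_x)

def bootstrap_seed(eq1, eq2, x, xp, vp, yq, vq, n_boot=200, seed=0):
    """Standard deviation of eq1 - eq2 over bootstrap resamples of examinees."""
    rng = np.random.default_rng(seed)
    diffs = np.empty((n_boot, len(x)))
    for b in range(n_boot):
        i = rng.integers(0, len(xp), len(xp))    # resample group P
        j = rng.integers(0, len(yq), len(yq))    # resample group Q
        args = (xp[i], vp[i], yq[j], vq[j])
        diffs[b] = eq1(x, *args) - eq2(x, *args)
    return diffs.std(axis=0, ddof=1)

# Synthetic data for illustration only.
rng = np.random.default_rng(1)
ability_p = rng.normal(0.0, 1.0, 400)
ability_q = rng.normal(0.3, 1.0, 400)
xp = 30 + 8 * ability_p + rng.normal(0, 4, 400)
vp = 15 + 4 * ability_p + rng.normal(0, 2, 400)
yq = 32 + 8 * ability_q + rng.normal(0, 4, 400)
vq = 15 + 4 * ability_q + rng.normal(0, 2, 400)

x = np.arange(10.0, 51.0, 10.0)
seed_tc = bootstrap_seed(tucker_linear, chain_linear, x, xp, vp, yq, vq)
print(np.round(seed_tc, 3))
```

Because `bootstrap_seed` only needs two callables, a non-differentiable function such as an equipercentile equating could in principle be passed in place of either linear function; whether the resulting bootstrap SEED behaves well is the open research question the text raises.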

References

Angoff, W. H. (1984). Scales, norms, and equivalent scores. Princeton, NJ: Educational Testing Service. (Reprinted from Educational measurement, 2nd ed., by R. L. Thorndike, Ed., 1971, Washington, DC: American Council on Education.)

Braun, H. I., & Holland, P. W. (1982). Observed-score test equating: A mathematical analysis of some ETS equating procedures. In P. W. Holland & D. B. Rubin (Eds.), Test equating (pp. 9–49). New York: Academic.

von Davier, A. A. (2001). Testing unconfoundedness in regression models with normally distributed variables. Aachen: Shaker Verlag.

von Davier, A. A. (2003). Notes on linear equating methods for the Non-Equivalent Groups design (ETS RR-03-4). Princeton, NJ: Educational Testing Service.

von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. New York: Springer Verlag.

von Davier, A. A., Holland, P. W., & Thayer, D. T. (2003). Population invariance and chain versus post-stratification methods for equating and test linking. In N. Dorans (Ed.), Population invariance of score linking: Theory and applications to Advanced Placement Program Examinations (ETS RR-03-27). Princeton, NJ: Educational Testing Service.

von Davier, A. A., Holland, P. W., & Thayer, D. T. (in press). The chain and post-stratification methods for observed-score equating: Their relationship to population invariance. Journal of Educational Measurement.

Dorans, N. J. (2003, May 16). Score equity analysis. Paper presented at the Ledyard R. Tucker Psychometric Workshop, Educational Testing Service, Princeton, NJ.

Dorans, N. J., & Holland, P. W. (2000). Population invariance and equitability of tests: Basic theory and the linear case. Journal of Educational Measurement, 37.

Hanson, B. A., Zeng, L., & Kolen, M. J. (1993). Standard errors of Levine linear equating. Applied Psychological Measurement, 17.

Harris, D. J., & Crouse, J. D. (1993). A study of criteria used in equating. Applied Measurement in Education, 6(3).

Holland, P. W., King, B. F., & Thayer, D. T. (1989). The standard error of equating for the kernel method of equating score distributions (ETS PSRTR-89-83, ETS RR-89-06). Princeton, NJ: Educational Testing Service.


Linking the Smarter Balanced Assessments to NWEA MAP Tests

Linking the Smarter Balanced Assessments to NWEA MAP Tests Linking the Smarter Balanced Assessments to NWEA MAP Tests Introduction Concordance tables have been used for decades to relate scores on different tests measuring similar but distinct constructs. These

More information

Ability Metric Transformations

Ability Metric Transformations Ability Metric Transformations Involved in Vertical Equating Under Item Response Theory Frank B. Baker University of Wisconsin Madison The metric transformations of the ability scales involved in three

More information

Computational Tasks and Models

Computational Tasks and Models 1 Computational Tasks and Models Overview: We assume that the reader is familiar with computing devices but may associate the notion of computation with specific incarnations of it. Our first goal is to

More information

Comments on The Role of Large Scale Assessments in Research on Educational Effectiveness and School Development by Eckhard Klieme, Ph.D.

Comments on The Role of Large Scale Assessments in Research on Educational Effectiveness and School Development by Eckhard Klieme, Ph.D. Comments on The Role of Large Scale Assessments in Research on Educational Effectiveness and School Development by Eckhard Klieme, Ph.D. David Kaplan Department of Educational Psychology The General Theme

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Principles of Statistical Inference Recap of statistical models Statistical inference (frequentist) Parametric vs. semiparametric

More information

Section 4. Test-Level Analyses

Section 4. Test-Level Analyses Section 4. Test-Level Analyses Test-level analyses include demographic distributions, reliability analyses, summary statistics, and decision consistency and accuracy. Demographic Distributions All eligible

More information

STRUCTURAL EQUATION MODELING. Khaled Bedair Statistics Department Virginia Tech LISA, Summer 2013

STRUCTURAL EQUATION MODELING. Khaled Bedair Statistics Department Virginia Tech LISA, Summer 2013 STRUCTURAL EQUATION MODELING Khaled Bedair Statistics Department Virginia Tech LISA, Summer 2013 Introduction: Path analysis Path Analysis is used to estimate a system of equations in which all of the

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

Industrial Engineering Prof. Inderdeep Singh Department of Mechanical & Industrial Engineering Indian Institute of Technology, Roorkee

Industrial Engineering Prof. Inderdeep Singh Department of Mechanical & Industrial Engineering Indian Institute of Technology, Roorkee Industrial Engineering Prof. Inderdeep Singh Department of Mechanical & Industrial Engineering Indian Institute of Technology, Roorkee Module - 04 Lecture - 05 Sales Forecasting - II A very warm welcome

More information

Chapter 0 of Calculus ++, Differential calculus with several variables

Chapter 0 of Calculus ++, Differential calculus with several variables Chapter of Calculus ++, Differential calculus with several variables Background material by Eric A Carlen Professor of Mathematics Georgia Tech Spring 6 c 6 by the author, all rights reserved - Table of

More information

The Effect of Differential Item Functioning on Population Invariance of Item Response Theory True Score Equating

The Effect of Differential Item Functioning on Population Invariance of Item Response Theory True Score Equating University of Miami Scholarly Repository Open Access Dissertations Electronic Theses and Dissertations 2012-04-12 The Effect of Differential Item Functioning on Population Invariance of Item Response Theory

More information

Looking Under the EA Hood with Price s Equation

Looking Under the EA Hood with Price s Equation Looking Under the EA Hood with Price s Equation Jeffrey K. Bassett 1, Mitchell A. Potter 2, and Kenneth A. De Jong 1 1 George Mason University, Fairfax, VA 22030 {jbassett, kdejong}@cs.gmu.edu 2 Naval

More information

Reporting Subscores: A Survey

Reporting Subscores: A Survey Research Memorandum Reporting s: A Survey Sandip Sinharay Shelby J. Haberman December 2008 ETS RM-08-18 Listening. Learning. Leading. Reporting s: A Survey Sandip Sinharay and Shelby J. Haberman ETS, Princeton,

More information

Business Statistics. Lecture 9: Simple Regression

Business Statistics. Lecture 9: Simple Regression Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

More information

B. Weaver (24-Mar-2005) Multiple Regression Chapter 5: Multiple Regression Y ) (5.1) Deviation score = (Y i

B. Weaver (24-Mar-2005) Multiple Regression Chapter 5: Multiple Regression Y ) (5.1) Deviation score = (Y i B. Weaver (24-Mar-2005) Multiple Regression... 1 Chapter 5: Multiple Regression 5.1 Partial and semi-partial correlation Before starting on multiple regression per se, we need to consider the concepts

More information

Communication Engineering Prof. Surendra Prasad Department of Electrical Engineering Indian Institute of Technology, Delhi

Communication Engineering Prof. Surendra Prasad Department of Electrical Engineering Indian Institute of Technology, Delhi Communication Engineering Prof. Surendra Prasad Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 41 Pulse Code Modulation (PCM) So, if you remember we have been talking

More information

Measurements and Data Analysis

Measurements and Data Analysis Measurements and Data Analysis 1 Introduction The central point in experimental physical science is the measurement of physical quantities. Experience has shown that all measurements, no matter how carefully

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools Pre-Algebra 8 Pre-Algebra 8 BOE Approved 05/21/2013 1 PRE-ALGEBRA 8 Critical Areas of Focus In the Pre-Algebra 8 course, instructional time should focus on three critical

More information

Error Correcting Codes Prof. Dr. P. Vijay Kumar Department of Electrical Communication Engineering Indian Institute of Science, Bangalore

Error Correcting Codes Prof. Dr. P. Vijay Kumar Department of Electrical Communication Engineering Indian Institute of Science, Bangalore (Refer Slide Time: 00:15) Error Correcting Codes Prof. Dr. P. Vijay Kumar Department of Electrical Communication Engineering Indian Institute of Science, Bangalore Lecture No. # 03 Mathematical Preliminaries:

More information

Bridges in Mathematics & Number Corner Second Edition, Grade 2 State of Louisiana Standards Correlations

Bridges in Mathematics & Number Corner Second Edition, Grade 2 State of Louisiana Standards Correlations Bridges in Mathematics & Number Corner Second Edition, Grade 2 State of Louisiana Standards Correlations Grade 2 Overview Operations & Algebraic Thinking A. Represent and solve problems involving addition

More information

Minimax-Regret Sample Design in Anticipation of Missing Data, With Application to Panel Data. Jeff Dominitz RAND. and

Minimax-Regret Sample Design in Anticipation of Missing Data, With Application to Panel Data. Jeff Dominitz RAND. and Minimax-Regret Sample Design in Anticipation of Missing Data, With Application to Panel Data Jeff Dominitz RAND and Charles F. Manski Department of Economics and Institute for Policy Research, Northwestern

More information

Operation and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras

Operation and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras Operation and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras Lecture - 3 Forecasting Linear Models, Regression, Holt s, Seasonality

More information

Uncertainty and Graphical Analysis

Uncertainty and Graphical Analysis Uncertainty and Graphical Analysis Introduction Two measures of the quality of an experimental result are its accuracy and its precision. An accurate result is consistent with some ideal, true value, perhaps

More information

UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS

UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS F. C. Nicolls and G. de Jager Department of Electrical Engineering, University of Cape Town Rondebosch 77, South

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Methods for the Comparability of Scores

Methods for the Comparability of Scores Assessment and Development of new Statistical Methods for the Comparability of Scores 1 Pontificia Universidad Católica de Chile 2 Laboratorio Interdisciplinario de Estadística Social Advisor: Jorge González

More information

TESTING FOR CO-INTEGRATION

TESTING FOR CO-INTEGRATION Bo Sjö 2010-12-05 TESTING FOR CO-INTEGRATION To be used in combination with Sjö (2008) Testing for Unit Roots and Cointegration A Guide. Instructions: Use the Johansen method to test for Purchasing Power

More information

Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics

Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Amang S. Sukasih, Mathematica Policy Research, Inc. Donsig Jang, Mathematica Policy Research, Inc. Amang S. Sukasih,

More information

STA Module 10 Comparing Two Proportions

STA Module 10 Comparing Two Proportions STA 2023 Module 10 Comparing Two Proportions Learning Objectives Upon completing this module, you should be able to: 1. Perform large-sample inferences (hypothesis test and confidence intervals) to compare

More information

SCIENTIFIC INQUIRY AND CONNECTIONS. Recognize questions and hypotheses that can be investigated according to the criteria and methods of science

SCIENTIFIC INQUIRY AND CONNECTIONS. Recognize questions and hypotheses that can be investigated according to the criteria and methods of science SUBAREA I. COMPETENCY 1.0 SCIENTIFIC INQUIRY AND CONNECTIONS UNDERSTAND THE PRINCIPLES AND PROCESSES OF SCIENTIFIC INQUIRY AND CONDUCTING SCIENTIFIC INVESTIGATIONS SKILL 1.1 Recognize questions and hypotheses

More information

ROBUST SCALE TRNASFORMATION METHODS IN IRT TRUE SCORE EQUATING UNDER COMMON-ITEM NONEQUIVALENT GROUPS DESIGN

ROBUST SCALE TRNASFORMATION METHODS IN IRT TRUE SCORE EQUATING UNDER COMMON-ITEM NONEQUIVALENT GROUPS DESIGN ROBUST SCALE TRNASFORMATION METHODS IN IRT TRUE SCORE EQUATING UNDER COMMON-ITEM NONEQUIVALENT GROUPS DESIGN A Dissertation Presented to the Faculty of the Department of Educational, School and Counseling

More information

LINKING IN DEVELOPMENTAL SCALES. Michelle M. Langer. Chapel Hill 2006

LINKING IN DEVELOPMENTAL SCALES. Michelle M. Langer. Chapel Hill 2006 LINKING IN DEVELOPMENTAL SCALES Michelle M. Langer A thesis submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Master

More information

POL 681 Lecture Notes: Statistical Interactions

POL 681 Lecture Notes: Statistical Interactions POL 681 Lecture Notes: Statistical Interactions 1 Preliminaries To this point, the linear models we have considered have all been interpreted in terms of additive relationships. That is, the relationship

More information

A Better Way to Do R&R Studies

A Better Way to Do R&R Studies The Evaluating the Measurement Process Approach Last month s column looked at how to fix some of the Problems with Gauge R&R Studies. This month I will show you how to learn more from your gauge R&R data

More information

Post-exam 2 practice questions 18.05, Spring 2014

Post-exam 2 practice questions 18.05, Spring 2014 Post-exam 2 practice questions 18.05, Spring 2014 Note: This is a set of practice problems for the material that came after exam 2. In preparing for the final you should use the previous review materials,

More information

Group Dependence of Some Reliability

Group Dependence of Some Reliability Group Dependence of Some Reliability Indices for astery Tests D. R. Divgi Syracuse University Reliability indices for mastery tests depend not only on true-score variance but also on mean and cutoff scores.

More information

1 Measurement Uncertainties

1 Measurement Uncertainties 1 Measurement Uncertainties (Adapted stolen, really from work by Amin Jaziri) 1.1 Introduction No measurement can be perfectly certain. No measuring device is infinitely sensitive or infinitely precise.

More information

Measurement Theory. Reliability. Error Sources. = XY r XX. r XY. r YY

Measurement Theory. Reliability. Error Sources. = XY r XX. r XY. r YY Y -3 - -1 0 1 3 X Y -10-5 0 5 10 X Measurement Theory t & X 1 X X 3 X k Reliability e 1 e e 3 e k 1 The Big Picture Measurement error makes it difficult to identify the true patterns of relationships between

More information

Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01: Probability and Statistics for Engineers Spring 2013 Contents 1 Joint Probability Distributions 2 1.1 Two Discrete

More information

Stochastic Processes

Stochastic Processes qmc082.tex. Version of 30 September 2010. Lecture Notes on Quantum Mechanics No. 8 R. B. Griffiths References: Stochastic Processes CQT = R. B. Griffiths, Consistent Quantum Theory (Cambridge, 2002) DeGroot

More information

1. Create a scatterplot of this data. 2. Find the correlation coefficient.

1. Create a scatterplot of this data. 2. Find the correlation coefficient. How Fast Foods Compare Company Entree Total Calories Fat (grams) McDonald s Big Mac 540 29 Filet o Fish 380 18 Burger King Whopper 670 40 Big Fish Sandwich 640 32 Wendy s Single Burger 470 21 1. Create

More information

Introduction to Uncertainty and Treatment of Data

Introduction to Uncertainty and Treatment of Data Introduction to Uncertainty and Treatment of Data Introduction The purpose of this experiment is to familiarize the student with some of the instruments used in making measurements in the physics laboratory,

More information

The Factor Analytic Method for Item Calibration under Item Response Theory: A Comparison Study Using Simulated Data

The Factor Analytic Method for Item Calibration under Item Response Theory: A Comparison Study Using Simulated Data Int. Statistical Inst.: Proc. 58th World Statistical Congress, 20, Dublin (Session CPS008) p.6049 The Factor Analytic Method for Item Calibration under Item Response Theory: A Comparison Study Using Simulated

More information

UCLA STAT 233 Statistical Methods in Biomedical Imaging

UCLA STAT 233 Statistical Methods in Biomedical Imaging UCLA STAT 233 Statistical Methods in Biomedical Imaging Instructor: Ivo Dinov, Asst. Prof. In Statistics and Neurology University of California, Los Angeles, Spring 2004 http://www.stat.ucla.edu/~dinov/

More information

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED by W. Robert Reed Department of Economics and Finance University of Canterbury, New Zealand Email: bob.reed@canterbury.ac.nz

More information

Uncertainty due to Finite Resolution Measurements

Uncertainty due to Finite Resolution Measurements Uncertainty due to Finite Resolution Measurements S.D. Phillips, B. Tolman, T.W. Estler National Institute of Standards and Technology Gaithersburg, MD 899 Steven.Phillips@NIST.gov Abstract We investigate

More information

Chapter 3. Introduction to Linear Correlation and Regression Part 3

Chapter 3. Introduction to Linear Correlation and Regression Part 3 Tuesday, December 12, 2000 Ch3 Intro Correlation Pt 3 Page: 1 Richard Lowry, 1999-2000 All rights reserved. Chapter 3. Introduction to Linear Correlation and Regression Part 3 Regression The appearance

More information

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

Statistical and psychometric methods for measurement: G Theory, DIF, & Linking

Statistical and psychometric methods for measurement: G Theory, DIF, & Linking Statistical and psychometric methods for measurement: G Theory, DIF, & Linking Andrew Ho, Harvard Graduate School of Education The World Bank, Psychometrics Mini Course 2 Washington, DC. June 27, 2018

More information

chapter 12 MORE MATRIX ALGEBRA 12.1 Systems of Linear Equations GOALS

chapter 12 MORE MATRIX ALGEBRA 12.1 Systems of Linear Equations GOALS chapter MORE MATRIX ALGEBRA GOALS In Chapter we studied matrix operations and the algebra of sets and logic. We also made note of the strong resemblance of matrix algebra to elementary algebra. The reader

More information

Measurement Error in Nonparametric Item Response Curve Estimation

Measurement Error in Nonparametric Item Response Curve Estimation Research Report ETS RR 11-28 Measurement Error in Nonparametric Item Response Curve Estimation Hongwen Guo Sandip Sinharay June 2011 Measurement Error in Nonparametric Item Response Curve Estimation Hongwen

More information

Relations and Functions

Relations and Functions Algebra 1, Quarter 2, Unit 2.1 Relations and Functions Overview Number of instructional days: 10 (2 assessments) (1 day = 45 60 minutes) Content to be learned Demonstrate conceptual understanding of linear

More information

Forecasting Wind Ramps

Forecasting Wind Ramps Forecasting Wind Ramps Erin Summers and Anand Subramanian Jan 5, 20 Introduction The recent increase in the number of wind power producers has necessitated changes in the methods power system operators

More information

Equating Tests Under The Nominal Response Model Frank B. Baker

Equating Tests Under The Nominal Response Model Frank B. Baker Equating Tests Under The Nominal Response Model Frank B. Baker University of Wisconsin Under item response theory, test equating involves finding the coefficients of a linear transformation of the metric

More information

arxiv: v1 [math.gm] 23 Dec 2018

arxiv: v1 [math.gm] 23 Dec 2018 A Peculiarity in the Parity of Primes arxiv:1812.11841v1 [math.gm] 23 Dec 2018 Debayan Gupta MIT debayan@mit.edu January 1, 2019 Abstract Mayuri Sridhar MIT mayuri@mit.edu We create a simple test for distinguishing

More information

SHOPPING FOR EFFICIENT CONFIDENCE INTERVALS IN STRUCTURAL EQUATION MODELS. Donna Mohr and Yong Xu. University of North Florida

SHOPPING FOR EFFICIENT CONFIDENCE INTERVALS IN STRUCTURAL EQUATION MODELS. Donna Mohr and Yong Xu. University of North Florida SHOPPING FOR EFFICIENT CONFIDENCE INTERVALS IN STRUCTURAL EQUATION MODELS Donna Mohr and Yong Xu University of North Florida Authors Note Parts of this work were incorporated in Yong Xu s Masters Thesis

More information