AMERICAN INSTITUTES FOR RESEARCH


LINKING RASCH SCALES ACROSS GRADES IN CLUSTERED SAMPLES

Jon Cohen, Mary Seburn, Tamas Antal, and Matthew Gushta

American Institutes for Research

May 23

Thomas Jefferson Street, NW, Washington, DC

1. Introduction

Researchers and practitioners share an interest in measuring student learning over time. A single metric for student proficiency across grades would provide the yardstick by which student learning could be monitored and teacher, school, and program effectiveness could be measured. Interest in, and controversy around, such vertical scales has endured for decades (Cronbach & Furby, 1970; Linn & Slinde, 1977; Rogosa, Brandt, & Zimowski, 1982). Much of the controversy arises from the fact that curricula differ across grades, so the nature of the construct being measured by a vertical scale may vary along the dimension. In practice, vertical scales suffer from instability: it is common to find that different methods result in different inferences about growth over time (Harris, 1991; Kolen, 1981; Skaggs & Lissitz, 1988).

This study is designed to compare two vertical linking methods in terms of the accuracy and precision of the estimators and the availability of adequate standard error estimators for realistic data.

Most often, vertical linking is accomplished through a common-item, nonequivalent groups design (Muraki, Hombo, & Lee, 2000; Kolen & Brennan, 2004). Under these designs, tests for adjacent grades include a set of common items that allows the scales to be linked across grades. With these data in hand, psychometricians often use one of two methods based on Item Response Theory (IRT; Rasch, 1960; Lord, 1980; Hambleton & Swaminathan, 1985) to link the scales across grades:

Joint, or concurrent, calibration, in which the items from all of the overlapping forms of the test are calibrated together, resulting in a single cross-grade scale;

Chain linking, in which each grade is calibrated separately, and the resulting item parameters of the items shared across grades are either fixed to common values or their estimates are placed on a common scale.

The literature on the choice between these two general approaches is inconclusive at best.
Some studies have found that separate estimation improves the fit between the data and the model (e.g., Karkee, Lewis, Hoskens, & Yao, 2003; Kim & Cohen, 1998). Separate estimation, however, doubles the number of parameters estimated, reducing the degrees of freedom and necessarily improving the fit to the particular dataset. These studies did not evaluate whether the improved fit exceeded the improvement that would arise naturally from the reduction in the degrees of freedom. Peterson, Cook, and Stocking (1983) report that concurrent calibration produced more stable estimates, as did Hanson and Béguin (2002) under equivalent group designs. However, under nonequivalent group designs (as found in vertical linking studies), Hanson and Béguin report better estimates from separate estimation. They also find that different software produces different results and that the estimates are sensitive to ancillary specifications, such as prior distributions imposed to constrain parameter estimates.

Kolen and Brennan (2004) suggest that when the IRT model holds, concurrent calibration should produce more stable results because it uses all the available information to estimate the parameters. However, they prefer separate estimation when assumptions are violated. Taken together, this research suggests that separate calibration may provide a better fit to the data, but may do so by matching particular data sets at the cost of reduced generalizability. Here, we compare the methods when the data are multidimensional and arise from a complex sample.

2. A Model of Item Response

This study begins by formalizing the processes presumed to generate student responses to items in the real world. Item response theory provides a convenient foundation on which to build this model. We begin with a simple exposition of item response models and then extend that model:

to reflect the fact that students are organized into somewhat homogeneous schools and classrooms;

to capture the impact of changing curricula over time.

A Basic Model of Item Response

Classical measurement theory begins with a linear model $y_{ij} = \theta_i + e_{ij}$, where $y_{ij}$ is person $i$'s response to a continuous item or test $j$, and $e_{ij}$ is the individual, item-specific measurement error of the response. If we have a binary measure in place of $y_{ij}$, we observe

$$z_{ij} = \begin{cases} 1 & \text{if } y_{ij} > b_j \\ 0 & \text{otherwise,} \end{cases}$$

where $b_j$ is a threshold along the true-score dimension. With this, the relationship between ability and item response can be stated as a probability:

$$p(z_{ij} = 1 \mid \theta_i) = p(\theta_i + e_{ij} > b_j) = p(-e_{ij} < \theta_i - b_j). \qquad (1)$$

This relationship forms the basis of most item response theory (IRT) models. For example, if $e$ is distributed standard logistic, we have the familiar Rasch model

$$p(z_{ij} = 1 \mid \theta_i) = \frac{1}{1 + \exp\left(-(\theta_i - b_j)\right)}. \qquad (2)$$

This development explicitly recognizes item response as a function of the error term $e_{ij}$.

Extension to Clustered Data

In the real world, students are organized into schools, and the average proficiency of students varies across schools and classrooms. In addition, the instruction that students receive also varies by school or classroom. Both of these forces can have an impact on item response. Consider a high-achieving school: the average proficiency ($\theta$) will be relatively high, resulting in relatively high probabilities of correct responses on the items. More subtly, consider a fourth-grade class in which the teacher enjoys teaching the multiplication of fractions, so she teaches it early and often. Her students will likely perform well on this type of item relative to other mathematics items. Therefore, we should expect to observe different patterns of performance from other teachers and other schools.

The first process (the clustering of students of similar ability within schools or classrooms) suggests that the structure of $\theta$ is clustered. For example, $\theta_{ik} = w_k + w_{ik}$, where $i$ indexes examinees and $k$ indexes the schools or classrooms that make up the clusters.
The second process (differences in classroom curriculum, instruction, or timing) may show up as a clustering of the measurement error,

$$e_{ijk} = u_{jk} + u_{ijk}. \qquad (3)$$
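The model in Equations 1 through 3 can be sketched in a few lines of code. The following is a minimal illustration, not the authors' implementation: the function names, the normal distributions for the school-level effects, and the use of an unscaled standard logistic student-level error are all assumptions made for the example.

```python
import numpy as np

def rasch_prob(theta, b):
    """Equation 2: probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def simulate_clustered_responses(n_schools, n_per_school, b,
                                 sd_school_theta, sd_item_school, rng):
    """Draw binary responses with school effects on theta and on the item error.

    theta_ik = w_k + w_ik and e_ijk = u_jk + u_ijk, thresholded at b_j.
    Assumptions (not from the source): normal school effects, theta scaled
    to unit total variance, and a standard logistic student-level error.
    """
    b = np.asarray(b, dtype=float)
    school = np.repeat(np.arange(n_schools), n_per_school)
    w_k = rng.normal(0.0, sd_school_theta, n_schools)           # school ability effect
    sd_student = np.sqrt(max(1.0 - sd_school_theta ** 2, 0.0))
    theta = w_k[school] + rng.normal(0.0, sd_student, school.size)
    u_jk = rng.normal(0.0, sd_item_school, (n_schools, b.size))  # item-by-school effect
    u_ijk = rng.logistic(0.0, 1.0, (school.size, b.size))        # student-level error
    y = theta[:, None] + u_jk[school] + u_ijk                    # continuous response
    return (y > b).astype(int), school

rng = np.random.default_rng(0)
z, school = simulate_clustered_responses(5, 20, [-1.0, 0.0, 1.0], 0.5, 0.5, rng)
```

Setting `sd_item_school` to zero removes the item-specific clustering, in which case the simulated proportions correct follow `rasch_prob` given each student's `theta`.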

Extension to Vertical Scales

Vertically linked scales generally rest on the assumption that task demands in subsequent grades are simply harder versions of the earlier grades' tasks. Although curricula do tend to be vertically articulated, new skills and knowledge are introduced and taught at later grade levels, whereas other skills are mastered and no longer taught or tested. Test blueprints typically reflect these shifts, with the resulting vertical scales measuring a slightly different trait at each grade level. It is reasonable to consider the vertical scale to be the aspect of the curriculum that changes only in difficulty across grades. This trait will be correlated with the within-grade scales, but it is not identical to them. Denoting the within-grade trait $\theta$ and the vertical trait $\psi$,

$$\psi_{ik} = a_g + v_k + v_{ik}, \qquad \theta_{ik} = \beta \psi_{ik} + w_k + w_{ik}. \qquad (4)$$

The mean of the vertical trait increases with grade (subscripted $g$). A student's proficiency on the vertical trait at a point in time would reflect both school (or classroom) and student effects. The grade-specific trait reflects its correlation with the vertical trait, as well as additional school-specific effects.

Implications for Vertical Linking

This model of item response suggests several factors that will likely influence the stability of linkages across grades, given a fixed number of linking items and a fixed sample size:

The correlation between the grade-specific and vertical traits. The unidimensional linking model is misspecified when two different traits contribute to generating the responses to the items. If these traits are perfectly correlated, they function as a single trait, eliminating this source of error. As this correlation decreases, the resulting error increases.

The magnitude of the intra-cluster correlation found in $\theta$ and $\psi$. When units within a cluster are more similar to each other than they are to units in other clusters, they provide less information than an equivalent number of units from a random sample.
The impact that sampling design has on the precision of estimates is described by the design effect, which is often summarized as the ratio of the actual sampling variance to the sampling variance that would result from a simple random sample of the same size (i.e., the ratio of the actual standard error squared to the standard error squared from a simple random sample; Kish, 1965).

The magnitude of the intra-cluster correlation in item-specific responses. As above, positive intra-cluster correlation will increase the design effect.
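To make the design effect concrete, the sketch below computes it two ways. The first function is the familiar approximation $1 + (m - 1)\rho$ for clusters of equal size $m$ and intra-cluster correlation $\rho$. The second estimates the ratio directly from data, using a between-cluster (linearization) variance of the sample mean in the numerator. Both function names are illustrative, and the unweighted, unstratified setup is a simplifying assumption.

```python
import numpy as np

def kish_design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1.0) * icc

def empirical_design_effect(y, cluster):
    """Clustered variance of the mean divided by the simple-random-sample variance.

    Assumes an unweighted sample with clusters treated as sampled with
    replacement (a simplification for illustration).
    """
    y = np.asarray(y, dtype=float)
    labels, idx = np.unique(np.asarray(cluster), return_inverse=True)
    n, K = y.size, labels.size
    # Cluster totals of deviations from the overall mean.
    totals = np.bincount(idx, weights=y - y.mean(), minlength=K)
    var_clustered = K / (K - 1) * np.sum(totals ** 2) / n ** 2
    var_srs = y.var(ddof=1) / n
    return var_clustered / var_srs

# With 400 students per school, even a small intra-cluster correlation matters.
deff = kish_design_effect(400, 0.05)
```

The square root of the design effect (the "root design effect") is the factor by which the standard error is inflated relative to simple random sampling.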

In general, these forces constitute violations of the assumptions underlying the basic IRT models used to calibrate and link most tests. When faced with such violations, statisticians typically take one of two strategies: develop more complex models that more accurately model the processes of interest, or develop methods that are robust to such violations. A number of researchers have developed structural IRT models that capture more of the real-world complexities (e.g., Patz & Junker, 1999; Kamata, 1998, 2001; Skrondal & Rabe-Hesketh, 2003; Rabe-Hesketh, Skrondal, & Pickles, 2004; Glas, 2005). These models often require additional distributional assumptions, possibly reducing their robustness. Here, we take an alternative approach common in the sampling literature, using the more familiar vertical-linking point estimators and constructing robust confidence intervals around them (e.g., Cochran, 1977; Kish, 1965; Särndal, Swensson, & Wretman, 1992).

3. Two Models for Creating Vertical Scales and Evaluating Their Precision

This section describes the technical details of the joint calibration and chain-linking procedures and the proposed standard error estimators for each. The calibration procedures can be successfully implemented in a variety of ways: through conditional maximum likelihood (CML, as in ConQuest or OPLM), marginal maximum likelihood (MML) or nonparametric marginal maximum likelihood (NPMML, as in BILOG, PARSCALE, or LOGIMO), or, with a sufficient number of items, joint maximum likelihood (JML, as in Winsteps). This study uses NPMML procedures, which have been shown to share the optimal properties of CML estimators (De Leeuw & Verhelst, 1986) and for which robust variance estimators are available (Cohen, Chan, Jiang, & Seburn, 2005).

The Joint Calibration Procedure

The joint calibration procedure develops the common vertical scale in a single step by calibrating all items from all grades simultaneously.
Given $G$ grades to link, the log-likelihood has the following form:

$$\log L = \sum_{g=3}^{G} l_g,$$

where $l_g$ is the grade-specific (marginal) log-likelihood given by

$$l_g = \sum_{i=1}^{N_g} \log \int L(\mathbf{z}_i \mid \theta, \boldsymbol{\beta})\, f_g(\theta)\, d\theta, \qquad (5)$$

where $N_g$ is the number of students in grade $g$, $\theta$ is the ability, $f_g$ is its grade-dependent density function, $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_J)$ is the collection of item parameters, and $\mathbf{z}_i = (z_{i1}, \ldots, z_{iJ})$ is the row of the full response matrix $Z$ corresponding to student $i$ in grade $g$. Using the independence assumption, we see that the likelihood of row $i$ is

$$L(\mathbf{z}_i \mid \theta, \boldsymbol{\beta}) = \prod_{j=1}^{J} p(z_{ij} \mid \theta, \beta_j).$$

(Note that $J$, the number of items, may also depend on grade. However, to avoid cumbersome formulae, we suppress any notation indicating this dependence.)

Note that Equation 5 contains an explicit model of the population distribution within a grade. This decomposition of the likelihood and the population distribution provides the framework for vertical linking. The connections across grades are made by the sets of common items assigned to students in adjacent grades.

The NPMML proceeds by replacing $f_g$ with an empirical vector of normalized weights $\mathbf{p}_g = (p_{g1}, \ldots, p_{gQ})$ on a prespecified collection of population parameters (quadrature points) $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_Q)$, resulting in the following approximation of the grade-specific log-likelihood:

$$l_g \approx \sum_{i=1}^{N_g} \log \sum_{q=1}^{Q} L(\mathbf{z}_i \mid \theta_q, \boldsymbol{\beta})\, p_{gq},$$

with the constraint that

$$\sum_{q=1}^{Q} p_{gq} = 1 \quad \text{for all } g. \qquad (6)$$

In order to identify the model, it is sufficient to fix the mean proficiency within a single grade. For simplicity, let us fix the lowest grade proficiency to 0; that is, let

$$\sum_{q=1}^{Q} \theta_q\, p_{3q} = 0. \qquad (7)$$

If we assume that the locations of the quadrature points are fixed, rather than estimated, the task becomes finding the conditional maximum of

$$l(Z, \boldsymbol{\beta}, \mathbf{p}_3, \ldots, \mathbf{p}_G) = \sum_{g=3}^{G} l_g$$

subject to the constraints in Equations (6) and (7). Because we want the conditional maximum, we use Lagrange multipliers to redefine the likelihood function $l$ to include the constraints with the new estimable parameters:

$$\tilde{l}(Z, \boldsymbol{\beta}, \mathbf{p}_3, \ldots, \mathbf{p}_G, \mu, \lambda_3, \ldots, \lambda_G) = \sum_{g=3}^{G} \tilde{l}_g + \mu \sum_{q=1}^{Q} \theta_q\, p_{3q},$$

where

$$\tilde{l}_g = l_g + \lambda_g \left( \sum_{q=1}^{Q} p_{gq} - 1 \right).$$

We use an extension of Bock and Aitkin's (1981) EM algorithm to implement the NPMML estimation (see Cohen et al., 2005). This calibration yields grade-specific population distributions. From these we can readily obtain an estimate of the population moments; for example, the first moment (in the grades in which it is not fixed) is given by

$$\mu_g = \sum_{q=1}^{Q} \theta_q\, p_{gq}.$$

Standard Error of Parameters

In a simple random sample, the likelihood of the data is the product of the individual likelihoods across observations, estimated by taking logs and summing across observations. When the observations are correlated, as in a clustered sample, the function is no longer a true likelihood function: the joint likelihood of the observations is no longer the product of the likelihoods of each observation, because that product neglects the covariance among observations. Psychometricians continue to use estimates based on this likelihood function, even though it does not accurately model the real-world process of interest. The score function constitutes an estimating equation in the sense of Godambe (1960) and Godambe and Thompson (1984), and the parameters of that function continue to hold pragmatic interest in operational testing programs. The inverse of the would-be information matrix, however, no longer provides an acceptable approximation of the variance of those estimates (Binder, 1983; Godambe & Thompson, 1984). For that reason, we use a Taylor-series approximation of the standard error, based upon the work of Binder (1983).

To develop the approximate variance estimator, we begin by reparameterizing the likelihood function. There are, in general, two equivalent approaches to estimating constrained maximum likelihood models. The first, which we mention above, is based on the constrained likelihood (by introducing Lagrange multipliers). The second, based on a reduced likelihood function, is obtained by eliminating redundant parameters (Mislevy, 1984).
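As an aside on the calibration step, the quadrature form of the grade-specific log-likelihood and the first-moment formula can be written compactly in code. This is a minimal sketch under simplifying assumptions: every student responds to every item (no items left unadministered by design), the item parameters and weights are taken as given rather than estimated by EM, and all function names are illustrative.

```python
import numpy as np

def rasch_loglik_matrix(z, b, theta_q):
    """log L(z_i | theta_q, b) for every student i (rows) and quadrature point q (columns)."""
    z = np.asarray(z, dtype=float)
    b = np.asarray(b, dtype=float)
    theta_q = np.asarray(theta_q, dtype=float)
    eta = theta_q[:, None] - b[None, :]        # shape (Q, J)
    log_p = -np.log1p(np.exp(-eta))            # log P(correct)
    log_1mp = -np.log1p(np.exp(eta))           # log P(incorrect)
    return z @ log_p.T + (1.0 - z) @ log_1mp.T

def grade_loglik(z, b, theta_q, p_q):
    """Quadrature approximation: sum_i log sum_q L(z_i | theta_q, b) * p_q."""
    ll = rasch_loglik_matrix(z, b, theta_q)
    shift = ll.max(axis=1, keepdims=True)      # subtract row maxima for numerical stability
    mixture = np.exp(ll - shift) @ np.asarray(p_q, dtype=float)
    return float(np.sum(shift[:, 0] + np.log(mixture)))

def grade_mean(theta_q, p_q):
    """First moment of the estimated population distribution: sum_q theta_q * p_q."""
    return float(np.dot(theta_q, p_q))

theta_q = np.linspace(-4.0, 4.0, 9)
p_q = np.full(9, 1.0 / 9.0)                    # illustrative uniform weights
value = grade_loglik([[1, 0, 1], [0, 0, 1]], [-0.5, 0.0, 0.5], theta_q, p_q)
```

In a full NPMML calibration the weights `p_q` and difficulties `b` would be updated iteratively; the functions above only evaluate the objective for given values.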
Following Mislevy, we reparameterize to eliminate redundant parameters, using the information from the constraints to calculate the eliminated parameters in the full model. More precisely, we regard the last two population mass parameters $p_{Q-1}$ and $p_Q$ as functions of the previous $Q - 2$ (because there are two constraints):

$$p_{Q-1} = \frac{a\,\theta_Q - b}{\theta_Q - \theta_{Q-1}} \quad \text{and} \quad p_Q = \frac{a\,\theta_{Q-1} - b}{\theta_{Q-1} - \theta_Q},$$

where

$$a = 1 - \sum_{q=1}^{Q-2} p_q \quad \text{and} \quad b = -\sum_{q=1}^{Q-2} \theta_q\, p_q.$$

Let us define the weighted score function as the first derivative of the marginal log-likelihood with respect to the reduced set of parameters of the model, $\gamma^{red} = (\boldsymbol{\beta}, \mathbf{p}^{red}) = (\boldsymbol{\beta}, (p_1, \ldots, p_{Q-3}, p_{Q-2}))$:

$$W(\gamma) = W(\boldsymbol{\beta}, \mathbf{p}^{red}) = D_\gamma\, l_w^{red} = D_\gamma \sum_{k=1}^{K} w_k \sum_{i=1}^{n_k} \log \sum_{q=1}^{Q} L(\mathbf{z}_i \mid \theta_q, \boldsymbol{\beta})\, p_q,$$

where $w_k$ ($k = 1, \ldots, K$) is the sampling weight associated with cluster (or PSU, primary sampling unit) $k$, and $n_k$ is the size of cluster $k$ (again, for the sake of transparency we ignore stratification). In our context, the equation

$$W(\gamma) = 0 \qquad (8)$$

provides an estimating equation in the sense of Godambe and Thompson (1984), by which we may obtain consistent estimates of the finite population variances using the formulae of Binder (1983). To see this, let us assume that $\hat{\gamma}$ is the solution of the estimating equation (8) in the sample and $\gamma$ is the solution based on the full finite population or the set of all possible populations of interest. Then to first order we have

$$0 = W(\hat{\gamma}) = W(\gamma) + \frac{\partial W(\gamma)}{\partial \gamma}\,(\hat{\gamma} - \gamma) + R.$$

From this we obtain

$$\hat{\gamma} - \gamma \approx -\left[\frac{\partial W(\gamma)}{\partial \gamma}\right]^{-1} W(\gamma)$$

and

$$\mathrm{Var}(\hat{\gamma}) = (\hat{\gamma} - \gamma)(\hat{\gamma} - \gamma)^T = \left[\frac{\partial W(\gamma)}{\partial \gamma}\right]^{-1} W(\gamma)\, W(\gamma)^T \left[\frac{\partial W(\gamma)}{\partial \gamma}\right]^{-T}.$$
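The linearization argument is easiest to see in a one-parameter case. The sketch below applies it to a deliberately simple estimating equation, the weighted mean, where $W(\gamma) = \sum_k \sum_i w_k (y_i - \gamma)$. It is not the Rasch score function used in this paper; the toy equation, the function name, and the use of the between-cluster variability of cluster score totals for the middle term are assumptions chosen to keep the example short.

```python
import numpy as np

def clustered_mean_variance(y, cluster, weights):
    """Taylor-linearization (sandwich) variance for a weighted mean.

    Estimating equation: W(gamma) = sum_i w_i * (y_i - gamma) = 0.
    Middle term: between-cluster variability of the cluster score totals.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    labels, idx = np.unique(np.asarray(cluster), return_inverse=True)
    K = labels.size
    gamma_hat = np.sum(w * y) / np.sum(w)                 # solves W(gamma) = 0
    scores = w * (y - gamma_hat)                          # per-observation score terms
    W_k = np.bincount(idx, weights=scores, minlength=K)   # cluster score totals
    omega = K / (K - 1) * np.sum((W_k - W_k.mean()) ** 2)
    dW = -np.sum(w)                                       # derivative of W with respect to gamma
    variance = omega / dW ** 2                            # (dW)^-1 * omega * (dW)^-T
    return gamma_hat, variance

gamma_hat, variance = clustered_mean_variance(
    [1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1], [1.0, 1.0, 1.0, 1.0])
```

For a vector of parameters the same structure holds, with `dW` replaced by the matrix of derivatives and `omega` by the covariance matrix of the cluster score totals.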

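Introducing $\Omega(\gamma)$ as the variance of $W(\gamma)$ across observations and taking the expectation, we obtain the covariance matrix of the reduced set of parameters:

$$\Sigma^{\gamma}_{red} = \mathrm{Var}(\hat{\gamma}) = \left[\frac{\partial W(\gamma)}{\partial \gamma}\right]^{-1} \Omega(\gamma) \left[\frac{\partial W(\gamma)}{\partial \gamma}\right]^{-T} \Bigg|_{\gamma = \hat{\gamma}}.$$

As the estimate $\hat{\Omega}(\hat{\gamma})$ of $\Omega(\gamma)$, we use the stratified, between-PSU weighted estimator, which is given by

$$\hat{\Omega}(\hat{\gamma}) = \frac{K}{K-1} \sum_{k=1}^{K} (W_k - \bar{W})(W_k - \bar{W})^T,$$

where

$$W_k = D_\gamma\, w_k \sum_{i=1}^{n_k} \log \sum_{q=1}^{Q} L(\mathbf{z}_i \mid \theta_q, \boldsymbol{\beta})\, p_q \Bigg|_{\gamma = \hat{\gamma}} \quad \text{and} \quad \bar{W} = \frac{1}{K} \sum_{k=1}^{K} W_k.$$

Standard Error of Moments

When creating a vertical scale, we are interested in the estimates of the first moment of the population distribution (fixing this moment in one of the grades to zero). The previous section yields the reduced covariance matrix $\Sigma^{p}_{red}$ of the population mass parameters as a submatrix of $\Sigma^{\gamma}_{red}$:

$$\Sigma^{\gamma}_{red} = \begin{pmatrix} \Sigma^{\beta} & \Sigma_{12} \\ \Sigma_{21} & \Sigma^{p}_{red} \end{pmatrix}.$$

To obtain the covariance matrix $\Sigma^{p}$ of the full set of population mass parameters, we first compute the covariance matrix $\Sigma^{p}_{ab}$ of $(p_1, \ldots, p_{Q-2}, a, b)$ via $\Sigma^{p}_{ab} = D_{ab}\, \Sigma^{p}_{red}\, D_{ab}^T$, where

$$D_{ab} = \begin{pmatrix} 1 & & \\ & \ddots & \\ & & 1 \\ -1 & \cdots & -1 \\ -\theta_1 & \cdots & -\theta_{Q-2} \end{pmatrix}.$$

Then, $\Sigma^{p} = D\, \Sigma^{p}_{ab}\, D^T$ with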

$$D = \begin{pmatrix} I_{Q-2} & 0 \\ 0 & D_2 \end{pmatrix}, \qquad D_2 = \begin{pmatrix} \dfrac{\theta_Q}{\theta_Q - \theta_{Q-1}} & \dfrac{-1}{\theta_Q - \theta_{Q-1}} \\[2ex] \dfrac{\theta_{Q-1}}{\theta_{Q-1} - \theta_Q} & \dfrac{-1}{\theta_{Q-1} - \theta_Q} \end{pmatrix},$$

where $I_{Q-2}$ is the $(Q-2)$-dimensional identity matrix. Note that

$$D_2 \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} p_{Q-1} \\ p_Q \end{pmatrix}.$$

Finally, the moment covariance matrix $\Sigma^{M} = M\, \Sigma^{p}\, M^T$ for any grade is calculated. Here, with one row for each moment up to order $r$,

$$M = \begin{pmatrix} \theta_1 & \theta_2 & \cdots & \theta_Q \\ \theta_1^2 & \theta_2^2 & \cdots & \theta_Q^2 \\ \vdots & \vdots & & \vdots \\ \theta_1^r & \theta_2^r & \cdots & \theta_Q^r \end{pmatrix}.$$

We note that this approach neglects the covariance among moment estimates across grades. The impact of this simplification remains an open, empirical question.

The Separate Calibration, or Chain-Linking, Procedure

Unlike concurrent calibration, separate calibration estimates the parameters for each grade separately and then links them through the use of linking items common to multiple grades. One of the grades becomes the base scale to which subsequent tests are linked; here, we use the lowest grade (say, grade 3). Using the common items between the grade 3 base scale and the next grade (grade 4), we determine a transformation that puts the item parameters from grade 4 on the same scale as grade 3. This process of chain-linking repeats until all grades are scaled to the grade 3 base scale. Vertical linking, when performed via separate calibration, is a localized operation, consisting of pair-wise linkages of consecutive grades that establish the vertical scale.

Vertical Linking Baseline: Grade 3. With grade 3 as our base, the vertically linked scale score $\theta^L_{3i}$ for the $i$th student in grade 3 coincides with the $i$th student's within-grade scaled proficiency $\theta_{3i}$; that is, $\theta^L_{3i} = \theta_{3i}$.

Vertical Linking of Grade 3 to Grade 4. In grade 4, $\theta_{4i}$ denotes the achievement of the $i$th student from the within-grade scaling of the grade 4 items. We link grade 4 to grade 3 through a set of linking items, items that are common to both the grade 3 and grade 4 tests. If there are $m_{34}$ of these items, $\mathbf{b}_{(3)4}$ is the vector of Rasch difficulty estimates for these items when they are scaled within the third-grade data and $\mathbf{b}_{3(4)}$ is the vector of difficulty estimates for the same

items when the fourth-grade data are calibrated. Let $\bar{b}_{(3)4}$ and $\bar{b}_{3(4)}$ be the means of these parameter estimates. Then, standard Rasch practice links the grade 4 achievement scale to the grade 3 achievement scale via

$$\theta^L_{4i} = \theta_{4i} + B_{34},$$

where $B_{34} = \bar{b}_{(3)4} - \bar{b}_{3(4)}$ is the linking constant.

Because the linking constant is estimated from both a sample of students and a sample of test items (those selected for linking), it is subject to sampling error from both sources. The error from the sampling of items arises because linking items are not entirely exchangeable in the linking process: a different sample of items would yield a different linkage. Under the assumption that the sampling error is independent across items, the variance of the vector $\mathbf{b}_{(3)4} - \mathbf{b}_{3(4)}$ of length $m_{34}$ should reflect both sources of error. This assumption, of course, is unlikely to be true, and the consequences of its violation remain an open, empirical question. The linking constant is the average over the vector $\mathbf{b}_{(3)4} - \mathbf{b}_{3(4)}$, so we propose to approximate the standard error of the linking constant by the standard error of this mean:

$$\mathrm{Var}(B_{34}) = \mathrm{Var}\left(\bar{b}_{(3)4} - \bar{b}_{3(4)}\right) = \frac{1}{m_{34}(m_{34} - 1)} \sum_{j=1}^{m_{34}} \left(b_{(3)4,j} - b_{3(4),j} - B_{34}\right)^2, \qquad SE(B_{34}) = \sqrt{\mathrm{Var}(B_{34})}.$$

The variance of the mean of grade 4 students is (with $\mu^L_4 = \bar{\theta}^L_4$ and $\mu_4 = \bar{\theta}_4$):

$$\mathrm{Var}(\mu^L_4) = \mathrm{Var}(\mu_4) + \mathrm{Var}(B_{34}) = \mathrm{Var}(B_{34}).$$

The latter equality holds because the population means in separate calibration are fixed to zero; they are not estimated. Comparisons between grades 3 and 4 on the vertical scale must contain the variance component $\mathrm{Var}(B_{34})$. That is,

$$\mathrm{Var}(\mu^L_4 - \mu^L_3) = \mathrm{Var}(B_{34}).$$
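The pairwise link reduces to a mean difference and the standard error of that mean, which the following sketch computes. The function and argument names are illustrative, and the difficulty values in the usage lines are made up for the example rather than taken from the study.

```python
import numpy as np

def linking_constant(b_lower, b_upper):
    """Mean-difference Rasch linking constant and its proposed standard error.

    b_lower: difficulties of the common items from the lower-grade calibration
    b_upper: difficulties of the same items from the upper-grade calibration
    """
    d = np.asarray(b_lower, dtype=float) - np.asarray(b_upper, dtype=float)
    m = d.size
    B = d.mean()                                   # linking constant
    var_B = np.sum((d - B) ** 2) / (m * (m - 1))   # variance of the mean difference
    return B, np.sqrt(var_B)

# Illustrative values only: three common items calibrated in two adjacent grades.
B, se_B = linking_constant([1.0, 1.5, 2.0], [0.5, 0.9, 1.6])
theta_upper_linked = np.array([-0.2, 0.0, 0.7]) + B   # upper-grade scores on the base scale
```

Because the standard error treats the item-level differences as independent draws, it inherits the independence assumption discussed above.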

Vertically Linked Scale. Applying the formulae of the previous section to all pairs of adjacent grades creates the vertically linked trait scale that includes all grades. This results in a series of linking constants $(B_{34}, B_{45}, \ldots, B_{G-1,G})$ with corresponding variances $(\mathrm{Var}(B_{34}), \mathrm{Var}(B_{45}), \ldots, \mathrm{Var}(B_{G-1,G}))$. As with concurrent calibration, when analyzing the ability shift $\mu^L_{g'} - \mu^L_g$ between two arbitrary grades $g < g'$, we include the following variance component:

$$\mathrm{Var}(\mu^L_{g'} - \mu^L_g) = \sum_{g \le h < g'} \mathrm{Var}(B_{h,h+1}).$$

Again, these formulae are approximate because they treat the sampling variance of the means and linking constants as though they are independent across grades.

4. Simulation Study

This section describes a simulation study designed to compare the accuracy and precision of each of the linking methods under realistic data conditions and to evaluate the efficacy of the proposed standard error approximations. We base the study on the vertical linking sample design used to link the Ohio Achievement Tests from grades 3 through 8. In general, this design includes approximately 25 schools and 10,000 students per grade and six linking items shared between each pair of adjacent grades (design changes implemented after this study was completed increased the actual number of linking items in adjacent grades).

Realistic values of some of the parameters of the model of item response set forth in Section 2 (above) were simply not known. To obtain realistic values, we generated data sets from 54 different data configurations, and we selected the configuration that most closely approximated the design effects observed in real item responses from a similar (within-grade) sample design. For ease of exposition, we refer to the configuration that yielded the most realistic within-grade design effects as the most Ohio-like configuration of parameters.
Using the most Ohio-like configuration, we generated 100 datasets and applied both linking procedures to evaluate

the ability of the procedures to recover the generating parameters;

the ability of the procedures to accurately approximate the precision of the estimates;

the precision of the estimates from each procedure.

The within-grade data that we used to match the parameter configurations cannot, of course, inform our choice of values for the correlations between the within-grade and vertical scale. Therefore, we generate additional datasets, holding all parameters constant except the correlation, which we vary from .70 to .98 to observe the impact of this factor on the accuracy and precision of estimates.

Data Generation Details

Similar to the Ohio design, our data also span grades 3 through 8. A single linking form per grade consists of 33 core items (39 for grades 3 and 8) and 12 linking items per form (except for grades 3 and 8, which had only 6 linking items). We clustered our data within 25 elementary schools (grades 3 through 5) and 25 middle schools (grades 6 through 8). As is the case in Ohio, not all schools contributed data for all grades. We generated 40 total schools, but only 25 schools contributed scores for any one grade (see Table 1). Our data consisted of approximately 10,000 students per grade, for a total of 60,000 observations in each data set.

Table 1: Data structure of simulated data sets: number of schools for each grade, 60,000 observations total. (Columns: Grade; Number of Schools; Total number per grade. The table body is not recoverable from the extracted text.)

Generation of Item Responses

The data were generated according to the model outlined in Section 2, A Model of Item Response. First, we generated the vectors of latent traits $\theta$ and $\psi$ as specified in Equation 4. For convenience, we scaled the stochastic terms to yield traits with unit variance. Next, we generated the stochastic components of the item response function as in Equation 3, taking care to scale these components to yield $e_{ijk}$ with a standard deviation of approximately 1.7, to match the standard logistic curve of the Rasch model. The final step calculated the item responses. Table 2 summarizes the key parameters of those models.
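The first generation step (the latent traits of Equation 4, scaled to unit variance) can be sketched as follows. This is an illustration under stated assumptions rather than the study's generation code: the function name, the normal distributions, the equal cluster sizes, and the school-effect variances passed in the usage lines are hypothetical, while the values .98 for $\beta$ and .50 for annual growth are those reported for the most Ohio-like configuration.

```python
import numpy as np

def generate_traits(grade, n_schools, n_per_school, beta, annual_growth,
                    var_v_school, var_w_school, rng, base_grade=3):
    """Generate the vertical trait psi and the grade-specific trait theta.

    psi_ik   = a_g + v_k + v_ik            (unit variance around a_g)
    theta_ik = beta * psi_ik + w_k + w_ik  (unit variance around beta * a_g)
    """
    resid_var = 1.0 - beta ** 2 - var_w_school
    if resid_var < 0.0 or var_v_school > 1.0:
        raise ValueError("variance components are incompatible with unit-variance traits")
    school = np.repeat(np.arange(n_schools), n_per_school)
    a_g = annual_growth * (grade - base_grade)            # mean of the vertical trait
    v_k = rng.normal(0.0, np.sqrt(var_v_school), n_schools)
    v_ik = rng.normal(0.0, np.sqrt(1.0 - var_v_school), school.size)
    psi = a_g + v_k[school] + v_ik
    w_k = rng.normal(0.0, np.sqrt(var_w_school), n_schools)
    w_ik = rng.normal(0.0, np.sqrt(resid_var), school.size)
    theta = beta * psi + w_k[school] + w_ik
    return psi, theta, school

rng = np.random.default_rng(2005)
# var_v_school and var_w_school below are hypothetical illustration values.
psi, theta, school = generate_traits(5, 100, 100, beta=0.98, annual_growth=0.50,
                                     var_v_school=0.10, var_w_school=0.0025, rng=rng)
```

With both traits scaled to unit variance, the sample correlation between `psi` and `theta` should be close to `beta`, which is why the coefficient doubles as the correlation in these datasets.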

Table 2: Summary of key factors likely to influence vertical linkage. (Columns: Factor; Comments; Likely Range of Realistic Values; Values Used in Simulated Data Sets.)

Parameters of the latent traits

Factor: The linear relationship ($\beta$) between $\theta$, the grade-specific trait, and $\psi$, the vertical trait. Comments: In our datasets, this coefficient is also the correlation coefficient.

Factor: Annual growth, $a$. Comments: This is the average growth on the vertical scale in a year. For our study, we simply take this as a constant.

Factor: Variance of school effects on vertical trait ($\mathrm{var}(v_k) = \sigma^2_{v(k)}$). Comments: This is the school effect on the vertical trait.

Factor: Variance of school effects on grade-specific trait ($\mathrm{var}(w_k) = \sigma^2_{w(k)}$). Comments: This grade-specific school effect compounds the school effect associated with the latent trait. Because $w_k$ compounds $v_k$, it is assumed to be small.

Stochastic parameters of item response

Factor: Variance of the item-specific school effects ($\mathrm{var}(u_{jk}) = \sigma^2_{u(jk)}$). Comments: This item-specific school effect compounds the impact of school effects on the latent traits. Depending on curricular differences, it could be substantial. Recall that this is part of a stochastic term with a standard deviation of 1.7, so the largest value represents about 22 percent of the variance.

Value columns (entries as extracted, several missing): , .90, , , .0025, , .2500, .6400

The final column of the table presents the candidate values for each parameter in the simulations. From that, we see that we have 3 * 3 * 3 * 2 = 54 possible combinations. To select the most realistic values, we calculated the average design effect on estimates of the percentage of correct responses to each item from a real data set (grade 3 reading data, drawn from a similar sample design) and compared that to the observed design effects in the simulated data sets. From these we identified the configuration that most closely matched the real data. The details of these configurations are presented in Appendix A. The parameters of the most Ohio-like configuration are presented in Table 3.

Table 3: Parameters of the most Ohio-like configuration.

Parameter: Value
$\beta$: .98
$a$: .50
$\sigma^2_{v(k)}$: .0
$\sigma^2_{w(k)}$:
$\sigma^2_{u(jk)}$: .25

Recovery of Generating Parameters, Precision, and Accuracy of the Standard Errors

We created 100 datasets, using the most Ohio-like configuration, to evaluate the accuracy, precision, and effectiveness of the standard error estimators of the two linking methods. Table 4 compares the results of these simulations. The estimates of both methods reveal a small bias, which increases as the estimates cross additional grades. This finding seems intuitive because the trait measured is a compromise between the vertical trait and the somewhat attenuated grade-specific traits.

The final columns compare the empirical standard errors (the standard deviations of the estimates across 100 datasets). From these, we see that the joint calibration produces estimates that are slightly more efficient than the separate calibration procedure. Again, this is reassuring because the joint calibration brings more information to bear in estimating each item parameter. The standard error estimates from the joint calibration very closely match the empirical standard errors, but the proposed standard error estimator for the chain linking underestimates the standard errors by about 15 to 20 percent.

Table 4: Joint and separate linking constants with standard errors, over 100 replications. (Columns: Grades; True linking constant $E(B)$; Linking constant: separate calibration estimate $B_{sep}$, joint calibration estimate $B_j$; Standard error of the estimate: separate calibration $SE(B_{sep})$, joint calibration $SE(B_j)$; Observed standard deviation of the estimate: separate calibration $SD(B_{sep})$, joint calibration $SD(B_j)$. The table body is not recoverable from the extracted text.)

In summary, the two procedures offer virtually identical point estimates when averaged over many data sets.
The joint calibration procedure is somewhat more efficient, and the proposed standard error estimator for the joint calibration procedure provides a more accurate approximation than the standard error estimator that we proposed for the chain-linking procedure.
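The comparisons reported here rest on a few summary statistics computed across replications: the bias of the estimates, their empirical standard deviation, the root mean square error, and the ratio of the average estimated standard error to the empirical standard deviation. The sketch below shows one way to compute them; the function name and the numbers in the usage lines are illustrative and are not results from this study.

```python
import numpy as np

def summarize_replications(estimates, std_errors, true_value):
    """Summarize linking-constant estimates across simulated datasets."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    empirical_sd = est.std(ddof=1)               # observed SD across replications
    return {
        "bias": est.mean() - true_value,
        "empirical_sd": empirical_sd,
        "rmse": np.sqrt(np.mean((est - true_value) ** 2)),
        # Values below 1 indicate that the estimator understates the true variability.
        "se_ratio": se.mean() / empirical_sd,
    }

# Illustrative inputs only.
summary = summarize_replications([0.4, 0.5, 0.6], [0.1, 0.1, 0.1], true_value=0.5)
```

A ratio near 1 corresponds to a standard error estimator that tracks the empirical standard errors closely, while a ratio of, say, 0.85 would correspond to a 15 percent underestimate.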

Correlation Between Vertical and Grade-Specific Trait

Given that the two procedures provide very similar results, and that the standard error estimates are more accurate for the joint calibration procedure, we analyzed the effect of varying the correlation by using only the joint calibration procedure. Table 5 describes the effect of varying the correlation between the vertical and grade-specific trait for five of the data sets configured to the most Ohio-like specifications, with $\rho$ ranging from .70 to .98. As this correlation increases, the standard error decreases within each grade. When $\rho$ is small, the standard errors of the linking constants and the root mean square error (RMSE) are large. No clear relationship appears between $\rho$ and the observed bias in the estimated linking constants.

Table 5: Linking constant and standard error for different correlations between the vertical and grade-specific trait. (Columns: Grades; $\rho$; $E(B)$; $B_j$; $SE(B_j)$; Bias; RMSE. The table body is not recoverable from the extracted text.)

Note: Results shown in this table are from five data sets, each created to the same most Ohio-like specification scheme with only the correlation between the grade-specific and vertical trait varying.

5. Conclusion

The main goal of the study was to document the performance of the two linking methods by using realistic data and to verify the estimator of the standard error of the vertical linking constant. This study has found that:

The two methods produce nearly identical results, even when the vertical linking items and the main assessment items load on separate, correlated traits;

Our proposed standard error estimator for the joint calibration procedure matches the empirical standard errors to within 3 percent under complex sample designs;

The joint calibration procedure produces moderately more efficient estimates, with a reduction in the standard errors of 4 to 10 percent;

Our proposed standard error estimator for the separate estimation method underestimates the empirical standard errors by 10 to 15 percent;

The precision of the estimates is affected by the correlation between the on-grade traits and the vertical trait, with lower correlations associated with much less precise estimates.

Other than the item misfit introduced by the clustering of error terms and multidimensionality, this study did not address the impact of item fit on the standard errors. In real data, we expect that item misfit will contribute to larger standard errors than those presented here.

Appendix A: Data Configurations and Most Ohio-Like Data Generation

The data simulated for this study were generated to closely resemble real test data from Ohio. We wanted to match the Ohio design as closely as possible, while simplifying the data structure to facilitate analysis by eliminating stratification and constructed-response items and by generating only a single test form per grade.

Our initial step in data generation was to select several potential realistic values that the parameters of interest could take (see Table 2 in the body of the report) and then to generate data sets representing every unique combination of these values. The resulting 54 data set configurations are described in Table A-1.

To select from these 54 data sets the configuration that most closely matched real Ohio data, we computed the design effects for the simulated data, and we selected the data sets that most closely resembled the real design effects from the grade 3 reading test (DE = 3.645). As can be seen from Figure A-1, one set of data sets more closely resembles the Ohio root design effect than the others (corresponding to the data sets where we set the standard deviation of the item-specific school effects parameter equal to 0.5); it includes the data sets numbered (2, 5, 8,, 44, 47, 50, 53) in Table A-1. We call these the Ohio-like datasets. Also apparent from this figure is that changing the value of this parameter has the largest impact on the root design effect. Varying the other parameters has a much smaller impact.

Figure A-1: Grade 3 design effects for 54 data sets representing all unique combinations of parameters likely to influence linking error
[Figure: plot of Mean Root Design Effect by Dataset]
NOTE: The solid line represents the root design effect observed in the Ohio operational grade 3 Reading data (root mean design effect = 1.89).

From these 2 Ohio-like configurations, we selected one (data set #38) that was very close to the true Ohio design effect observed in the third-grade Reading test data. The specifications for the Most Ohio-like data set are provided in Table A-1 and Table 3 (in the body of the report). The simulations described in this report are based on 100 data sets generated to these specifications.
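To make the idea of generating repeated data sets from one specification concrete, the sketch below simulates clustered Rasch responses with a school effect on ability and a school-by-item effect. It is a deliberately simplified illustration, not the authors' generator: it uses a single trait rather than correlated vertical and grade-specific traits, and every value in `spec` except the item-specific school-effect standard deviation of 0.5 is a hypothetical placeholder.

```python
import math
import random

def simulate_school(rng, n_students, item_difficulties,
                    sd_school, sd_item_school, grade_mean=0.0):
    """Simulate 0/1 Rasch responses for the students of one school."""
    school_effect = rng.gauss(0.0, sd_school)       # shared by all students in the school
    item_shifts = [rng.gauss(0.0, sd_item_school)   # school-by-item effects
                   for _ in item_difficulties]
    responses = []
    for _ in range(n_students):
        theta = grade_mean + school_effect + rng.gauss(0.0, 1.0)
        row = []
        for b, shift in zip(item_difficulties, item_shifts):
            p_correct = 1.0 / (1.0 + math.exp(-(theta + shift - b)))
            row.append(1 if rng.random() < p_correct else 0)
        responses.append(row)
    return responses

def simulate_replicates(n_replicates, n_schools, seed=0, **school_kwargs):
    """Generate independent data sets, each a list of schools."""
    rng = random.Random(seed)
    return [[simulate_school(rng, **school_kwargs) for _ in range(n_schools)]
            for _ in range(n_replicates)]

# Hypothetical specification; only sd_item_school=0.5 is taken from the text.
spec = dict(n_students=4, item_difficulties=[-1.0, 0.0, 1.0],
            sd_school=0.3, sd_item_school=0.5)
replicates = simulate_replicates(100, n_schools=5, seed=0, **spec)
print(len(replicates), len(replicates[0]), len(replicates[0][0]))
```

Fixing the seed makes the 100 replicates reproducible, which is convenient when two linking methods are to be compared on exactly the same simulated data.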

Table A-1: Configuration of original 54 data sets.
Columns: Data Set ID | Correlation between θ and θ | SD of school effects on vertical trait | SD of school effect on grade specific trait | SD of item specific school effects | Annual growth
NOTE: The Ohio-like data sets are bolded and the most Ohio-like data set, #38, is shaded.


The Factor Analytic Method for Item Calibration under Item Response Theory: A Comparison Study Using Simulated Data

The Factor Analytic Method for Item Calibration under Item Response Theory: A Comparison Study Using Simulated Data Int. Statistical Inst.: Proc. 58th World Statistical Congress, 20, Dublin (Session CPS008) p.6049 The Factor Analytic Method for Item Calibration under Item Response Theory: A Comparison Study Using Simulated

More information

Asymptotic Behavior of a t Test Robust to Cluster Heterogeneity

Asymptotic Behavior of a t Test Robust to Cluster Heterogeneity Asymptotic Behavior of a t est Robust to Cluster Heteroeneity Andrew V. Carter Department of Statistics University of California, Santa Barbara Kevin. Schnepel and Doulas G. Steierwald Department of Economics

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 23 Comparison of Three IRT Linking Procedures in the Random Groups Equating Design Won-Chan Lee Jae-Chun Ban February

More information

Robust Semiparametric Optimal Testing Procedure for Multiple Normal Means

Robust Semiparametric Optimal Testing Procedure for Multiple Normal Means Veterinary Dianostic and Production Animal Medicine Publications Veterinary Dianostic and Production Animal Medicine 01 Robust Semiparametric Optimal Testin Procedure for Multiple Normal Means Pen Liu

More information

Solution to the take home exam for ECON 3150/4150

Solution to the take home exam for ECON 3150/4150 Solution to the tae home exam for ECO 350/450 Jia Zhiyan and Jo Thori Lind April 2004 General comments Most of the copies we ot were quite ood, and it seems most of you have done a real effort on the problem

More information

PHY 133 Lab 1 - The Pendulum

PHY 133 Lab 1 - The Pendulum 3/20/2017 PHY 133 Lab 1 The Pendulum [Stony Brook Physics Laboratory Manuals] Stony Brook Physics Laboratory Manuals PHY 133 Lab 1 - The Pendulum The purpose of this lab is to measure the period of a simple

More information

Lecture 5 Processing microarray data

Lecture 5 Processing microarray data Lecture 5 Processin microarray data (1)Transform the data into a scale suitable for analysis ()Remove the effects of systematic and obfuscatin sources of variation (3)Identify discrepant observations Preprocessin

More information

PIRLS 2016 Achievement Scaling Methodology 1

PIRLS 2016 Achievement Scaling Methodology 1 CHAPTER 11 PIRLS 2016 Achievement Scaling Methodology 1 The PIRLS approach to scaling the achievement data, based on item response theory (IRT) scaling with marginal estimation, was developed originally

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 24 in Relation to Measurement Error for Mixed Format Tests Jae-Chun Ban Won-Chan Lee February 2007 The authors are

More information

Making the Most of What We Have: A Practical Application of Multidimensional Item Response Theory in Test Scoring

Making the Most of What We Have: A Practical Application of Multidimensional Item Response Theory in Test Scoring Journal of Educational and Behavioral Statistics Fall 2005, Vol. 30, No. 3, pp. 295 311 Making the Most of What We Have: A Practical Application of Multidimensional Item Response Theory in Test Scoring

More information

Multidimensional Linking for Tests with Mixed Item Types

Multidimensional Linking for Tests with Mixed Item Types Journal of Educational Measurement Summer 2009, Vol. 46, No. 2, pp. 177 197 Multidimensional Linking for Tests with Mixed Item Types Lihua Yao 1 Defense Manpower Data Center Keith Boughton CTB/McGraw-Hill

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 41 A Comparative Study of Item Response Theory Item Calibration Methods for the Two Parameter Logistic Model Kyung

More information

Equating Tests Under The Nominal Response Model Frank B. Baker

Equating Tests Under The Nominal Response Model Frank B. Baker Equating Tests Under The Nominal Response Model Frank B. Baker University of Wisconsin Under item response theory, test equating involves finding the coefficients of a linear transformation of the metric

More information

Stat260: Bayesian Modeling and Inference Lecture Date: March 10, 2010

Stat260: Bayesian Modeling and Inference Lecture Date: March 10, 2010 Stat60: Bayesian Modelin and Inference Lecture Date: March 10, 010 Bayes Factors, -priors, and Model Selection for Reression Lecturer: Michael I. Jordan Scribe: Tamara Broderick The readin for this lecture

More information

Causal Bayesian Networks

Causal Bayesian Networks Causal Bayesian Networks () Ste7 (2) (3) Kss Fus3 Ste2 () Fiure : Simple Example While Bayesian networks should typically be viewed as acausal, it is possible to impose a causal interpretation on these

More information

PREDICTING THE DISTRIBUTION OF A GOODNESS-OF-FIT STATISTIC APPROPRIATE FOR USE WITH PERFORMANCE-BASED ASSESSMENTS. Mary A. Hansen

PREDICTING THE DISTRIBUTION OF A GOODNESS-OF-FIT STATISTIC APPROPRIATE FOR USE WITH PERFORMANCE-BASED ASSESSMENTS. Mary A. Hansen PREDICTING THE DISTRIBUTION OF A GOODNESS-OF-FIT STATISTIC APPROPRIATE FOR USE WITH PERFORMANCE-BASED ASSESSMENTS by Mary A. Hansen B.S., Mathematics and Computer Science, California University of PA,

More information

Generalized Least-Squares Regressions V: Multiple Variables

Generalized Least-Squares Regressions V: Multiple Variables City University of New York (CUNY) CUNY Academic Works Publications Research Kinsborouh Community Collee -05 Generalized Least-Squares Reressions V: Multiple Variables Nataniel Greene CUNY Kinsborouh Community

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 31 Assessing Equating Results Based on First-order and Second-order Equity Eunjung Lee, Won-Chan Lee, Robert L. Brennan

More information

Adjustment of Sampling Locations in Rail-Geometry Datasets: Using Dynamic Programming and Nonlinear Filtering

Adjustment of Sampling Locations in Rail-Geometry Datasets: Using Dynamic Programming and Nonlinear Filtering Systems and Computers in Japan, Vol. 37, No. 1, 2006 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J87-D-II, No. 6, June 2004, pp. 1199 1207 Adjustment of Samplin Locations in Rail-Geometry

More information

n j u = (3) b u Then we select m j u as a cross product between n j u and û j to create an orthonormal basis: m j u = n j u û j (4)

n j u = (3) b u Then we select m j u as a cross product between n j u and û j to create an orthonormal basis: m j u = n j u û j (4) 4 A Position error covariance for sface feate points For each sface feate point j, we first compute the normal û j by usin 9 of the neihborin points to fit a plane In order to create a 3D error ellipsoid

More information

Ning Wu Institute for Traffic Engineering Ruhr University Bochum, Germany Tel: ; Fax: ;

Ning Wu Institute for Traffic Engineering Ruhr University Bochum, Germany Tel: ; Fax: ; MODELLING THE IMPACT OF SIDE-STREET TRAFFIC VOLUME ON MAJOR- STREET GREEN TIME AT ISOLATED SEMI-ACTUATED INTERSECTIONS FOR SIGNAL COORDINATION DECISIONS Donmei Lin, Correspondin Author Center for Advanced

More information

Misconceptions about sinking and floating

Misconceptions about sinking and floating pplyin Scientific Principles to Resolve Student Misconceptions by Yue Yin whether a bar of soap will sink or float. Then students are asked to observe (O) what happens. Finally, students are asked to explain

More information

Linearized optimal power flow

Linearized optimal power flow Linearized optimal power flow. Some introductory comments The advantae of the economic dispatch formulation to obtain minimum cost allocation of demand to the eneration units is that it is computationally

More information

COMPARISON OF CONCURRENT AND SEPARATE MULTIDIMENSIONAL IRT LINKING OF ITEM PARAMETERS

COMPARISON OF CONCURRENT AND SEPARATE MULTIDIMENSIONAL IRT LINKING OF ITEM PARAMETERS COMPARISON OF CONCURRENT AND SEPARATE MULTIDIMENSIONAL IRT LINKING OF ITEM PARAMETERS A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Mayuko Kanada Simon IN PARTIAL

More information

Monte Carlo Simulations for Rasch Model Tests

Monte Carlo Simulations for Rasch Model Tests Monte Carlo Simulations for Rasch Model Tests Patrick Mair Vienna University of Economics Thomas Ledl University of Vienna Abstract: Sources of deviation from model fit in Rasch models can be lack of unidimensionality,

More information

Blinder-Oaxaca Decomposition for Tobit Models

Blinder-Oaxaca Decomposition for Tobit Models Blinder-Oaxaca Decomposition for Tobit Models Thomas K. Bauer, Mathias Sinnin To cite this version: Thomas K. Bauer, Mathias Sinnin. Blinder-Oaxaca Decomposition for Tobit Models. Applied Economics, Taylor

More information

Comparison between conditional and marginal maximum likelihood for a class of item response models

Comparison between conditional and marginal maximum likelihood for a class of item response models (1/24) Comparison between conditional and marginal maximum likelihood for a class of item response models Francesco Bartolucci, University of Perugia (IT) Silvia Bacci, University of Perugia (IT) Claudia

More information

Multi-sample structural equation models with mean structures, with special emphasis on assessing measurement invariance in cross-national research

Multi-sample structural equation models with mean structures, with special emphasis on assessing measurement invariance in cross-national research 1 Multi-sample structural equation models with mean structures, with special emphasis on assessin measurement invariance in cross-national research Measurement invariance measurement invariance: whether

More information

Conical Pendulum Linearization Analyses

Conical Pendulum Linearization Analyses European J of Physics Education Volume 7 Issue 3 309-70 Dean et al. Conical Pendulum inearization Analyses Kevin Dean Jyothi Mathew Physics Department he Petroleum Institute Abu Dhabi, PO Box 533 United

More information

A Simulation Study to Compare CAT Strategies for Cognitive Diagnosis

A Simulation Study to Compare CAT Strategies for Cognitive Diagnosis A Simulation Study to Compare CAT Strategies for Cognitive Diagnosis Xueli Xu Department of Statistics,University of Illinois Hua-Hua Chang Department of Educational Psychology,University of Texas Jeff

More information

Renormalization Group Theory

Renormalization Group Theory Chapter 16 Renormalization Group Theory In the previous chapter a procedure was developed where hiher order 2 n cycles were related to lower order cycles throuh a functional composition and rescalin procedure.

More information

AG DANK/BCS Meeting 2013 in London University College London, 8/9 November 2013

AG DANK/BCS Meeting 2013 in London University College London, 8/9 November 2013 AG DANK/S Meetin 3 in London University ollee London 8/9 November 3 MDELS FR SIMULTANEUS LASSIFIATIN AND REDUTIN F THREE-WAY DATA Roberto Rocci University Tor erata Rome A eneral classification model:

More information

Correlated Component Regression: A Prediction/Classification Methodology for Possibly Many Features

Correlated Component Regression: A Prediction/Classification Methodology for Possibly Many Features (Reprinted from the 2010 American Statistical Association Proceedins with Edits) Correlated Component Reression: A Prediction/Classification Methodoloy for Possibly Many Features Jay Maidson Statistical

More information

A Markov chain Monte Carlo approach to confirmatory item factor analysis. Michael C. Edwards The Ohio State University

A Markov chain Monte Carlo approach to confirmatory item factor analysis. Michael C. Edwards The Ohio State University A Markov chain Monte Carlo approach to confirmatory item factor analysis Michael C. Edwards The Ohio State University An MCMC approach to CIFA Overview Motivating examples Intro to Item Response Theory

More information

A Marginal Maximum Likelihood Procedure for an IRT Model with Single-Peaked Response Functions

A Marginal Maximum Likelihood Procedure for an IRT Model with Single-Peaked Response Functions A Marginal Maximum Likelihood Procedure for an IRT Model with Single-Peaked Response Functions Cees A.W. Glas Oksana B. Korobko University of Twente, the Netherlands OMD Progress Report 07-01. Cees A.W.

More information

A Mathematical Model for the Fire-extinguishing Rocket Flight in a Turbulent Atmosphere

A Mathematical Model for the Fire-extinguishing Rocket Flight in a Turbulent Atmosphere A Mathematical Model for the Fire-extinuishin Rocket Fliht in a Turbulent Atmosphere CRISTINA MIHAILESCU Electromecanica Ploiesti SA Soseaua Ploiesti-Tiroviste, Km 8 ROMANIA crismihailescu@yahoo.com http://www.elmec.ro

More information

LINKING IN DEVELOPMENTAL SCALES. Michelle M. Langer. Chapel Hill 2006

LINKING IN DEVELOPMENTAL SCALES. Michelle M. Langer. Chapel Hill 2006 LINKING IN DEVELOPMENTAL SCALES Michelle M. Langer A thesis submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Master

More information

The Rasch Poisson Counts Model for Incomplete Data: An Application of the EM Algorithm

The Rasch Poisson Counts Model for Incomplete Data: An Application of the EM Algorithm The Rasch Poisson Counts Model for Incomplete Data: An Application of the EM Algorithm Margo G. H. Jansen University of Groningen Rasch s Poisson counts model is a latent trait model for the situation

More information

Matrix multiplication: a group-theoretic approach

Matrix multiplication: a group-theoretic approach CSG399: Gems of Theoretical Computer Science. Lec. 21-23. Mar. 27-Apr. 3, 2009. Instructor: Emanuele Viola Scribe: Ravi Sundaram Matrix multiplication: a roup-theoretic approach Given two n n matrices

More information

Anders Skrondal. Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine. Based on joint work with Sophia Rabe-Hesketh

Anders Skrondal. Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine. Based on joint work with Sophia Rabe-Hesketh Constructing Latent Variable Models using Composite Links Anders Skrondal Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine Based on joint work with Sophia Rabe-Hesketh

More information

Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement

Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement Second meeting of the FIRB 2012 project Mixture and latent variable models for causal-inference and analysis

More information

Dimensionality Assessment: Additional Methods

Dimensionality Assessment: Additional Methods Dimensionality Assessment: Additional Methods In Chapter 3 we use a nonlinear factor analytic model for assessing dimensionality. In this appendix two additional approaches are presented. The first strategy

More information

Chained Versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data

Chained Versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data Research Report Chained Versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data Gautam Puhan February 2 ETS RR--6 Listening. Learning. Leading. Chained Versus Post-Stratification

More information

An Equivalency Test for Model Fit. Craig S. Wells. University of Massachusetts Amherst. James. A. Wollack. Ronald C. Serlin

An Equivalency Test for Model Fit. Craig S. Wells. University of Massachusetts Amherst. James. A. Wollack. Ronald C. Serlin Equivalency Test for Model Fit 1 Running head: EQUIVALENCY TEST FOR MODEL FIT An Equivalency Test for Model Fit Craig S. Wells University of Massachusetts Amherst James. A. Wollack Ronald C. Serlin University

More information

MODEL SELECTION CRITERIA FOR ACOUSTIC SEGMENTATION

MODEL SELECTION CRITERIA FOR ACOUSTIC SEGMENTATION = = = MODEL SELECTION CRITERIA FOR ACOUSTIC SEGMENTATION Mauro Cettolo and Marcello Federico ITC-irst - Centro per la Ricerca Scientifica e Tecnoloica I-385 Povo, Trento, Italy ABSTRACT Robust acoustic

More information

An EM Algorithm for the Student-t Cluster-Weighted Modeling

An EM Algorithm for the Student-t Cluster-Weighted Modeling An EM Alorithm for the Student-t luster-weihted Modelin Salvatore Inrassia, Simona. Minotti, and Giuseppe Incarbone Abstract luster-weihted Modelin is a flexible statistical framework for modelin local

More information

Ability Metric Transformations

Ability Metric Transformations Ability Metric Transformations Involved in Vertical Equating Under Item Response Theory Frank B. Baker University of Wisconsin Madison The metric transformations of the ability scales involved in three

More information

Item Response Theory (IRT) Analysis of Item Sets

Item Response Theory (IRT) Analysis of Item Sets University of Connecticut DigitalCommons@UConn NERA Conference Proceedings 2011 Northeastern Educational Research Association (NERA) Annual Conference Fall 10-21-2011 Item Response Theory (IRT) Analysis

More information

Bayesian Nonparametric Rasch Modeling: Methods and Software

Bayesian Nonparametric Rasch Modeling: Methods and Software Bayesian Nonparametric Rasch Modeling: Methods and Software George Karabatsos University of Illinois-Chicago Keynote talk Friday May 2, 2014 (9:15-10am) Ohio River Valley Objective Measurement Seminar

More information

Nonlinear Model Reduction of Differential Algebraic Equation (DAE) Systems

Nonlinear Model Reduction of Differential Algebraic Equation (DAE) Systems Nonlinear Model Reduction of Differential Alebraic Equation DAE Systems Chuili Sun and Jueren Hahn Department of Chemical Enineerin eas A&M University Collee Station X 77843-3 hahn@tamu.edu repared for

More information

Strong Interference and Spectrum Warfare

Strong Interference and Spectrum Warfare Stron Interference and Spectrum Warfare Otilia opescu and Christopher Rose WILAB Ruters University 73 Brett Rd., iscataway, J 8854-86 Email: {otilia,crose}@winlab.ruters.edu Dimitrie C. opescu Department

More information

Experiment 3 The Simple Pendulum

Experiment 3 The Simple Pendulum PHY191 Fall003 Experiment 3: The Simple Pendulum 10/7/004 Pae 1 Suested Readin for this lab Experiment 3 The Simple Pendulum Read Taylor chapter 5. (You can skip section 5.6.IV if you aren't comfortable

More information

Bidirectional Clustering of Weights for Finding Succinct Multivariate Polynomials

Bidirectional Clustering of Weights for Finding Succinct Multivariate Polynomials IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.5, May 28 85 Bidirectional Clusterin of Weihts for Findin Succinct Multivariate Polynomials Yusuke Tanahashi and Ryohei Nakano

More information

REAL-TIME TIME-FREQUENCY BASED BLIND SOURCE SEPARATION. Scott Rickard, Radu Balan, Justinian Rosca

REAL-TIME TIME-FREQUENCY BASED BLIND SOURCE SEPARATION. Scott Rickard, Radu Balan, Justinian Rosca REAL-TIME TIME-FREQUENCY BASED BLIND SOURCE SEARATION Scott Rickard, Radu Balan, Justinian Rosca Siemens Corporate Research rinceton, NJ scott.rickard,radu.balan,justinian.rosca @scr.siemens.com ABSTRACT

More information

Lesson 7: Item response theory models (part 2)

Lesson 7: Item response theory models (part 2) Lesson 7: Item response theory models (part 2) Patrícia Martinková Department of Statistical Modelling Institute of Computer Science, Czech Academy of Sciences Institute for Research and Development of

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report. Hierarchical Cognitive Diagnostic Analysis: Simulation Study

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report. Hierarchical Cognitive Diagnostic Analysis: Simulation Study Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 38 Hierarchical Cognitive Diagnostic Analysis: Simulation Study Yu-Lan Su, Won-Chan Lee, & Kyong Mi Choi Dec 2013

More information

Statistical and psychometric methods for measurement: Scale development and validation

Statistical and psychometric methods for measurement: Scale development and validation Statistical and psychometric methods for measurement: Scale development and validation Andrew Ho, Harvard Graduate School of Education The World Bank, Psychometrics Mini Course Washington, DC. June 11,

More information

A multivariate multilevel model for the analysis of TIMMS & PIRLS data

A multivariate multilevel model for the analysis of TIMMS & PIRLS data A multivariate multilevel model for the analysis of TIMMS & PIRLS data European Congress of Methodology July 23-25, 2014 - Utrecht Leonardo Grilli 1, Fulvia Pennoni 2, Carla Rampichini 1, Isabella Romeo

More information

Investigation of ternary systems

Investigation of ternary systems Investiation of ternary systems Introduction The three component or ternary systems raise not only interestin theoretical issues, but also have reat practical sinificance, such as metallury, plastic industry

More information

Phase Diagrams: construction and comparative statics

Phase Diagrams: construction and comparative statics 1 / 11 Phase Diarams: construction and comparative statics November 13, 215 Alecos Papadopoulos PhD Candidate Department of Economics, Athens University of Economics and Business papadopalex@aueb.r, https://alecospapadopoulos.wordpress.com

More information

IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING 1

IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING 1 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING 1 Intervention in Gene Reulatory Networks via a Stationary Mean-First-Passae-Time Control Policy Golnaz Vahedi, Student Member, IEEE, Babak Faryabi, Student

More information

Equating Subscores Using Total Scaled Scores as an Anchor

Equating Subscores Using Total Scaled Scores as an Anchor Research Report ETS RR 11-07 Equating Subscores Using Total Scaled Scores as an Anchor Gautam Puhan Longjuan Liang March 2011 Equating Subscores Using Total Scaled Scores as an Anchor Gautam Puhan and

More information

Overview. Multidimensional Item Response Theory. Lecture #12 ICPSR Item Response Theory Workshop. Basics of MIRT Assumptions Models Applications

Overview. Multidimensional Item Response Theory. Lecture #12 ICPSR Item Response Theory Workshop. Basics of MIRT Assumptions Models Applications Multidimensional Item Response Theory Lecture #12 ICPSR Item Response Theory Workshop Lecture #12: 1of 33 Overview Basics of MIRT Assumptions Models Applications Guidance about estimating MIRT Lecture

More information

On K-Means Cluster Preservation using Quantization Schemes

On K-Means Cluster Preservation using Quantization Schemes On K-Means Cluster Preservation usin Quantization Schemes Deepak S. Turaa Michail Vlachos Olivier Verscheure IBM T.J. Watson Research Center, Hawthorne, Y, USA IBM Zürich Research Laboratory, Switzerland

More information

Generalized Distance Metric as a Robust Similarity Measure for Mobile Object Trajectories

Generalized Distance Metric as a Robust Similarity Measure for Mobile Object Trajectories Generalized Distance Metric as a Robust Similarity Measure for Mobile Object rajectories Garima Pathak, Sanjay Madria Department of Computer Science University Of Missouri-Rolla Missouri-6541, USA {madrias}@umr.edu

More information

IRT linking methods for the bifactor model: a special case of the two-tier item factor analysis model

IRT linking methods for the bifactor model: a special case of the two-tier item factor analysis model University of Iowa Iowa Research Online Theses and Dissertations Summer 2017 IRT linking methods for the bifactor model: a special case of the two-tier item factor analysis model Kyung Yong Kim University

More information

Studies on the effect of violations of local independence on scale in Rasch models: The Dichotomous Rasch model
