AMERICAN INSTITUTES FOR RESEARCH
LINKING RASCH SCALES ACROSS GRADES IN CLUSTERED SAMPLES

Jon Cohen, Mary Seburn, Tamas Antal, and Matthew Gushta
American Institutes for Research
May 23,
THOMAS JEFFERSON STREET, NW WASHINGTON, DC
1. Introduction

Researchers and practitioners share an interest in measuring student learning over time. A single metric for student proficiency across grades would provide the yardstick by which student learning could be monitored and teacher, school, and program effectiveness could be measured. Interest in, and controversy around, such vertical scales has endured for decades (Cronbach & Furby, 1970; Linn & Slinde, 1977; Rogosa, Brandt, & Zimowski, 1982). Much of the controversy arises from the fact that curricula differ across grades, so the nature of the construct being measured by such a vertical scale may vary along the dimension. In practice, vertical scales suffer from instability: it is common to find that different methods result in different inferences about growth over time (Harris, 1991; Kolen, 1981; Skaggs & Lissitz, 1988). This study is designed to compare two different vertical linking methods in terms of the accuracy and precision of the estimators and the availability of adequate standard error estimators for realistic data.

Most often, vertical linking is accomplished through a common-item, nonequivalent groups design (Muraki, Hombo, & Lee, 2000; Kolen & Brennan, 2004). Under these designs, tests for adjacent grades include a set of common items that allows the scales to be linked across grades. With these data in hand, psychometricians often use one of two methods based on Item Response Theory (IRT; Rasch, 1960; Lord, 1980; Hambleton & Swaminathan, 1985) to link the scales across grades:

- Joint, or concurrent, calibration, in which the items from all of the overlapping forms of the test are calibrated together, resulting in a single cross-grade scale;
- Chain linking, in which each grade is calibrated separately, and the resulting item parameters of the items shared across grades are fixed to common values, or estimates of them are placed on a common scale.

The literature on the choice between these two general approaches is inconclusive at best.
Some studies have found that separate estimation improves the fit between the data and the model (e.g., Karkee, Lewis, Hoskens, & Yao, 2003; Kim & Cohen, 1998). Separate estimation, however, doubles the number of parameters estimated, reducing the degrees of freedom and necessarily improving the fit to the particular dataset. These studies did not evaluate whether the improved fit exceeded the improvement that would arise naturally from the reduction in the degrees of freedom. Petersen, Cook, and Stocking (1983) report that concurrent calibration produced more stable estimates, as did Hanson and Béguin (2002) under equivalent group designs. However, under nonequivalent group designs (as found in vertical linking studies), Hanson and Béguin report better estimates from separate estimation. They also find that different software produces different results and that the estimates are sensitive to ancillary specifications, such as prior distributions imposed to constrain parameter estimates.
Kolen and Brennan (2004) suggest that when the IRT model holds, concurrent calibration should produce more stable results because it uses all the available information to estimate the parameters. However, they prefer separate estimation when assumptions are violated. Taken together, this research suggests that separate calibration may provide a better fit to the data, but may do so by matching particular data sets at the cost of reduced generalizability. Here, we compare the methods when the data are multidimensional and arise from a complex sample.

2. A Model of Item Response

This study begins by formalizing the processes presumed to generate student responses to items in the real world. Item response theory provides a convenient foundation on which to build this model. We begin with a simple exposition of item response models and then extend that model:

- to reflect the fact that students are organized into somewhat homogeneous schools and classrooms;
- to capture the impact of changing curricula over time.
A Basic Model of Item Response

Classical measurement theory begins with a linear model $y_{ij} = \theta_i + e_{ij}$, where $y_{ij}$ is person $i$'s response to a continuous item or test $j$, and $e_{ij}$ is the individual, item-specific measurement error of the response. If we have a binary measure in place of $y_{ij}$, we observe

$$z_{ij} = \begin{cases} 1 & \text{if } y_{ij} > b_j \\ 0 & \text{otherwise,} \end{cases}$$

where $b_j$ is a threshold along the true-score dimension. With this, the relationship between ability and item response can be stated as a probability

$$p(z_{ij} = 1 \mid \theta_i) = p(\theta_i + e_{ij} > b_j) = p(-e_{ij} < \theta_i - b_j). \tag{1}$$

This relationship forms the basis of most item response theory (IRT) models. For example, if $e$ is distributed standard logistic, we have the familiar Rasch model

$$p(z_{ij} = 1 \mid \theta_i) = \frac{1}{1 + \exp\left(-(\theta_i - b_j)\right)}. \tag{2}$$

This development explicitly recognizes item response as a function of the error term $e_{ij}$.

Extension to Clustered Data

In the real world, students are organized into schools, and the average proficiency of students varies across schools and classrooms. In addition, the instruction that students receive also varies by school or classroom. Both of these forces can have an impact on item response. Consider a high-achieving school: the average proficiency ($\bar{\theta}$) will be relatively high, resulting in relatively high probabilities of correct responses on the items. More subtly, consider a fourth-grade class in which the teacher enjoys teaching the multiplication of fractions, so she teaches it early and often. Her students will likely perform well on this type of item relative to other mathematics items. Therefore, we should expect to observe different patterns of performance from other teachers and other schools.

The first process (the clustering of students of similar ability within schools or classrooms) suggests that the structure of $\theta$ is clustered. For example, $\theta_{ik} = w_k + w_{ik}$, where $i$ indexes examinees and $k$ indexes the schools or classrooms that make up the clusters.
The second process (differences in classroom curriculum, instruction, or timing) may show up as a clustering of the measurement error,

$$e_{ijk} = u_{jk} + u_{ijk}. \tag{3}$$
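The link between the threshold formulation in Equation 1 and the Rasch probability in Equation 2 can be checked numerically. The following is a minimal sketch, not the authors' code: the function names are illustrative, and the logistic draw by inverse-CDF sampling is an implementation choice made here.

```python
import math
import random


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model (Equation 2)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def logistic_draw(rng):
    """Draw from the standard logistic distribution by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # rng.random() lies in [0, 1); avoid log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))


def simulate_threshold(theta, b, n, seed=0):
    """Share of n simulated responses with theta + e > b (Equation 1)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if theta + logistic_draw(rng) > b)
    return hits / n


if __name__ == "__main__":
    # The simulated share should sit close to the closed-form probability.
    print(rasch_prob(0.5, -0.2), simulate_threshold(0.5, -0.2, 100_000))
```

Clustering enters this picture by splitting either the ability or the error term into a school-level and a student-level component before the threshold comparison.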
Extension to Vertical Scales

Vertically linked scales generally rest on the assumption that task demands in subsequent grades are simply harder versions of the earlier grades' tasks. Although curricula do tend to be vertically articulated, new skills and knowledge are introduced and taught at later grade levels, whereas other skills are mastered and no longer taught or tested. Test blueprints typically reflect these shifts, with the resulting vertical scales measuring a slightly different trait at each grade level. It is reasonable to consider the vertical scale to be the aspect of the curriculum that changes only in difficulty across grades. This trait will be correlated with the within-grade scales, but it is not identical to them. Denoting the within-grade trait $\theta$ and the vertical trait $\psi$,

$$\psi_{ik} = a_g + v_k + v_{ik}, \qquad \theta_{ik} = \beta\,\psi_{ik} + w_k + w_{ik}. \tag{4}$$

The mean of the vertical trait increases with grade (subscripted $g$). A student's proficiency on the vertical trait at a point in time would reflect both school (or classroom) and student effects. The grade-specific trait reflects its correlation with the vertical trait, as well as additional school-specific effects.

Implications for Vertical Linking

This model of item response suggests several factors that will likely influence the stability of linkages across grades, given a fixed number of linking items and a fixed sample size:

- The correlation between the grade-specific and vertical traits. The unidimensional linking model is misspecified when two different traits contribute to generating the responses to the items. If these traits are perfectly correlated, they function as a single trait, eliminating this source of error. As this correlation decreases, the resulting error increases.
- The magnitude of the intra-cluster correlation found in $\theta$ and $\psi$. When units within a cluster are more similar to each other than they are to units in other clusters, they provide less information than an equivalent number of units from a random sample.
  The impact that sampling design has on the precision of estimates is described by the design effect, which is often summarized as a ratio of the actual sampling variance to the sampling variance that would result from a simple random sample of the same size (i.e., the ratio of the actual standard error squared to the standard error squared from a simple random sample; Kish, 1965).
- The magnitude of the intra-cluster correlation in item-specific responses. As above, positive intra-cluster correlation will increase the design effect.
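The design effect described above can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the estimator used in the study: it assumes equal sampling weights, no stratification, and a standard between-cluster (linearized ratio) variance estimator for the mean; all function names are illustrative.

```python
def srs_variance_of_mean(values):
    """Variance of the mean as if the data were a simple random sample."""
    n = len(values)
    mean = sum(values) / n
    s2 = sum((v - mean) ** 2 for v in values) / (n - 1)
    return s2 / n


def clustered_variance_of_mean(clusters):
    """Between-cluster variance estimate of the overall mean.

    Assumes equal weights and no stratification. Each cluster contributes a
    linearized residual z_k = (total_k - mean * n_k) / N.
    """
    k = len(clusters)
    n_total = sum(len(c) for c in clusters)
    mean = sum(sum(c) for c in clusters) / n_total
    z = [(sum(c) - mean * len(c)) / n_total for c in clusters]
    return k / (k - 1) * sum(v * v for v in z)


def design_effect(clusters):
    """Ratio of the clustered variance to the simple-random-sample variance."""
    flat = [v for c in clusters for v in c]
    return clustered_variance_of_mean(clusters) / srs_variance_of_mean(flat)


if __name__ == "__main__":
    # Four schools whose students all answer alike: clustering is extreme,
    # so the design effect is well above 1.
    schools = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1]]
    print(design_effect(schools))
```

When responses within a school are no more alike than responses across schools, the ratio moves toward 1; homogeneous schools push it upward.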
In general, these forces constitute violations of the assumptions underlying the basic IRT models used to calibrate and link most tests. When faced with such violations, statisticians typically take one of two strategies: develop more complex models that more accurately model the processes of interest, or develop methods that are robust to such violations. A number of researchers have developed structural IRT models that capture more of the real-world complexities (e.g., Patz & Junker, 1999; Kamata, 1998, 2001; Skrondal & Rabe-Hesketh, 2003; Rabe-Hesketh, Skrondal, & Pickles, 2004; Glas, 2005). These models often require additional distributional assumptions, possibly reducing their robustness. Here, we take an alternative approach common in the sampling literature, using the more familiar vertical-linking point estimators and constructing robust confidence intervals around them (e.g., Cochran, 1977; Kish, 1965; Särndal, Swensson, & Wretman, 1992).

3. Two Models for Creating Vertical Scales and Evaluating Their Precision

This section describes the technical details of the joint calibration and chain-linking procedures and the proposed standard error estimators for each. The calibration procedures can be successfully implemented in a variety of ways: through conditional maximum likelihood (CML, as in ConQuest or OPLM), marginal maximum likelihood (MML) or nonparametric marginal maximum likelihood (NPMML, as in Bilog, Parscale, or Logimo), or, with a sufficient number of items, joint maximum likelihood (JML, as in Winsteps). This study uses NPMML procedures, which have been shown to share the optimal properties of CML estimators (De Leeuw & Verhelst, 1986) and for which robust variance estimators are available (Cohen, Chan, Jiang, & Seburn, 2005).

The Joint Calibration Procedure

The joint calibration procedure develops the common vertical scale in a single step by calibrating all items from all grades simultaneously.
Given $G$ grades to link, the log-likelihood has the following form:

$$\log L = \sum_{g=3}^{G} l_g,$$

where $l_g$ is the grade-specific (marginal) log-likelihood given by

$$l_g = \sum_{i=1}^{N_g} \log \int L(z_i^g \mid \theta, \beta)\, f_g(\theta)\, d\theta, \tag{5}$$

where $N_g$ is the number of students in grade $g$, $\theta$ is the ability, $f_g$ is its grade-dependent density function, $\beta = (\beta_1, \ldots, \beta_J)$ is the collection of item parameters, and $z_i^g = (z_{i1}, \ldots, z_{iJ})$ is the row of the full response matrix $Z$ corresponding to student $i$ in grade $g$. Using the independence assumption, we see that the likelihood of row $i$ is

$$L(z_i \mid \theta, \beta) = \prod_{j=1}^{J} p(z_{ij} \mid \theta, \beta_j).$$

(Note that $J$, the number of items, may also depend on grade. However, to avoid cumbersome formulae, we suppress any notation indicating this dependence.)

Note that Equation 5 contains an explicit model of the population distribution within a grade. This decomposition of the likelihood and the population distribution provides the framework for vertical linking. The connections across grades are made by the sets of common items assigned to students in adjacent grades.

The NPMML proceeds by replacing $f_g$ with an empirical vector of normalized weights $p_g = (p_{1g}, \ldots, p_{Qg})$ on a prespecified collection of population parameters (quadrature points) $\theta = (\theta_1, \ldots, \theta_Q)$, resulting in the following approximation of the grade-specific log-likelihood:

$$l_g \approx \sum_{i=1}^{N_g} \log \sum_{q=1}^{Q} L(z_i \mid \theta_q, \beta)\, p_{qg},$$

with the constraint that

$$\sum_{q=1}^{Q} p_{qg} = 1 \quad \text{for all } g. \tag{6}$$

In order to identify the model, it is sufficient to fix the mean proficiency within a single grade. For simplicity, let us fix the lowest grade proficiency to 0; that is, let

$$\sum_{q=1}^{Q} \theta_q\, p_{q3} = 0. \tag{7}$$

If we assume that the locations of the quadrature points are fixed, rather than estimated, the task becomes finding the conditional maximum of

$$l(Z, \beta, p_3, \ldots, p_G) = \sum_{g=3}^{G} l_g$$

subject to the constraints in Equations (6) and (7). Because we want the conditional maximum, we use Lagrange multipliers to redefine the likelihood function $l$ to include the constraints with the new estimable parameters:

$$\tilde{l}(Z, \beta, p_3, \ldots, p_G, \mu, \lambda_3, \ldots, \lambda_G) = \sum_{g=3}^{G} \tilde{l}_g + \mu \sum_{q=1}^{Q} \theta_q\, p_{q3},$$
where

$$\tilde{l}_g = l_g + \lambda_g \left( \sum_{q=1}^{Q} p_{qg} - 1 \right).$$

We use an extension of Bock and Aitkin's (1981) EM algorithm to implement the NPMML estimation (see Cohen et al., 2005). This calibration yields grade-specific population distributions. From these we can readily obtain an estimate of the population moments; for example, the first moment (in the grades in which it is not fixed) is given by

$$\mu_g = \sum_{q=1}^{Q} \theta_q\, p_{qg}.$$

Standard Error of Parameters

In a simple random sample, the likelihood of the data is the product of the individual likelihoods across observations, estimated by taking logs and summing across observations. When the observations are correlated, as in a clustered sample, the function is no longer a true likelihood function: the joint likelihood of the observations is no longer the product of the likelihoods of each observation, because that product neglects the covariance among observations. Psychometricians continue to use estimates based on this likelihood function, even though it does not accurately model the real-world process of interest. The score function constitutes an estimating equation in the sense of Godambe (1960) and Godambe and Thompson (1984), and the parameters of that function continue to hold pragmatic interest in operational testing programs. The inverse of the would-be information matrix, however, no longer provides an acceptable approximation of the variance of those estimates (Binder, 1983; Godambe & Thompson, 1984). For that reason, we use a Taylor-series approximation of the standard error, based upon the work of Binder (1983).

To develop the approximate variance estimator, we begin by reparameterizing the likelihood function. There are, in general, two equivalent approaches to estimating constrained maximum likelihood models. The first, which we mention above, is based on the constrained likelihood (by introducing Lagrange multipliers). The second, based on a reduced likelihood function, is obtained by eliminating redundant parameters (Mislevy, 1984).
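The quadrature approximation to the grade-specific log-likelihood, and the first moment computed from the estimated weights, can be sketched in a few lines. This is an illustrative evaluation of the objective only, not the EM estimation itself; the function names and the three-point grid in the usage example are assumptions made here.

```python
import math


def rasch_prob(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def row_likelihood(z, theta, betas):
    """L(z | theta, beta): product over items under local independence."""
    like = 1.0
    for z_j, b_j in zip(z, betas):
        p = rasch_prob(theta, b_j)
        like *= p if z_j == 1 else 1.0 - p
    return like


def grade_loglik(responses, nodes, weights, betas):
    """Quadrature approximation of the marginal log-likelihood for one grade."""
    total = 0.0
    for z in responses:
        marginal = sum(row_likelihood(z, t, betas) * w
                       for t, w in zip(nodes, weights))
        total += math.log(marginal)
    return total


def first_moment(nodes, weights):
    """Population mean implied by the weights on the quadrature points."""
    return sum(t * w for t, w in zip(nodes, weights))


if __name__ == "__main__":
    nodes = [-1.0, 0.0, 1.0]        # hypothetical quadrature points
    weights = [0.25, 0.5, 0.25]     # hypothetical mass on each point
    print(grade_loglik([[1], [0]], nodes, weights, betas=[0.0]))
    print(first_moment(nodes, weights))
```

An estimation routine would alternate between updating the weights and the item difficulties to increase this quantity, subject to the constraints on the weights.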
Following Mislevy, we reparameterize to eliminate redundant parameters, using the information from the constraints to calculate the eliminated parameters in the full model. More precisely, we regard the last two population mass parameters $p_{Q-1}$ and $p_Q$ as functions of the previous $Q-2$ (because there are two constraints):

$$p_{Q-1} = \frac{a\,\theta_Q - b}{\theta_Q - \theta_{Q-1}} \quad \text{and} \quad p_Q = \frac{a\,\theta_{Q-1} - b}{\theta_{Q-1} - \theta_Q},$$

where

$$a = 1 - \sum_{q=1}^{Q-2} p_q \quad \text{and} \quad b = -\sum_{q=1}^{Q-2} \theta_q\, p_q.$$

Let us define the weighted score function as the first derivative of the marginal log-likelihood with respect to the reduced set of parameters of the model, $\gamma = (\beta, p^{red}) = (\beta, (p_1, \ldots, p_{Q-3}, p_{Q-2}))$:

$$W(\gamma) = W(\beta, p^{red}) = D_\gamma\, l^{w}_{red} = D_\gamma \sum_{k=1}^{K} w_k \sum_{i=1}^{n_k} \log \sum_{q=1}^{Q} L(z_i \mid \theta_q, \beta)\, p_q,$$

where $w_k$ ($k = 1, \ldots, K$) is the sampling weight associated with cluster (or PSU, primary sampling unit) $k$, and $n_k$ is the size of cluster $k$ (again, for the sake of transparency we ignore stratification). In our context, the equation

$$W(\gamma) = 0 \quad (\gamma = ?) \tag{8}$$

provides an estimating equation in the sense of Godambe and Thompson (1984) by which we may obtain consistent estimates of the finite population variances using the formulae of Binder (1983). To see this, let us assume that $\hat{\gamma}$ is the solution of the estimating equation (8) in the sample and $\gamma$ is the solution based on the full finite population or the set of all possible populations of interest. Then in first order we have

$$0 = W(\hat{\gamma}) = W(\gamma) + \frac{\partial W(\gamma)}{\partial \gamma} (\hat{\gamma} - \gamma) + R.$$

From this we obtain

$$\hat{\gamma} - \gamma \approx -\left[ \frac{\partial W(\gamma)}{\partial \gamma} \right]^{-1} W(\gamma)$$

and

$$\mathrm{Var}(\hat{\gamma}) = (\hat{\gamma} - \gamma)(\hat{\gamma} - \gamma)^T = \left[ \frac{\partial W(\gamma)}{\partial \gamma} \right]^{-1} W(\gamma)\, W(\gamma)^T \left[ \frac{\partial W(\gamma)}{\partial \gamma} \right]^{-T}.$$
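The elimination of the last two mass parameters used in the reparameterization can be checked numerically. The sketch below assumes a grade whose mean is fixed at zero, so that both the sum-to-one and the mean-zero constraints apply; the function name and the five-point grid are illustrative.

```python
def recover_full_weights(p_head, nodes):
    """Rebuild all Q weights from the first Q-2 using the two constraints.

    p_head: the free weights p_1, ..., p_{Q-2}
    nodes:  the Q quadrature points theta_1, ..., theta_Q
    """
    q = len(nodes)
    if len(p_head) != q - 2:
        raise ValueError("expected exactly Q - 2 free weights")
    a = 1.0 - sum(p_head)                                   # mass left over
    b = -sum(t * p for t, p in zip(nodes[: q - 2], p_head))  # mean left over
    spread = nodes[-1] - nodes[-2]
    p_second_last = (a * nodes[-1] - b) / spread
    p_last = (b - a * nodes[-2]) / spread
    return list(p_head) + [p_second_last, p_last]


if __name__ == "__main__":
    nodes = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical quadrature points
    full = recover_full_weights([0.1, 0.2, 0.4], nodes)
    print(full, sum(full), sum(t * p for t, p in zip(nodes, full)))
```

Because the eliminated weights are linear in the free ones, the covariance of the full weight vector follows from the covariance of the reduced vector by a linear transformation, which is the step taken in the text that follows.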
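Introducing $\Omega(\gamma)$ as a variance of $W(\gamma)$ across observations and taking the expectation value over $\gamma$, we obtain the covariance matrix of the reduced set of parameters:

$$\Sigma^{\gamma}_{red} = \mathrm{Var}(\hat{\gamma}) = \left. \left[ \frac{\partial W(\gamma)}{\partial \gamma} \right]^{-1} \Omega(\gamma) \left[ \frac{\partial W(\gamma)}{\partial \gamma} \right]^{-T} \right|_{\gamma = \hat{\gamma}}.$$

To obtain the estimate $\hat{\Omega}(\hat{\gamma})$ of $\Omega(\gamma)$, we use the stratified, between-PSU weighted estimator, which is given by

$$\hat{\Omega}(\hat{\gamma}) = \frac{K}{K-1} \sum_{k=1}^{K} (W_k - \bar{W})(W_k - \bar{W})^T,$$

where

$$W_k = \left. D_\gamma\, w_k \sum_{i=1}^{n_k} \log \sum_{q=1}^{Q} L(z_i \mid \theta_q, \beta)\, p_q \right|_{\gamma = \hat{\gamma}} \quad \text{and} \quad \bar{W} = \frac{1}{K} \sum_{k=1}^{K} W_k.$$

Standard Error of Moments

When creating a vertical scale, we are interested in the estimates of the first moment of the population distribution (fixing this moment in one of the grades to zero). The previous section yields the reduced covariance matrix $\Sigma^{p}_{red}$ of the population mass parameters as a submatrix of $\Sigma^{\gamma}_{red}$:

$$\Sigma^{\gamma}_{red} = \begin{pmatrix} \Sigma^{\beta} & \Sigma_{12} \\ \Sigma_{21} & \Sigma^{p}_{red} \end{pmatrix}.$$

To obtain the covariance matrix $\Sigma^{p}$ of the full set of population mass parameters, we first compute the covariance matrix $\Sigma^{p}_{ab}$ of $(p_1, \ldots, p_{Q-2}, a, b)$ via $\Sigma^{p}_{ab} = D_{ab}\, \Sigma^{p}_{red}\, D_{ab}^T$, where

$$D_{ab} = \begin{pmatrix} & I_{Q-2} & \\ -1 & \cdots & -1 \\ -\theta_1 & \cdots & -\theta_{Q-2} \end{pmatrix}.$$

Then, $\Sigma^{p} = D\, \Sigma^{p}_{ab}\, D^T$ with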
$$D = \begin{pmatrix} I_{Q-2} & 0 \\ 0 & D_2 \end{pmatrix}, \qquad D_2 = \begin{pmatrix} \dfrac{\theta_Q}{\theta_Q - \theta_{Q-1}} & \dfrac{-1}{\theta_Q - \theta_{Q-1}} \\[2ex] \dfrac{-\theta_{Q-1}}{\theta_Q - \theta_{Q-1}} & \dfrac{1}{\theta_Q - \theta_{Q-1}} \end{pmatrix},$$

where $I_{Q-2}$ is the $(Q-2)$-dimensional identity matrix. Note that

$$D_2 \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} p_{Q-1} \\ p_Q \end{pmatrix}.$$

Finally, the moment covariance matrix $\Sigma^{M} = M\, \Sigma^{p}\, M^T$ for any grade is calculated. Here, with $R$ denoting the highest moment of interest,

$$M = \begin{pmatrix} \theta_1 & \theta_2 & \cdots & \theta_Q \\ \theta_1^2 & \theta_2^2 & \cdots & \theta_Q^2 \\ \vdots & \vdots & & \vdots \\ \theta_1^R & \theta_2^R & \cdots & \theta_Q^R \end{pmatrix}.$$

We note that this approach neglects the covariance among moment estimates across grades. The impact of this simplification remains an open, empirical question.

The Separate Calibration, or Chain-Linking Procedure

Unlike concurrent calibration, separate calibration estimates the parameters for each grade separately and then links them through the use of linking items common to multiple grades. One of the grades becomes the base scale to which subsequent tests are linked; here, we use the lowest grade (say, grade 3). Using the common items between the grade 3 base scale and the next grade (grade 4), we determine a transformation that puts the item parameters from grade 4 on the same scale as grade 3. This process of chain-linking repeats until all grades are scaled to the grade 3 base scale. Vertical linking, when performed via separate calibration, is a localized operation, consisting of pair-wise linkages of consecutive grades that establish the vertical scale.

Vertical Linking Baseline: Grade 3. With grade 3 as our base, the vertically linked scale score $\theta^L_{3i}$ for the $i$th student in grade 3 coincides with the $i$th student's within-grade scaled proficiency $\theta_{3i}$; that is, $\theta^L_{3i} = \theta_{3i}$.

Vertical Linking of Grade 3 to Grade 4. In grade 4, $\theta_{4i}$ denotes the achievement of the $i$th student from the within-grade scaling of the grade 4 items. We link grade 4 to grade 3 through a set of linking items, items that are common to both the grade 3 and grade 4 tests. If there are $m_{34}$ of these items, $b_{(3)4}$ is the vector of Rasch difficulty estimates for these items when they are scaled within the fourth-grade data, and $b_{3(4)}$ is the vector of difficulty estimates for the same items when the third-grade data is calibrated. Let $\bar{b}_{3(4)}$ and $\bar{b}_{(3)4}$ be the means of these parameter estimates. Then, standard Rasch practice links the grade 4 achievement scale to the grade 3 achievement scale via

$$\theta^L_{4i} = \theta_{4i} + B_{34},$$

where $B_{34} = \bar{b}_{(3)4} - \bar{b}_{3(4)}$ is the linking constant.

Because the linking constant is estimated from both a sample of students and a sample of test items (those selected for linking), it is subject to sampling error from both sources. The error from the sampling of items arises because linking items are not entirely exchangeable in the linking process: a different sample of items would yield a different linkage. Under the assumption that the sampling error is independent across items, the variance of the vector $b_{(3)4} - b_{3(4)}$ of length $m_{34}$ should reflect both sources of error. This assumption, of course, is unlikely to be true, and the consequences of its violation remain an open, empirical question. The linking constant is the average over the vector $b_{(3)4} - b_{3(4)}$, so we propose to approximate the standard error of the linking constant by the standard error of this mean,

$$\mathrm{Var}(B_{34}) = \mathrm{Var}\left( \overline{b_{(3)4} - b_{3(4)}} \right) = \frac{1}{m_{34}(m_{34} - 1)} \sum_{j=1}^{m_{34}} \left( b_{(3)4,j} - b_{3(4),j} - B_{34} \right)^2, \qquad SE(B_{34}) = \sqrt{\mathrm{Var}(B_{34})}.$$

The variance of the mean of grade 4 students is (with $\mu^L_4 = \bar{\theta}^L_4$ and $\mu_4 = \bar{\theta}_4$):

$$\mathrm{Var}(\mu^L_4) = \mathrm{Var}(\mu_4) + \mathrm{Var}(B_{34}) = \mathrm{Var}(B_{34}).$$

The latter holds because the population means in separate calibration are fixed to zero; they are not estimated. Comparisons between grades 3 and 4 on the vertical scale must contain the variance component $\mathrm{Var}(B_{34})$. That is,

$$\mathrm{Var}(\mu^L_4 - \mu^L_3) = \mathrm{Var}(B_{34}).$$
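The proposed standard error of a linking constant is the standard error of the mean of the item-level difficulty differences, which is simple to compute. The sketch below is illustrative only: the function names and the difficulty values in the usage example are hypothetical, and the helper for spans of several grades assumes, as the text that follows does, that adjacent links are independent.

```python
import math


def linking_constant(b_upper_grade, b_lower_grade):
    """Return (B, Var(B), SE(B)) for one pair of adjacent grades.

    b_upper_grade: linking-item difficulties from the upper-grade calibration
    b_lower_grade: difficulties of the same items from the lower-grade calibration
    """
    if len(b_upper_grade) != len(b_lower_grade) or len(b_upper_grade) < 2:
        raise ValueError("need two equal-length vectors with at least 2 items")
    diffs = [u - l for u, l in zip(b_upper_grade, b_lower_grade)]
    m = len(diffs)
    b_const = sum(diffs) / m
    # Variance of the mean difference across the m linking items.
    var_b = sum((d - b_const) ** 2 for d in diffs) / (m * (m - 1))
    return b_const, var_b, math.sqrt(var_b)


def chained_variance(link_variances):
    """Variance component for a comparison spanning several adjacent links.

    Sums the link variances, which treats the links as independent.
    """
    return sum(link_variances)


if __name__ == "__main__":
    # Hypothetical difficulty estimates for three linking items.
    b, var_b, se_b = linking_constant([0.1, 0.3, 0.5], [0.0, 0.0, 0.0])
    print(b, var_b, se_b)
    print(chained_variance([var_b, 0.02]))
```

With only a handful of linking items per grade pair, this standard error rests on very few differences, which is one reason its behavior is examined by simulation.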
Vertically Linked Scale. Applying the formulae of the previous section to all pairs of adjacent grades creates the vertically linked trait scale that includes all grades. This results in a series of linking constants $(B_{34}, B_{45}, \ldots, B_{G-1,G})$ with corresponding variances $(\mathrm{Var}(B_{34}), \mathrm{Var}(B_{45}), \ldots, \mathrm{Var}(B_{G-1,G}))$. As with concurrent calibration, when analyzing the ability shifts $\mu^L_{g'} - \mu^L_{g}$ among two arbitrary grades $g < g'$, we include the following variance component:

$$\mathrm{Var}(\mu^L_{g'} - \mu^L_{g}) = \sum_{g \le h < g'} \mathrm{Var}(B_{h,h+1}).$$

Again, these formulae are approximate because they treat the sampling variance of the means and linking constants as though they are independent across grades.

4. Simulation Study

This section describes a simulation study designed to compare the accuracy and precision of each of the linking methods under realistic data conditions and to evaluate the efficacy of the proposed standard error approximations. We base the study on the vertical linking sample design used to link the Ohio Achievement Tests from grades 3–8. In general, this design includes approximately 25 schools and 10,000 students per grade and six linking items shared between each pair of adjacent grades (design changes implemented after this study was completed increased the actual number of linking items in adjacent grades).

Realistic values of some of the parameters of the model of item response set forth in Section 2 (above) were simply not known. To obtain realistic values, we generated data sets from 54 different data configurations, and we selected the configuration that most closely approximated the design effects observed in real item responses from a similar (within-grade) sample design. For ease of exposition, we refer to the dataset that yielded the most realistic within-grade design effects as the most Ohio-like configuration of parameters.
Using the most Ohio-like configuration, we generated 100 datasets and applied both linking procedures to evaluate

- the ability of the procedures to recover the generating parameters;
- the ability of the procedures to accurately approximate the precision of the estimates;
- the precision of the estimates from each procedure.
The within-grade data that we used to match the parameter configurations cannot, of course, inform our choice of values for the correlations between the within-grade and vertical scale. Therefore, we generate additional datasets, holding all parameters constant except the correlation, which we vary from .70 to .98 to observe the impact of this factor on the accuracy and precision of estimates.

Data Generation Details

Similar to the Ohio design, our data also span grades 3 through 8. A single linking form per grade consists of 33 core items (39 for grades 3 and 8) and 12 linking items per form (except for grades 3 and 8, which had only 6 linking items). We clustered our data within 25 elementary schools (grades 3–5) and 25 middle schools (grades 6–8). As is the case in Ohio, not all schools contributed data for all grades. We generated 40 total schools, but only 25 schools contributed scores for any one grade (see Table 1). Our data consisted of approximately 10,000 students per grade, for a total of 60,000 observations in each data set.

Table 1: Data structure of simulated data sets: Number of schools for each grade, 60,000 observations total.
Grade Number of Schools X X 40 X X 40 X 40 X 45 X X X 40 X X 40 X X 40 X 40 X 45 X X X Total number per grade

Generation of Item Responses

The data were generated according to the model outlined in Section 2, A Model of Item Response. First, we generated the vector of latent traits $\theta$ and $\psi$ as specified in Equation 4. For convenience, we scaled the stochastic terms to yield traits with unit variance. Next, we generated the stochastic components of the item response function as in Equation 3, taking care to scale these components to yield $e_{ijk}$ with a standard deviation of approximately 1.7, to match the standard logistic curve of the Rasch model. The final step calculated the item responses. Table 2 summarizes the key parameters of those models.
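The generation steps just described can be sketched for a single grade as follows. This is a hedged illustration, not the generator used in the study. It assumes normally distributed components, assumes that linking items respond to the vertical trait while core items respond to the grade-specific trait, and sets student-level variances so that each trait has unit variance and the error term has a standard deviation of 1.7. The function name and all argument names are hypothetical.

```python
import math
import random


def simulate_grade(item_difficulties, loads_on_vertical, grade_mean, beta,
                   var_v_school, var_w_school, var_u_school,
                   n_schools, n_per_school, error_sd=1.7, seed=0):
    """Simulate binary responses for one grade (Equations 3 and 4)."""
    rng = random.Random(seed)
    # Student-level variances are whatever remains after the school effects.
    var_v_student = 1.0 - var_v_school
    var_w_student = 1.0 - beta ** 2 - var_w_school
    var_u_student = error_sd ** 2 - var_u_school
    if min(var_v_student, var_w_student, var_u_student) < 0.0:
        raise ValueError("school-level variances exceed the available variance")
    rows = []
    for school in range(n_schools):
        v_k = rng.gauss(0.0, math.sqrt(var_v_school))
        w_k = rng.gauss(0.0, math.sqrt(var_w_school))
        # One school-by-item effect per item (u_jk in Equation 3).
        u_jk = [rng.gauss(0.0, math.sqrt(var_u_school))
                for _ in item_difficulties]
        for _ in range(n_per_school):
            psi = grade_mean + v_k + rng.gauss(0.0, math.sqrt(var_v_student))
            theta = beta * psi + w_k + rng.gauss(0.0, math.sqrt(var_w_student))
            responses = []
            for j, b_j in enumerate(item_difficulties):
                trait = psi if loads_on_vertical[j] else theta
                e_ijk = u_jk[j] + rng.gauss(0.0, math.sqrt(var_u_student))
                responses.append(1 if trait + e_ijk > b_j else 0)
            rows.append((school, responses))
    return rows


if __name__ == "__main__":
    # The school-level variances on the traits are placeholders, not study values.
    data = simulate_grade([0.0, 0.5, -0.5], [False, False, True],
                          grade_mean=0.5, beta=0.98,
                          var_v_school=0.1, var_w_school=0.01,
                          var_u_school=0.25, n_schools=3, n_per_school=4)
    print(data[:2])
```

Repeating the call for each grade with an increasing `grade_mean`, and sharing the linking items between adjacent grades, would give one data set of the kind analyzed here.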
Table 2: Summary of key factors likely to influence vertical linkage.

Parameters of the latent traits
- Factor: The linear relationship ($\beta$) between $\theta$, the grade-specific trait, and $\psi$, the vertical trait. Comments: In our datasets, this coefficient is also the correlation coefficient.
- Factor: Annual growth, $a$. Comments: This is the average growth on the vertical scale in a year. For our study, we simply take this as a constant.
- Factor: Variance of school effects on vertical trait ($\mathrm{var}(v_k) = \sigma^2_{v(k)}$). Comments: This is the school effect on the vertical trait.
- Factor: Variance of school effects on grade-specific trait ($\mathrm{var}(w_k) = \sigma^2_{w(k)}$). Comments: This grade-specific school effect compounds the school effect associated with the latent trait. Because $w_k$ compounds $v_k$, it is assumed to be small.

Stochastic parameters of item response
- Factor: Variance of the item-specific school effects ($\mathrm{var}(u_{jk}) = \sigma^2_{u(jk)}$). Comments: This item-specific school effect compounds the impact of school effects on the latent traits. Depending on curricular differences, it could be substantial. Recall that this is part of a stochastic term with a standard deviation of 1.7, so the largest value represents about 22 percent of the variance.

Likely Range of Realistic Values; Values Used in Simulated Data Sets: ,.90, , ,.0025, ,.2500,.6400

The final column of the table presents the candidate values for each parameter in the simulations. From that, we see that we have 3 * 3 * 3 * 2 = 54 possible combinations. To select the most realistic values, we calculated the average design effect on estimates of the percentage of correct responses to each item from a real data set (grade 3 reading data, drawn from a similar sample design) and compared that to the observed design effects in the simulated data sets. From these we identified the configuration that most closely matched the real data. The details of these configurations are presented in Appendix A. The parameters of the most Ohio-like configuration are presented in Table 3.
Table 3: Parameters of the most Ohio-like configuration.
Parameter: Value
$\beta$: .98
$a$: .50
$\sigma^2_{v(k)}$: .0
$\sigma^2_{w(k)}$:
$\sigma^2_{u(jk)}$: .25

Recovery of Generating Parameters, Precision, and Accuracy of the Standard Errors

We created 100 datasets, using the most Ohio-like configuration, to evaluate the accuracy, precision, and effectiveness of the standard error estimators of the two linking methods. Table 4 compares the results of these simulations. The estimates of both methods reveal a small bias, which increases as the estimates cross additional grades. This finding seems intuitive because the trait measured is a compromise between the vertical trait and the somewhat attenuated grade-specific traits. The final columns compare the empirical standard errors (the standard deviations of the estimates across 100 datasets). From these, we see that the joint calibration produces estimates that are slightly more efficient than the separate calibration procedure. Again, this is reassuring because the joint calibration brings more information to bear in estimating each item parameter. The standard error estimates from the joint calibration very closely match the empirical standard errors, but the proposed standard error estimator for the chain linking underestimates the standard errors by about 5–20 percent.

Table 4: Joint and separate linking constants with standard errors, over 100 replications.
Columns: Grades; true linking constant $E(B)$; linking constant estimate (separate calibration $B_{sep}$, joint calibration $B_j$); standard error of the estimate (separate calibration $SE(B_{sep})$, joint calibration $SE(B_j)$); observed standard deviation of the estimate (separate calibration $SD(B_{sep})$, joint calibration $SD(B_j)$).

In summary, the two procedures offer virtually identical point estimates when averaged over many data sets.
The joint calibration procedure is somewhat more efficient, and the proposed standard error estimator for the joint calibration procedure provides a more accurate approximation than the standard error estimator that we proposed for the chain-linking procedure.
Correlation Between Vertical and Grade-Specific Trait

Given that the two procedures provide very similar results, and that the standard error estimates are more accurate for the joint calibration procedure, we analyzed the effect of varying the correlation by using only the joint calibration procedure. Table 5 describes the effect of varying the correlation between the vertical and grade-specific trait for five of the data sets configured to the most Ohio-like specifications, with $\rho$ ranging from .70 to .98. As this correlation increases, the standard error decreases within each grade. When $\rho$ is small, the standard errors of the linking constants and the root mean square error (RMSE) are large. No clear relationship appears between $\rho$ and the observed bias in the estimated linking constants.

Table 5: Linking constant and standard error for different correlations between the vertical and grade-specific trait.
Columns: Grades; $\rho$; $E(B)$; $B_j$; $SE(B_j)$; Bias; RMSE.
Note: Results shown in this table are from five data sets, each created to the same most Ohio-like specification scheme with only the correlation between the grade-specific and vertical trait varying.
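The summary quantities reported in these comparisons (bias, RMSE, the empirical standard deviation across replications, and how the average reported standard error compares with it) can be computed as in the sketch below. The function name and the numbers in the usage example are hypothetical and are not results from the study.

```python
import math
from statistics import mean, stdev


def recovery_summary(estimates, true_value, reported_ses=None):
    """Summarize how well replicated estimates recover a known parameter."""
    bias = mean(estimates) - true_value
    rmse = math.sqrt(mean((e - true_value) ** 2 for e in estimates))
    empirical_sd = stdev(estimates)  # spread of estimates across replications
    summary = {"bias": bias, "rmse": rmse, "empirical_sd": empirical_sd}
    if reported_ses is not None:
        # A ratio below 1 means the reported standard errors are too small.
        summary["se_ratio"] = mean(reported_ses) / empirical_sd
    return summary


if __name__ == "__main__":
    # Hypothetical linking-constant estimates from three replications.
    print(recovery_summary([0.4, 0.5, 0.6], 0.5, reported_ses=[0.08, 0.09, 0.10]))
```

A ratio of average reported standard error to empirical standard deviation near 1 corresponds to a well-calibrated standard error estimator.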
5. Conclusion

The main goal of the study was to document the performance of the two linking methods by using realistic data and to verify the estimator of the standard error of the vertical linking constant. This study has found that:

- The two methods produce nearly identical results, even when the vertical linking items and the main assessment items load on separate, correlated traits;
- Our proposed standard error estimator for the joint calibration procedure matches the empirical standard errors to within 3 percent under complex sample designs;
- The joint calibration procedure produces moderately more efficient estimates, with reduction in the standard errors of 4–10 percent;
- Our proposed standard error estimator for the separate estimation method underestimates the empirical standard errors by 10–15 percent;
- The precision of the estimates is affected by the correlation between the on-grade traits and the vertical trait, with lower correlations associated with much less precise estimates.

Other than the item misfit introduced by the clustering of error terms and multidimensionality, this study did not address the impact of item fit on the standard errors. In real data, we expect that item misfit will contribute to larger standard errors than those presented here.
Appendix A: Data Configurations and Most Ohio-Like Data Generation

The data simulated for this study were generated to closely resemble real test data from Ohio. We wanted to match the Ohio design as closely as possible, while simplifying the data structure to facilitate analysis by eliminating stratification and constructed-response items and by generating only a single test form per grade.

Our initial step in data generation was to select several potential realistic values that the parameters of interest could take (see Table 2 in the body of the report) and then to generate data sets representing every unique combination of these values. The resulting 54 data set configurations are described in Table A-1.

To select from these 54 data sets the configuration that most closely matched real Ohio data, we computed the design effects for the simulated data, and we selected the data sets that most closely resembled the real design effects from the grade 3 reading test (DE = 3.645). As can be seen from Figure A-1, one group of data sets resembles the Ohio root design effect more closely than the others. This group corresponds to the data sets in which we set the standard deviation of the item-specific school effects parameter equal to 0.5, and it includes the data sets numbered (2, 5, 8, ..., 44, 47, 50, 53) in Table A-1. We call these the Ohio-like datasets. Also apparent from this figure is that changing the value of this parameter has the largest impact on the root design effect. Varying the other parameters has a much smaller impact.
Figure A-1: Grade 3 design effects for 54 data sets representing all unique combinations of parameters likely to influence linking error. (Axes: Mean Root Design Effect; Dataset.)

NOTE: The solid line represents the root design effect observed in the Ohio operational grade 3 Reading data (Root mean design effect = .89).

From these 2 Ohio-like configurations, we selected one (data set #38) that was very close to the true Ohio design effect observed in the third-grade Reading test data. The specifications for the Most Ohio-like data set are provided in Table A-1 and Table 3 (in the body of the report). The simulations described in this report are based on 100 data sets generated to these specifications.
Table A-1: Configuration of original 54 data sets.

Data Set ID; Correlation between θ and θ; SD of school effects on vertical trait; SD of school effect on grade-specific trait; SD of item-specific school effects; Annual growth

NOTE: The Ohio-like data sets are bolded and the most Ohio-like data set, #38, is shaded.
References

Binder, D. A. (1983). On the variances of asymptotically normal estimators from complex surveys. International Statistical Review, 51.
Cochran, W. G. (1977). Sampling techniques (3rd ed.). New York: John Wiley & Sons.
Cohen, J., Chan, T., Jiang, T., & Seburn, M. (2005). Consistent estimation of Rasch item parameters and their standard errors under complex sample designs. Manuscript submitted for publication.
Cronbach, L. J., & Furby, L. (1970). How we should measure change or should we? Psychological Bulletin, 74(1).
De Leeuw, J., & Verhelst, N. (1986). Maximum likelihood estimation in generalized Rasch models. Journal of Educational Statistics, 11.
Glas, C. A. W. (2005). Structural item response. In Encyclopedia of social measurement (Vol. 3). London: Elsevier Ltd.
Godambe, V. P. (1960). An optimum property of regular maximum likelihood estimation. The Annals of Mathematical Statistics, 31(4).
Godambe, V. P., & Thompson, M. E. (1984). Robust estimation through estimating equations. Biometrika, 71(1).
Hambleton, R. K., & Swaminathan, H. (1985). Item Response Theory: Principles and applications. Boston: Kluwer-Nijhoff.
Hanson, B. A., & Béguin, A. A. (2002). Obtaining a common scale for Item Response Theory item parameters using separate versus concurrent estimation in the common-item equating design. Applied Psychological Measurement, 26(1).
Harris, D. J. (1991). Effects of passage and item scrambling on equating relationships. Applied Psychological Measurement, 15(3).
Kamata, A. (1998). Some generalizations of the Rasch model: An application of the hierarchical generalized linear model. Unpublished doctoral dissertation, Michigan State University, East Lansing.
Kamata, A. (2001). Item analysis by the Hierarchical Generalized Linear Model. Journal of Educational Measurement, 38(1).
Karkee, T., Lewis, D. M., Hoskens, M., & Yao, L. (2003). Separate versus concurrent calibration methods in vertical scaling. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago.
Kim, S.-H., & Cohen, A. S. (1998). A comparison of linking and concurrent calibration under item response theory. Applied Psychological Measurement, 22(2).
Kish, L. (1965). Survey sampling. New York: John Wiley & Sons.
Kolen, M. J. (1981). Comparison of traditional and item response theory methods for equating tests. Journal of Educational Measurement, 18.
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling and linking: Methods and practices. New York: Springer-Verlag.
Linn, R. L., & Slinde, J. A. (1977). The determination of the significance of change between pre and posttesting periods. Review of Educational Research, 47.
Lord, F. M. (1980). Application of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
Mislevy, R. J. (1984). Estimating latent distributions. Psychometrika, 49.
Muraki, E., Hombo, C. M., & Lee, Y.-W. (2000). Equating and linking performance assessments. Applied Psychological Measurement, 24(4).
Patz, R. J., & Junker, B. W. (1999). Application and extension of MCMC in IRT: Multiple item types, missing data, and rated response. Journal of Educational and Behavioral Statistics, 24(4).
Peterson, N. S., Cook, L. L., & Stocking, M. L. (1983). IRT versus conventional equating methods: A comparative study of scale stability. Journal of Educational Statistics, 8(2).
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2004). Generalized multilevel structural equation modelling. Psychometrika, 69(2).
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danmarks Paedagogiske Institut.
Rogosa, D. R., Brandt, D., & Zimowski, M. (1982). A growth curve approach to the measurement of change. Psychological Bulletin, 92.
Sarndal, C. E., Swenson, B., & Wretman, J. (1992). Model assisted survey sampling. New York: Springer-Verlag.
Skaggs, G., & Lissitz, R. W. (1988). IRT test equating: Relevant issues and a review of recent research. Review of Educational Research, 56(4).
Skrondal, A., & Rabe-Hesketh, S. (2003). Multilevel logistic regression for polytomous data and rankings. Psychometrika, 68(2).
Psychological Test and Assessment Modeling, Volume 60, 2018 (1), 53-80 Modeling rater effects using a combination of Generalizability Theory and IRT Jinnie Choi 1 & Mark R. Wilson 2 Abstract Motivated
More informationTeams to exploit spatial locality among agents
Teams to exploit spatial locality amon aents James Parker and Maria Gini Department of Computer Science and Enineerin, University of Minnesota Email: [jparker,ini]@cs.umn.edu Abstract In many situations,
More informationCenter for Advanced Studies in Measurement and Assessment. CASMA Research Report
Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 37 Effects of the Number of Common Items on Equating Precision and Estimates of the Lower Bound to the Number of Common
More information