Item Selection in Computerized Classification Testing. Nathan A. Thompson. Prometric

Size: px

Start display at page:

Download "Item Selection in Computerized Classification Testing. Nathan A. Thompson. Prometric"

Judith Davis
5 years ago
Views:

1 RUNNING HEAD: Item Selecton n CCT Item Selecton n Computerzed Classfcaton Testng Nathan A. Thompson Prometrc Address: Nathan A. Thompson Prometrc 160 Energy Lane St. Paul, MN 55108

2 Abstract Several alternatves for tem selecton algorthms based on tem response theory (IRT) n computerzed classfcaton testng (CCT) have been suggested (Spray & Reckase, 1994; Eggen, 1999; Ln & Spray, 000; Wessman, 004) wth no conclusve evdence on the substantal superorty of a sngle method. It s argued that the lack of szable effect s due to the fact that some of the methods actually assess the same concept through dfferent calculatons, and the smplest method s therefore the most approprate. Moreover, the effcency of tem selecton methods depend on the termnaton crtera that s used, whch s demonstrated through ddactc example and monte carlo smulaton. Item selecton at the cutscore, whch seems conceptually approprate for CCT, s not always the most effcent opton.

3 Item Selecton n Computerzed Classfcaton Testng Computerzed admnstraton of psychoeducatonal tests offers many advantages. A well-documented advantage s the reduced number of tems requred per examnee when varablelength testng s appled. Varable-length testng refers to tests where not every examnee receves the same number of tems; f the purpose of the test has been satsfed after only a small number of tems, the test s concluded. The most common method of varable length testng, when t s used for ablty/trat (θ) estmaton, s computerzed adaptve testng (Wess & Kngsbury, 1984). When the purpose of the test s to assgn an examnee to two or more mutually exclusve categores along the θ contnuum, t s called computerzed classfcaton testng (CCT: Ln & Spray, 000). A major component wthn CCT that realzes the advantage of reduced test length s that of ntellgent tem selecton algorthms. Contrary to a fxed-form conventonal test, tems are selected throughout the test, ether n testlets (Luecht & Nungester, 1998) or after each ndvdual tem (Spray & Reckase, 1994; Ln & Spray, 000). The tem selecton process can be termed ntellgent because the computer attempts to evaluate whch tem remanng n the bank would be the best tem to admnster next, wthn certan constrants. Item selecton methods dffer n how they perform ths evaluaton. Several ntellgent tem selecton optons have been suggested based on tem response theory (IRT), ncludng the maxmzaton of tem nformaton at the current θ estmate (Reckase, 1983), the cutscore (Spray and Reckase, 1994; 1996), across a regon of θ (Eggen, 1999), a logodds rato (Ln & Spray, 000), and across θ (Wessman, 004). However, these tend to fall nto two general types: estmate-based (EB) and cutscore-based (CB). The other major component of a CCT that produces shorter tests s the applcaton of varable-length termnaton crtera. The termnaton crteron s an algorthm that determnes whether the examnee s able to be classfed wthn certan parameters at each pont n the test. When the examnee s able to be classfed, they are assgned a category and the test s termnated. There are three prmary termnaton crtera n CCT. The sequental probablty rato test (SPRT; Wald, 1947; Eggen, 1999) formulates the decson process as a hypothess test that the examnee s θ s equal to a specfed pont above the cutscore or another specfed pont below the cutscore. Ablty confdence ntervals (ACI; Thompson, 006), orgnally termed adaptve mastery testng (Kngsbury & Wess, 1983), termnate the test when a confdence nterval for the examnee s θ s completely above or below the cutscore. Lastly, loss/utlty structures from Bayesan decson theory can be used to assgn classfcatons (Rudner, 00). The decson theoretc approach often uses random tem selecton and classcal test theory, so t s not consdered here. The purpose of ths study s threefold: (1) Demonstrate that tem selecton methods can be generally classfed as EB or CB; () Demonstrate that, wthn the EB and CB paradgms, varous tem selecton methods assess the same concept; (3) Demonstrate that EB selecton s approprate for ACI, and CB selecton s approprate for the SPRT. Item Response Theory The thrd necessary component of a CCT s the adopton of a psychometrc model. Whle CCTs can be developed usng classcal test theory (Frck, 199), methods dscussed heren only make use of tem response theory (IRT; Hambleton & Swamnathan, 1985; Embretson & Rese, 000) for several reasons. Frst, tem banks for large-scale testng programs are often calbrated Item Selecton n CCT 1

4 wth IRT. Second, CCT methods that use classcal test theory assume that examnees can be neatly dvded nto categores such as masters and nonmasters before tem calbraton, whch s not always possble. The ACI crteron for CCT and the tem selecton methods dscussed n ths study requre that tems be calbrated wth IRT because IRT places tems and examnees on the same scale. Lastly, the majorty of research on CCT s based on IRT. Ths study assumed that the data can be effcently modeled wth the three-parameter logstc model. The probablty of an examnee wth a gven θ j correctly respondng to an tem s (Hambleton & Swamnathan, 1985, Eq. 3.3): P ( X exp[ Da ( θ b )] = 1 θ j ) = c + (1 c ), (1) 1+ exp[ Da ( θ b )] where a s the tem dscrmnaton parameter, b s the tem dffculty or locaton parameter, c s the lower asymptote, or pseudoguessng parameter, and D s a scalng constant equal to 1.70 or 1.0. D was assumed to be 1.0 n ths study to make the example calculatons smpler. Item Selecton Crtera Although CCT research dates back to the 1960s (Ferguson, 1969), the frst ntellgent tem selecton method used n CCT was the maxmzaton of Fsher nformaton (FI) at the current estmate of θ (Reckase, 1983; Kngsbury & Wess, 1983). FI s broadly defned as the condtonal slope squared dvded by the condtonal varance, gven the probablty of a correct response to tem P (θ), or (Embretson & Rese, 000, Eq. 7 A.1) I ( θ) = P '( θ) P ( θ )(1 P ( θ)). () The nformaton functon for the three-parameter model s specfcally defned as (Embretson & Rese, 000, Eq. 7 A.) 1 P ( θ ) ( P ( θ ) c ) I ( θ ) = a. (3) P ( θ ) (1 c ) FI, whle a functon of θ, s evaluated at a sngle pont for each tem. Reckase (1983) and Kngsbury & Wess (1983) chose the current θ estmate as ths pont. Spray and Reckase (1994) suggested that tems nstead be selected to maxmze Fsher nformaton at the cutscore, argung that ths makes more conceptual sense snce the goal of the test s only to determne f an examnee s above or below that pont. Eggen (1999) advocated the use of nformaton across a regon of θ. Ths regonal, rather than pont, type of nformaton s known as Kullback-Lebler nformaton (KLI). KLI can be descrbed as the expectaton over observed responses x of the log-lkelhood rato for each tem, or L ( θ ; x ) K ( θ θ1) = Eθ log 1 (4) L ( θ1; x ) Item Selecton n CCT

5 wth θ 1 and θ representng two ponts on θ chosen by the test user that defne the regon that KLI s calculated on and L x 1 x ( ; x ) = P ( θ )[ 1 P ( θ ) θ ] (5) denotng the lkelhood functon for the th tem. The double vertcal bars are used n ths context to emphasze that θ 1 and θ are separated, and not vewed as the condtonal relatonshp ndcated by a sngle vertcal bar. Wth a dchotomous IRT model (Eggen, 1999; Ln & Spray, 000), ths smplfes to P ( θ ) Q ( θ ) K ( θ θ1) = P ( θ )log + Q ( θ )log (6) P ( θ ) Q ( θ ) 1 1 where P (θ ) s the probablty of a correct response at θ and Q (θ ) s the complementary probablty of an ncorrect response. The values θ 1 and θ can be chosen above and below the cutscore or the current estmate. Ln and Spray (000) developed another tem selecton crteron, selectng tems by maxmzng the log of the rato of the tem response probabltes at θ 1 and θ, P ( θ ) R = P ( θ1) X Q ( θ ) Q ( θ1) 1 X. (7) Ln and Spray suggest that θ 1 and θ be chosen above and below the cutscore, but smlar to KLI, t s also possble for them to be specfed at an nterval above and below the current θ estmate. Whle the log calculatons performed by Ln and Spray were more nvolved, they were stll dependent on maxmzng the dfference P(θ ) P(θ 1 ). The most general tem selecton method, mutual nformaton (Wessman, 004), evaluates nformaton across all responses and all values n θ. It s equvalent to the KLI between the dstrbutons of θ and X. Ths s expressed as f ( x, θ ) I( X, Θ) = f ( x, θ )log. (8) x X θ Θ f ( x ) f ( θ ) Because the functon n ths stuaton s the probablty of a response, the numerator of the bracketed rato, the jont functon of the tem response and θ, s the tem response functon (IRF). The functon f(x ) s, n psychometrc notaton, P(X = 1), the classcal dffculty value. The functon f(θ) s the assumed dstrbuton of θ. Wessman proposed that ths expresson be evaluated at dscrete ponts on θ, such as a pont above and below a cutscore. The smulaton n that study nvolved multple cutscores, but f there s only one cutscore, mutual nformaton can be reduced to KLI. Conceptually, mutual nformaton then quantfes the dfference between the IRF and the classcal dffculty across θ at the dscrete ponts specfed. An tem wth very low a results n IRF values that dffer very lttle from the classcal dffculty across θ. An tem wth hgh a results n low IRF values wth low θ and hgh IRF values wth hgh θ, resultng n hgher mutual nformaton. What makes mutual nformaton dfferent from a mere transformaton of the a parameter s the applcaton of an assumed θ dstrbuton. Dscrmnaton values equal, an tem Item Selecton n CCT 3

6 wth a dffculty near the mean of the θ dstrbuton wll have greater mutual nformaton than an tem wth extreme dffculty, because t provdes ts nformaton to a greater proporton of examnees. One drawback to mutual nformaton s the assumpton of a pror. If nothng s known concernng the examnee dstrbuton, a unform pror may be approprate, analogous to the use of maxmum lkelhood θ estmaton rather than Bayesan θ estmaton procedures. If ths were to be appled, mutual nformaton would smply be a functon of a. Ths was the approach used by Wessman (004). Wessman (004) also suggested another new crteron for tem selecton n CCT, Fsher nformaton wth a weght functon of the posteror θ dstrbuton or lkelhood functon. Smlar to other methods, ths was evaluated at the current θ estmate by Wessman, but can also be evaluated at the cutscore. Ths method does not contrbute anythng more than FI because, at a gven pont such as the cutscore or current θ estmate, the FI for all remanng tems n the bank s beng multpled by the same value whatever s the weght for that θ value. Such multplcaton does not change the rankng of the tems n terms of nformaton. EB and CB Selecton, and Equvalence Wthn Category As evdent from the outlne of the evaluatve processes above, each method can be desgned to assess the tem wth regards to the cutscore or to the current θ estmate. For the three methods wth a constraned evaluatve locale (FI, KLI, log-odds rato), the evaluatve process takes place at ether a sngle pont or two endponts of a regon, and ether approach can be specfed to occur at the cutscore or θ estmate. Even mutual nformaton, whch evaluates nformaton across any number of ponts on θ, can be specfed wth ponts relatve to cutscore(s) or a θ estmate. For ths reason, tem selecton methods n CCT can be broadly categorzed as EB or CB. Ths categorzaton s outlned n Table 1. Moreover, the three constraned methods are essentally equvalent, and tend to select the same tem. All three assess the nformaton provded by an tem at the evaluatve locale, cutscore or estmate, and smply dfference n the calculaton used to perform the evaluaton. For example, suppose a CCT s desgned to use CB tem selecton at the cutscore θ c = 0.5. FI wll select the tem wth the hghest nformaton at 0.5, whch s the tem wth the greatest a, and b nearest to 0.5. Ths same tem wll also provde the hghest average nformaton across a regon around 0.5. Addtonally, an tem wth these characterstcs wll also produce the greatest dfference P(θ ) P(θ 1 ) for a θ 1 below the cutscore and θ above the cutscore, whch maxmzes Ln and Spray s (000) crteron. Consder the followng example wth three tems, shown n Table. These three tems have smlar IRT parameters. Item 1 represents an approprate tem for admnstraton at θ c = 0.5; the locaton parameter matches exactly, and the dscrmnaton parameter s moderately hgh. Item s also approprate, but the locaton parameter s not exactly the same, whle tem 3 has the same locaton parameter as tem 1 but slghtly less dscrmnaton. Even though the tem parameter dfferences are qute small, the rankngs of the tems on the three tem selecton crtera are equvalent because they are assessng the same concept. Moreover, the values are proportonate wth each method. It s because of ths fact that very lttle dfference has been found n emprcal studes. Eggen (1999) compared several formulatons of FI and KLI, and found that the best formulatons of each method were approxmately equal n terms of ATL and PCC. Ln and Spray (000) found that the log-odds rato also performed smlarly. Wessman (004) found that mutual nformaton had hgher PCC but lower ATL than FI, but used EB FI rather than CB FI. As descrbed below, ths s an unfar comparson when the SPRT s the termnaton crteron, as t was n the study. Furthermore, the dfference was only Item Selecton n CCT 4

7 evdent for very short tests such as a maxmum test length of 10, n whch case EB FI selecton s neffcent because the test does not have a chance to obtan an accurate θ estmate (Chang & Yng, 1996). Even so, the dfference was only two to three tems n terms of average test length. In the nterests of parsmony, snce the three approaches produce the same rankng of tems, the method wth the least computatonal burden and specfcaton of parameters should be used when desgnng a CCT. Termnaton Crtera and Item Selecton As prevously mentoned, there are three termnaton crtera avalable for CCT, wth the SPRT and ACI the most commonly used n applcaton. ACI classfes an examnee by estmatng θ after each tem n the test and constructng a confdence nterval around the estmate θˆ usng the condtonal standard error of measurement (SEM), f maxmum lkelhood estmaton s appled, or the square root of the Bayesan posteror varance, f Bayesan estmaton s appled. The confdence nterval s represented mathematcally as ˆ θ z ( SEM ) < θ < ˆ θ + z ( SEM ). (9) α α where z α s the normal devate for a 1-α confdence nterval, such as 1.96 for a 95% nterval. If the confdence nterval falls completely above the cutscore, the examnee can be classfed as pass. If the confdence nterval falls completely below the cutscore, the examnee s classfed as fal. If the confdence nterval contans the cutscore, another tem s admnstered. The SPRT structures the decson as a hypothess test wth H 0 : θ = θ 1 and H 1 : θ = θ. The lkelhood of each hypothess s compared n the form of a lkelhood rato: L L( θ = θ u) = = 1 = n L( θ = θ1 u) n = 1 P ( θ ) P ( θ ) 1 X X Q ( θ ) Q ( θ ) 1 X 1 X 1. (10) Ths rato s then compared to two decson ponts A and B. Wald (1947) suggested the followng as approxmatons, assumng a nomnal Type I error rate of α and Type II rate of β: Lower decson pont = B = β/(1 α) (11) Upper decson pont = A = (1 β)/α. (1) If L > A, the examnee s classfed as above the cutscore and the test s termnated. If L < B, the examnee s classfed as below the cutscore. If B < L < A, another tem s admnstered. Note that θˆ s not nvolved. The SPRT was orgnally appled to CCT wth classcal test theory tem parameters (Ferguson, 1969). Reckase (1983) developed a procedure to apply tem response theory to the specfcaton of P and Q. Reckase suggested that two ponts θ 1 and θ be chosen on the θ metrc, where the value of θ 1 represents the lowest level that the test developer s wllng to pass, whle θ represents the hghest θ that the test developer s wllng to fal. The space between the two s called the ndfference regon, and s often specfed by addng and subtractng a user-defned value δ from the cutscore. For example, f the cutscore s 1.0 the test user could defne, accordng Item Selecton n CCT 5

8 to ther nterpretaton of the testng stuaton, a small ndfference regon wth δ = 0.1 (θ 1 = 0.9 and θ = 1.1) or a large ndfference regon wth δ = 1.0 (θ 1 = 0.0 and θ =.0). Ths ntroduces a certan amount of arbtrarness nto the procedure, whch s notable because the sze of δ has a drect effect on the performance of the termnaton crteron. Because an IRT tem response functon s strctly ncreasng, a large value of δ wll lead to a greater dsparty between P (θ 1 ) and P (θ ) f the tem was answered ncorrectly or Q (θ 1 ) and Q (θ ) f the tem was answered ncorrectly. Ths n turn causes the value of the lkelhood rato to depart from 1.0 wth fewer tems. The same reasonng apples to the effcency of tem selecton crtera. Because the SPRT operates wth regards to the cutscore only, whle ACI s constructed wth regards to θˆ, ther respectve nformaton needs dffer. Wth the SPRT, a decson s made more quckly when L s maxmzed or mnmzed by tems that maxmze the dfference P(θ) P(θ 1 ). As demonstrated prevously, ths s equvalent to selectng tems wth the greatest FI at the cutscore. Conversely, ACI makes a decson more quckly when an accurate estmate of θ s obtaned. Ths occurs when the condtonal SEM or Bayesan posteror varance s mnmzed. As the SEM s nversely related to nformaton at the current θ estmate (Embretson & Rese, 000), ACI makes a decson more quckly when nformaton s maxmzed at θˆ rather than the cutscore. Smulaton Study Ths nteracton between tem selecton method and termnaton crteron was evaluated wth a bref monte carlo smulaton. CCTs were smulated for 10,000 examnees that were randomly generated from a N(0,1) dstrbuton. The tem bank conssted of 400 dchotomously scored tems, wth a ~ N(1,0.), b ~ N(0,1), and c = 0.5. Because the SPRT ntroduces the addtonal arbtrarness n the parameter of IR wdth, the smulaton was completed for the two ACI condtons, and then the IR wdth systematcally vared untl an approxmately equvalent PCC was obtaned. Ths allows a more drect comparson between ACI and the SPRT n terms of ATL. The IR wdth for the CB condton was 0.40, and the wdth for the EB condton was The nteracton s evdent n the ATL for each condton, presented n Table 3. For the SPRT, CB tem selecton used 1.44 fewer tems on average. For ACI, CB tem selecton requred 9.66 more tems, on average, whle also havng slghtly less classfcaton accuracy. No tem exposure constrants were employed n ths smulaton, as past research s conclusve that ths smply decreases dfferences between competng methods of other CCT aspects, or has neglgble effects (Spray, Abdel-fattah, Huang, and Lau, 1997; Lau, 1998; Eggen, 1999; Eggen & Straetmans, 000; Ln & Spray, 000; Jao, Wang, & Lau, 004). Ths smulaton followed Ln and Spray (000), who specfcally dd not nclude constrants because they reduce the vsblty of comparsons among other varables. Dscusson Ths smulaton, as well as the prevous examples, demonstrates how a sngle tem selecton method s not the most effcent for all uses. Contrary to ntal percepton, cutoff tem selecton s not always approprate n varable-length CCT. Because of the way that the SPRT and ACI utlze nformaton dfferently n the classfcaton of examnees, the most approprate tem selecton method can vary. Specfcally, CB tem selecton s more effcent when the SPRT s the termnaton crteron, and EB tem selecton s more effcent when ATL s the termnaton crteron. Whle t may be ntally ntutve that CB tem selecton s more approprate for all two-classfcaton CCT because the test s only nterested n f the examnee s above or below the cutoff, ths s not true. Item Selecton n CCT 6

9 No prevous research has made an even comparson wth a complete crossng of tem selecton and termnaton crteron methods, but ths smulaton s supported by two earler smulaton studes. Eggen and Straetmans (000) found that ATL was two to three tems lower for EB selecton than CB selecton, wth ACI as the termnaton crteron. Spray and Reckase (1994) supported CB selecton wth the SPRT but found lttle dfference between CB and EB selecton n terms of ATL wth ACI. However, PCC was not nvestgated n that study. Also note that CB tem selecton entals greater exposure of tems wth locaton parameters near the cutscores. Item exposure s less of an ssue wth EB selecton; only the frst few tems tend to be the same for every examnee when no restrctons are mposed, whereas the entre set s the same for CB selecton. If tem exposure constrants were used n the current study, they would have a greater effect on ATL for CB selecton than EB selecton, for ths reason. Ths would lead to more smlar ATL for CB and EB selecton wth the SPRT, and ncrease the ATL advantage for EB selecton wth ACI. An evaluaton of the extent of ths effect offers a good target for future research, gven the wdespread use of exposure constrants n hghstakes testng. Addtonal ndependent varables also offer opportuntes for future research. Because of the dfferent nformaton needs of ACI and the SPRT, the shape of the tem bank functon must be approprate. Namely, there must be suffcent nformaton near the cutscore for CB selecton wth the SPRT, and there must be suffcent nformaton across θ for EB selecton wth ACI. Effcency would decrease wth an napproprate tem bank. Because tem selecton method has a drect effect on the effcency of the examnaton, the specfcaton of ths aspect of CCT as argued heren has drect sgnfcance for practtoners. Dependng on the remanng parameters of the CCT desgn, applcaton of an tem selecton algorthm that s more approprate for a gven termnaton crteron wll reduce the ATL for examnees, thereby modestly reducng test seat tme and the requred sze of the tem bank. Gven that ths s the ntent of varable-length CCT, proper desgn of CCTs that enhances ths effect s of practcal mportance. Conversely, correct specfcaton of tem selecton method ncreases the accuracy of decsons f test length s held constant. In hgh-stakes testng, accuracy s more mportant than effcency, as an examnee s lkely to be more upset wth a perceved msclassfcaton than wth beng admnstered more tems. Because the utlzaton of nformaton s mportant for maxmzng classfcaton accuracy, t should be gven substantal consderaton when a CCT s desgned. Item Selecton n CCT 7

10 References Chang, H.-H., & Yng, Z. (1996). A global nformaton approach to computerzed adaptve testng.appled Psychologcal Measurement, 0, Eggen, T.J.H.M. (1999). Item selecton n adaptve testng wth the sequental probablty rato test. Appled Psychologcal Measurement, 3, Eggen, T.J.H.M., & Straetmans, G. J. J. M. (000). Computerzed adaptve testng for classfyng examnees nto three categores. Educatonal and Psychologcal Measurement, 60, Embretson, S.E., & Rese, S.P. (000). Item response theory for psychologsts. Mahwah, NJ: Erlbaum. Ferguson, R. L. (1969). The development, mplementaton, and evaluaton of a computer-asssted branched test for a program of ndvdually prescrbed nstructon. Unpublshed doctoral dssertaton, Unversty of Pttsburgh. Frck, T.W. (199). Computerzed adaptve mastery tests as expert systems. Journal of Educatonal Computng Research, 8(), Hambleton, R. K., & Swamnathan, H. (1985). Item response theory: prncples and applcatons. Norwell, MA: Kluwer Academc Publshers. Jao, H., Wang, S., & Lau, C. A. (004). An Investgaton of Two Combnaton Procedures of SPRT for Three-category Classfcaton Decsons n Computerzed Classfcaton Test. Paper presented at the annual meetng of the Amercan Educatonal Research Assocaton, San Antono, Aprl 004. Kngsbury, G.G. & Wess, D.J. (1983). A comparson of IRT-based adaptve mastery testng and a sequental mastery testng procedure. In D.J. Wess (Ed.), New horzons n testng: Latent trat test theory and computerzed adaptve testng (pp ). New York: Academc Press. Lau, C. A. (1998). Robustness of a undmensonal computerzed testng mastery procedure wth multdmensonal testng data. Unpublshed doctoral dssertaton, Unversty of Iowa, Iowa Cty IA. Luecht, R.M., & Nungester, R.J. (1998). Some practcal examples of computer-adaptve sequental testng. Journal of Educatonal Measurement, 35, Ln, C.-J. & Spray, J.A. (000). Effects of tem-selecton crtera on classfcaton testng wth the sequental probablty rato test. (Research Report 000-8). Iowa Cty, IA: ACT, Inc. Reckase, M. D. (1983). A procedure for decson makng usng talored testng. In D. J. Wess (Ed.), New horzons n testng: Latent trat theory and computerzed adaptve testng (pp ). New York: Academc Press. Rudner, L.M. (00, Aprl). An examnaton of decson-theory adaptve testng procedures. Paper presented at the annual meetng of the Amercan Educatonal Research Assocaton, New Orleans, LA. Item Selecton n CCT 8

11 Spray, J. A., Abdel-fattah, A. A., Huang, C., and Lau, C. A. (1997). Undmensonal approxmatons for a computerzed test when the tem pool and latent space are multdmensonal (Research Report 97-5). Iowa Cty, Iowa: ACT, Inc. Spray, J.A., & Reckase, M.D. (1994, Aprl). The selecton of test tems for decson makng wth a computer adaptve test. Paper presented at the natonal meetng of the Natonal Councl on Measurement n Educaton, New Orleans LA. Spray, J.A., & Reckase, M.D.(1996). Comparson of SPRT and sequental Bayes procedures for classfyng examnees nto two categores usng a computerzed test. Journal of Educatonal and Behavoral Statstcs, 1, Thompson, N.A. (006). Varable-length computerzed classfcaton testng wth tem response theory. CLEAR Exam Revew, 17(), Wald, A. (1947). Sequental analyss. New York: Wley. Wess, D. J., & Kngsbury, G. G. (1984). Applcaton of computerzed adaptve testng to educatonal problems. Journal of Educatonal Measurement, 1, Wessman, A. (004). Mutual nformaton tem selecton n multple-category classfcaton CAT. Paper presented at the annual meetng of the Natonal Councl on Measurement n Educaton, San Dego CA. Item Selecton n CCT 9

12 Table 1: Classfcaton of tem selecton methods Approach Pont (FI) Regon (KLI) Estmate-Based Cutscore-Based 1. Maxmum FI at current estmate (Reckase, 1983) 1. Maxmum FI at cutscore(s) (Spray & Reckase, 1994) 1. KLI or MI n regon around current estmate (not used yet). Log-odds rato around current estmate (not used yet) 1. KLI or MI n regon around cutscore(s) (Eggen, 1999; Wessman, 004). Log-odds rato (Ln & Spray, 000) Item Selecton n CCT 10

13 Table : Example calculatons Item a b c P(θ = 0.0) P(θ = 0.5) P(θ = 1.0) FI KLI P(θ = 1.0) - P(θ = 0.0) Item Selecton n CCT 11

14 Table 3: ATL results from smulaton Termnaton Crteron Item Selecton ATL PCC SPRT CB EB ACI CB EB Item Selecton n CCT 1

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general