Rasch , 40 (9) : ,,, ,,,, B A cta Psychologica S in ica DO I: /SP. J

Size: px

Start display at page:

Download "Rasch , 40 (9) : ,,, ,,,, B A cta Psychologica S in ica DO I: /SP. J"

Derek Norman
5 years ago
Views:

1 2008, 40 (9) : A cta Psychologica S in ica DO I: /SP. J Rasch ( 1,,100875) ( 2 Kennedy School of Government, Harvard University, MA 02138, USA) Rasch, 66,,,, Facets Facets,,,Facets, Facets, ; ;Rasch B ,,, [ 6 ] [ 1 ] [ 5 ],,,, [ 7 ] [ 2 ] 1. 2 :,,, [ 3 ], W agner : :, : [ 5 ],,,,,, [ 4 ] : :, E2mail: sunxiaom in@ bnu. edu. cn, :

2 9 :Rasch 1031,,,,,,,,, (Classical Test Theory, CTT) ( Generalizability Theory, GT) CTT,,, CTT,,, CTT, Kendall, CTT,, KendallCTT GT CTT,,, GTCTT, CTT,, GT, ( Item Response Theory, IRT), CTT GT, IRT, IRT, IRT, () CTT, IRT, IRT, IRT, IRT RaschRasch Rasch log P n i 1 - Pn i = B n - D i P n i n i 1 - P n i n i B n n ( n = 1, 2,, N ) D i i ( i = 1, 2,, L ) Rasch,,, Rasch,,,,, L inacre [ 28 ] Rasch(Many Facets Rasch Model, MFRM ) MFRM Rasch, Rasch MFRM : log P n ijk P n ij( k - 1) = B n - D i - C j - F k P n ijk n i j k P n ij( k - 1) n i j k - 1 B n n( n = 1, 2,, N ) D i i( i = 1, 2,, L ) C j j( j = 1, 2,, J ) F k ( Partial Credit Model) k - 1k ( step difficul2 ty), K ( k = 1, 2,, K) logits Logits,

3 , MFRM, [ 10 ],MFRM MFRM, MFRM [ 11 ] [ ] [ ] [ 20 ] [ 21 ] [ ], [ 25 ],,, MFRM,, CTT, MFRM, [ 26 ], John M. L inacre MFRM Facets Facets (Unconditional Maximum L ikelihood) MFRM Facets forw indows [ 27 ] 1. 3 Facets , ;,, MFRM,, ,, A B C D E F G 34, A E H I J K L 32, : IRT MFRM, Facets forw indows M FRM Facets , ,, 1 23,,, [ 29 ] logits, logits, ( S E = 112), 53, logits(se = 0. 22) ; 37, logits(s E = 0. 13) 4 infit ( Infit MnSq), MFRM, fit,, MFRM,,, fit fit,mfrm, fit

4 9 :Rasch 1033, fit fit [ 13 ] [ 14 ] infit 112,,, infit 0. 8, 1,3 infit 2. 28, 1. 20, 1 66 S. E. InfitMnSq S. E. InfitMnSq S. E. InfitMnSq : RM SE (Model) : 0. 18, Adj SD: Separation: 6. 75, Separation Reliability: RM SE Root Mean - Square Standard Error, ( 1 3 )RM SE M SE (Mean2Square Standard Error),M S E Adj. SD Adj. SD ( ) M SE Separation Adj. SD RM S E,, Separation , Separation 3. 0 [ 30 ], Separa2 tion6. 75 Separation Reliability, [ 26 ],, 0;,1 1, Separation Reliability 0. 98, 2,, 2 (65) = , p < 0. 01,

5 , :,, MFRM ( logits) 2, 2 Facets MFRM,, Facets2 MFRM ( logits) MFRM , 1, 66 ; 2 ; 3 ; 4 Facets; 5 ; 6 3 5,

6 9 :Rasch ,, 56,, 16, Facets, 31, 15 25, 56 56,Facets, 3. 3 Facets, 56 7, 56, logits ModelS. E. A E H I J K L ; 2 ; 3 ; 4 ; 5 3, 56 7, 5, 5,, 56, 66, 12 Facets 3. 4Facets Facets, 4 4, 1 ; 2,,, A E ; 4 ; , , , t = , p = , Facets for W indows ,, infit

7 Facets [ 26 ],, infit, Infit, MFRM, IRT CTT [ 31 ],, 56 56,,12 Facets :, Facets :1, 2,,, , 3: 567, 5, 5 :, 56 ;, 5, Facets,, 56, (1) 1 1, 1, 2 12 :, logits, , 4 2, 21. 6, 66,, 1 3,4

8 9 :Rasch 1037,,,,, , 2 1, 2,, :,,,,,,,, MFRM, 5 MFRM, 66,: (1)MFRM,, ( infit ) (2),, Facets Facets,, Facets, MFRM,,, MFRM 1 Maurer T J, Solamon J M. The science and p ractice of a structured emp loyment interview coaching p rogram. 2006, 59 (2) : Personnel Psychology, 2 W iesnerw H, Cronshaw S F. A meta - analytic investigation of the impact of interview format and degree of structure on the validity of the emp loyment interview. Journal of Occupational Psychology, 1988, 61 (4) : Schm idt F L, Zimmerman R D. A counterintuitive hypothesis about emp loyment interview validity and some supporting evidence. Journal of App lied Psychology, 2004, 89 (3) : W alters L C, M illerm R, Ree M J. Structured interviews for p ilot selection: No incremental validity. Psychology, 1993, 3 (1) : 25 International Journal of Aviation 5 A rvey R D. The emp loyment interview: A summary and review of recent research. Personnel Psychology, 1982, 35 (2) : Schm itt N. Social and situational determ inants of interview decisions: Implications for the emp loyment interview. Personnel Psychology, 1976, 29 (1) : Maurer S D, Fay C. Effect of situational interviews, conventional structured interviews, and training on interview rating agreement: an experimental analysis. Personnel Psychology, 1988, 41 (2) : Sun X M, Zhang H C. A comparative study on methods used in estimating reliability of performance assessment: Methods based on correlation, percent of inter - rater reliability and Generalizability theory. Chinese) Psychological Science, 2005, 28 (3), ( in (,. - -., 2005, 28 (03) : ) 9 Shavelson R J, W ebb E A. Generability theory: A p rimer. Newbury Park, CA: SAGE Publications, Inc., Myford C M, Wolfe E W. Detecting and measuring rater effects usingmany - Facet Rasch Measurement: Part II. Journal of App lied Measurement, 2004, 5 (2) : LunzM E. Variation among exam iners and p rotocols on oral

9 exam inations. Paper p resented at the AnnualMeeting of the American Educational Research A ssociation, San Francisco, CA: L inacre J M. The calibration of essay graders. Paper p resented at the M idwest Objective Measurement Sem inar, Chicago, IL: Du Y. Raters and single p romp t - to - p romp t equating using the FACETS model in a writing performance assessment. Paper presented at the N inth International Objective Measurement Conference, Chicago, IL: Kondo - B rown K A. FACETS analysis of rater bias in measuring Japanese second language writing performance. Language Testing, 2002: Yamauchi K. ComparingMany - facet Rasch Model and ANOVA model: Analysis of ratings of essays. Japanese Journal of Educational Psychology, 1999, 47 (3) : Myford C M. Looking for patterns in disagreements: A Facets analysis of human raters and e - rater s scores on essays written for the graduate management admission test ( GMAT). New O rleans, LA. : American Educational Research A ssociation, L inacre J M. An extension of the Rasch Model to multi - faceted situation. Chicago: University of Chicago, LunzM E, W right B D. Measuring the impact of judge severity on exam ination scores. App lied Measurement in Education, 1990, 3 (4) : A llen J M, Schumacker R E. Team assessment utilizing a Many - Facet Rasch Model. Journal of Outcome Measurement, 1998, 2 (2) : Looney M. A many - facet Rasch analysis of 1994 O lymp ic figure skating. Research Quarterly for Exercise & Sport, 1997, 68 (1) : Myford C M, M islevy R J. Monitoring and imp roving a portfolio assessment system, Center for Performance A ssessment Research Report. Princeton, NJ: Educational Testing Service, Chi E. Comparing holistic and analytic scoring for performance assessment with many - facet Rasch model. Journal of App lied Measurement, 2001, 2 (4) : Heller J I, Sheingold K, Myford C M. Reasoning about evidence in portfolios: Cognitive foundations for valid and reliable assessment. Educational A ssessment, 1998, 5 (1) : 5 24 LunzM E. Performance exam inations: Technology for analysis and standard setting. Paper presented at the AnnualMeeting of the National Council of Measurement in Education. Chicago, IL: Kline T L, Schm idt K M, Bowles R. U sing L inlog and FACETS to model item components in the LLTM. Journal of Applied Measurement, 2006, 7 (1) : Sun X M, Zhang H C. An IRT analysis of rater bias in structured interview of national civilian candidates. Acta Psychologica Sinica, 2006, 38 (4), ( in Chinese) (,. IRT., 2006, 38 (04) : ) 27 L inacre J M. Facets forw indows ed. Chicago, IL: MESA Press, L inacre J M, W right B D. Understand Rasch measurement: Construction of measures from Many - facet Data. Journal of App lied Measurement, 2002, 3 (4) : L inacre J M. Understanding Rasch measurement: Op tim izing rating scale category effectiveness. Journal of App lied Measurement, 2002, 3 (1) : LunzM E, W right B D, L inacre J M. Measuring the impact of judge severity on exam ination scores. App lied Measurement in Education, 1990, 3 (4) : L inacre J M, W right B D. Construction of measures from many - facet data. Journal of Applied Measurement, 2002, 3 (4) : A M any - faceted Ra sch M odel Ana lysis of Structured In terv iew SUN Xiao - M in 1, XUE Gang 2 ( 1 School of Psychology, B eijing Key Lab of Applied Experim ental Psychology, B eijing N orm al U niversity, B eijing China) ( 2 Kennedy School of Governm ent, Harvard U niversity, MA 02138, USA ) Abstract Being one of the most important techniques in personnel selection, structured interview has attracted more and more research interest in imp roving its reliability and validity. Some researches focus on the standardization of its content and dimensions, others try to decrease rater bias by intensive rater training. The third one, to handle the possible bias in statistic way, has attracted more and more attention. Many - faceted Rasch Model (MFRM ), an extension to Rasch model, served as such kind of techniques. By parameterizing not only interviewee s ability and item difficulty but also judge severity, MFRM offers an effective way to estimate interviewee s latent trait, that is, the ability, and p rovides detailed information of inter - rater reliability as far as a specific interviewee is concerned. This study used MFRM to analyze the result of a structured interview and demonstrated a creative way to locate the source of bias for a specific

10 9 :Rasch 1039 interviewee. Data came from a structured interview. There were 7 raters in each interview panel. Since the interview last two days, these 21 raters were random ized into 3 panels in the morning of each day in order to p revent cheating. Rating scores of two panels were used in this study. A B C D E F G were raters of one panel and interviewed interviewees numbered A E H I J K L were raters in the other panel that interviewees numbered were interviewed. rated each interviewee independently on five dimensions using a 10 points rating scale. Each rater U sing Facets , a computer p rogram based on MFRM, the abilities of 66 interviewees was estimated, accompanied with a Infit MnSq, which demonstrated the degree to which raters in the panel agreed with each other on the evaluation of a specific interviewee. The ranking order based on interview raw scores and Facets estimated logits values were compared. D ifference were found between those them. To track the source of error for interviewee numbered 56, bias analysis of Facets was also made. The ability of 66 interviewees were reported with infit MnSq, showing inter - rater reliability at individual level; The ranking order based on interview raw score and estimated ability score were quite different, especially for some interviews. Taking interviewee numbered 56 for examp le, the ranking difference was as large as 15. B ias analysis aimed at locating the source of error for interviewee number 56 show that not only rater consistency, but also rater severity contribute to the ranking difference. The results confirmed the utility of MFRM analysis. The app lication of MFRM in the analysis of structured interview was p roved to be not only an effective way in personnel selection, but also p rovided diagnostic information for sources of error locating at individual level. Key words structured interview; IRT; many2faceted Rasch Model

Beyond classical statistics: Different approaches to evaluating marking reliability

Beyond classical statistics: Different approaches to evaluating marking reliability E. Harrison, W. Pointer, Y. Bimpeh & B. Smith What do we mean by marking reliability? Reliability = when something can