Item Response Theory (IRT) an introduction. Norman Verhelst Eurometrics Tiel, The Netherlands

Size: px

Start display at page:

Download "Item Response Theory (IRT) an introduction. Norman Verhelst Eurometrics Tiel, The Netherlands"

Jerome Garrison
5 years ago
Views:

1 Item Response Theory (IRT) an introduction Norman Verhelst Eurometrics Tiel, The Netherlands

2 Installation of the program Copy the folder OPLM from the USB stick Example: c:\oplm Double-click the program OpSetEnv.exe choose c:\oplm This course: copy the folder Course IRT to your hard disk If you don t have the program TEXTPAD, Go to the folder TEXTPAD on the USB stick Double Click on the unique file

3 General Overview Item Response Theory (IRT) Basic Concepts A prototype: the Rasch model Classification of models Detailed study of one model: OPLM

4 What is a theory? A theory is a narrative about the observable world Using concepts Clarifying the relation between concepts and observable phenomena Imposing restrictions on possible phenomena falsifiable Is Classical Test Theory a theory in this sense?

5 Guttman s scalogram The targeted concept is represented by a line (continuous variable) (for example, an attitude, a competence) Persons and items are represented by points on the line The observable responses reflect the ordinal relationship between the points person and item on the underlying line. This line is nonobservable or latent

7 i j k m #items possible allowed n 2 n n+1

8 Guttman items probability(+) 1 0,5 A Guttman item Not-decreasing Not continuous ( step function ) The model is deterministic 0 latent variable

9 A bit of algebra: exponentiation 3 3 = 3 = Definition: 3 = = = 3 = = 3 = is the basis (positive) 3 : 5 is the exponent (arbitrary number) 1

10 The number e A famous number which deserves its own name. The basis of the natural logarithms e = The exponentiel function (see exp.xls): y y = = e x exp( x)

11 The logistic function f( x) x e = 1 + e x f( x) exp( x) = 1 + exp( x) ''the exp of something ( x) divided by one plus the exp of the same somethi ng' '

12 The Rasch model (1) Definition of the item response function: f i ( θ) = P( X = 1 θ) i β i is a non-specified number (parameter) f i ( θ ) exp( θ β ) = i 1 + exp( θ β ) i

13 A probabilistic model probability (+) 1 0,5 A trace line Increasing everywhere Continuous smooth Located at zero Item response function theta 0 < f ( θ) < 1 i f ( β ) = i i 1 2

14 Two Rasch items 1 Two Rasch items probability (+) 0, theta

15 Define Two equivalent forms ξ = exp( θ) and ε = exp( β ) such that i exp( θ β ) = exp( θ) exp( β ) = ξε and i i i exp( θ βi) ξεi = 1+ exp( θ β ) 1+ ξε i i left: additive form ( β is increasing with difficulty) right: multiplicative form ( ε is decreasing with difficulty) i

16 Our tasks What do we have? A narrative (hypothesis) A binary table with realisations of X vi (0 ou 1) What do we have to do? Estimate the parameters Check the narrative Use the test: go and measure

17 The program package OPLM (W)OPIN: specification (W)OPPRE: pre-analysis (must) (W)OPCML: estimation (CML) WOPPLOT (OPDRAW): graphical check

18 Incomplete designs: linked items items gr. 1 gr. 1 gr. 2 gr. 2 gr. 3 gr. 3

19 Incomplete designs: not linked items items gr. 1 gr. 1 gr. 2 gr. 2 gr. 3 gr. 3

20 Balanced Incomplete Block Designs (BIBD) Items are organized in a number of blocks Every student takes a subset of the blocks Every subset contains the same number of blocks. All blocks appear once in every serial position All pairs of blocks appear once

21 Possible BIBDs Total number of blocks (= number of booklets) Number of blocks per student

22 BIBD with 3 blocks booklet Position 1 Position 2 1 A B 2 B C 3 C A

23 BIBD with 7 blocks booklet Pos. 1 Pos. 2 Pos. 3 1 A B D 2 B C E 3 C D F 4 D E G 5 E F A 6 F G B 7 G A C

24 booklet Pos. 1 Pos. 2 Pos. 3 Pos. 4 1 A B D J 2 B C E K 3 C D F L 4 D E G M 5 E F H A 6 F G I B 7 G H J C 8 H I K D 9 I J L E 10 J K M F 11 K L A G 12 L M B H 13 M A C I

25 Exercise (1): a simple analysis using the Rasch model The data file Text file (be careful with Word ) expanded or compressed (= default) Booklet indication (1, 2, ) All data are numeric (no decimal points or commas) Missing data are treated as errors, except if not administered by design. The (arbitrary) name of the data file follows the DOS rules: 8+3 No spaces See: Example.dat Booklet 1: items 1-31 (31) Booklet 2: items 2-30 (29) Use the module WOPIN (open the file example1.scr)

26 Expanded or Compressed Expanded Compressed Booklet 1: items 1-5 (5) Booklet 2: items 3-8 (6)

27 Legend: t = text : module : file b = binary red = optional Jobn.cml (t) Jobn.log (t) Jobn.cov (b) user (W)OPIN Jobn.scr (b) (W)OPPRE Jobn.pre (b) (W)OPCML Jobn.par (b) data(t) Wopplot screen, files

28 Classification of IRT models Dimensionality Unidimensional vs. Multidimensional (M. Reckase, 2009) Observed Variables Binary (dichotomous) Polytomous Ordinal polytomy Nominal polytomy (Bock, 1979) Continuous variables (e.g. time) Function type of IRF Ascending (increasing) Single peaked

29 Local (or conditional) stochastic independence (1) Class A X=1 X=0 Y= Y= Class B X=1 X=0 Y= Y= Classes A+B X=1 X=0 Y= Y=

30 Local (or conditional) stochastic independence (2) PX θ X = PX θ X ( = 1, = 1) ( = 1, = 0) PX ( = 1, X = 1 θ) = PX ( = 1 θ) PX ( = 1 θ) But, notice that in general PX ( = 1 X = 1) PX ( = 1 X = 0) (Why not?)

31 Maximum Likelihood Estimation Example: see the file probability.xls Tab sheet Likelihood Joint Maximum Likelihood (JML) θ et β jointly (at the same time) Conditional Maximum Likelihood (CML) Only βs are estimated Marginal Maximum Likelihood (MML) βs and the distribution of θ

32 Joint Maximum Likelihood Number of parameters: n + k n = number of students k = number of items Estimates of item parameters not consistent Correction formulae not OK with incomplete designs Programs: FACETS, WINSTEPS, BIGSTEPS, (CONQUEST: optional)

33 Joint Maximum Likelihood (JML) (1) Data: John: (1,0,1); Mary: (0,1,1) Problem: β1, β2, β3, θjohn, θmary Likelihood (Vraisemblance): L( β, β, β, θ, θ ) J M = f ( θ ) [1 f ( θ )] f ( θ ) 1 J 2 J 3 J [1 f ( θ )] f ( θ ) f ( θ ) 1 M 2 M 3 M

34 Joint Maximum Likelihood (JML) (2) Size of the problem: n + k The problem grows at the same rate as n (sample size) Result: the estimates are not consistent (définition) In the first phase of the analysis, the θ- values are a nuisance. How to get rid of them?

35 Conditional Maximum Likelihood (1) X 1 =1 X 1 =0 total X 2 = X 2 = total PX ( = 1) = 120/200= 0.6 PX ( = 1) = 150/200= PX ( = 1 S= 1) = 40 /( ) 1 PX ( = 1 S= 1) = 70 /( ) 2

36 Conditional Maximum Likelihood (2) The response probability, conditional on the total score, is independent of the latent variable θ. The great discovery of G. Rasch (1960) The CML-estimators are consistent CML-estimation is independent of the sample Representativeness is not necessary This does not imply that sampling is not important See the file probability.xls, tab sheets compute and CML. Programs: PML, OPLM

37 ξεi 1 PX ( i = 1 ξ) = et PX ( i = 0 ξ) = 1 + ξε 1 + ξε i i P(101 ξ) ξε 1 ξε ξ (1 + ξε ) (1 + ξε ) (1 + ξε ) D = = ε ε 1 3 P(101 ξ ) P(101 ξ, S = 2) = P(101 ξ) + P(110 ξ) + P(011 ξ) 2 ξ εε 1 3 D εε 1 3 = = ξ ξ ξ εε εε εε 2 3 εε 1 3+ εε εε 2 3 D D D

38 Marginal Maximum Likelihood (MML) Theθ -values are considered as a simple random sample from some population The θs are distributed according to some distribution rule For example: the normal distribution (two parameters: µ et σ 2 ) Size of the problem: k + 2 Warnings The model is enhanced (and more vulnerable) The estimation becomes more difficult if there is no simple random sample (which is almost always the case) Programs: BILOG, CONQUEST, MULTILOG, OPLM

39 Normalization: choosing the origin of the scale Technical restriction Necessary How? One parameter β i is fixed at zero, e.g., β 4 = 0 The average of the β-parameters is fixed to zero Default of OPCML The average of the population is fixed at zero (MML) Exercise (2): Recompute example1, fixing β 4 at zero Compare estimates and standard errors Now fix β 4 at minus one; predict estimates and standard errors

40 Justified comparisons but not: * θ = θ + β * i 5 = β + 5 and also i β β = β β * * i j i j θ θ = θ θ * * v w v w θ β = θ β * * i "The (estimated) difficulty of item 5 is 7.3; therefore, this item is very difficult." i

41 Number of free parameters JML: n + k - 1 CML: k - 1 MML 1 population: k p populations: k + 2p -1

42 Check our narrative Deterministic model: easy (scalogram) Probabilitstic models: statistical tests goodness of fit General tests or targeted tests? See the file probability.xls, tab sheet gof. Apply the module Wopplot to example1

43 Notation i indicates the targeted item (say, i = 2) p(i s) or p i s = the (observed) proportion of correct responses to item i, given that the (total) score équals s (simple count). π(i s) or π i s = the probability (or predicted proportion) of a correct response to item i, given that the (total) score equals s (a complicated function of the item parameters) n s : the number of repondents with score s (simple count)

44 The S i -tests S * i ( O E ) n ( p π ) = = E nπ (1 π ) k 1 2 k sj sj s is is s= 1 j= (0,1) sj s= 1 s i s i s Two problems: * 1. The null distribution de S i? - asympotic: chi-square? - degrees of freedom? 2. What if some (expected) frequencies are (very) small?

45 Les S i tests with grouped scores Groups of scores: G = {1,2,3}; G = (4,5,6};... S * i = r s G q 1 2 ns( pis π is ) n π (1 π ) q= 1 s is is s G q * To transform Si into Si is complicated 2 Asymptotically, Si is χ -distributed (if the hypthesis is true). degrees of freedom: r 1 2

46 Grouping rules (complete design) Number of groups: as big as possible, but not >8 Number of observations per group: 30 Expected frequencies (per group): Correct responses: 5 Incorrect responses: 5 same score same group scores in a group are contiguous [e.g., (6,7,8,9) but not (6,8,9)] Distribution: as uniform as possible The grouping is recomputed for each item

47 Grouping rules (incomplete design) Sort score x booklet combination according to probability correct See probability.xls, tab sheet DIF Sc bk bk ,61, 71, 62, 72, 81, 82, 92, 91,...

48 Technical Information(1) The file OPLM.sys Stores the most recent analysis Updated by (W)OPIN The file OPCML.def Contains the defaults for the iterations May be adapted

49 Technical Information (2) Contsruct example3.scr from scratch The total number of items is 31 The data file is example.dat The design is contained in example3.bk The item labels are contained in example3.lab

50 Go and Measure Estimate the item parameters Check the measurement model Reject (or adapt) the model Accept the model (and its parameters) Carry out the actual measurement: estimate θ (and its standard error) for each student ML WML (Warm)

51 The ML-estimator of θ Find the value of θ that maximizes the likelihood function. θ $ is a transformation of the score s ES ( θ = θ) $ = s obs θ $ does not exist for scores zero and maximum θ $ est conditionnally biased:

52 Conditional Bias One would wish that on the average the estimate of θ equals the true value of θ: E(θ $ θ) = θ Conditional bias: E(θ $ θ) θ (The difference, considered as a function of θ)

53 Conditional bias 0.3 Bias of Warm and ML estimators ML(5) ML(W) 0.0 WML bias theta

54 Weighted Maximum Likelihood ML: maximize L( θ; s) WML: maximize w( θ) L( θ; s) -WML exists for all possible scores -The bias disappears quasi completely in a region of θ, where the test has sufficient information -See the file 'exemple1.cml'

55 Statistics with $ θ Two level regression model: θ = µ + R β + R β + Gβ + u + ε u ε vj j ij 2 j : N(0, τ ) ij 2 : N(0, σ ) - Apply the 2-level regression analysis using the WML estimates as dependent variable. - The test consists of 40 items, reasonably well targeted. - See the file 'Simulation.xls'

56 IRT is not the ultimate salvation See the file trust.cml (text file) and possibly the file trust.par using Wopplot What would be your conclusion?

57 1 three Rasch items 0.8 probability possible theta values

58 1 three Rasch items 0.8 probability actual theta values

59 DIF: differential item functioning (1) Generic example: boys and girls The boys have a better result than the girls on the target item (marginal proposition) If this is DIF à conceptuel problem Theoretical concept: boys at a given level of ability (θ) have a higher (or lower)score (on the target item) than the girls (conditional proposition) Testable prediction : Boys with a given total score have an average score (on the target item) equal to the average score of the girls with the same total score (null hypothesis)

60 DIF (2) See Probability.xls tab sheet DIF Apply OPLM on example2.scr item 16 Decision to take Eliminate item 16 from the test? What about content validity?

61 The partial credit model (1) exp(0) PX ( i = 0 θ) = D exp(1 θ βi 1) PX ( i = 1 θ) = D exp(2 θ βi 1 βi2) PX ( i = 2 θ) = D D is the sum of all numerators exp( jθ β ) = = D = ig ( g= 0 i j θ), ( βi0 0) PX j

62 The partial credit model (2) 1 probability (0) (1) (2) 0 β i 1 β i 2 θ Category Response Functions

63 The partial credit model (3) 1 probability (0) (2) 0 β i 2 β i 1 Notice that βi < β (1) 2 i1 See the file 'PCM.xls' θ

64 Local Independence Indicate the capital of: 1 Albania A Boudapest 2 Bulgaria B Boukarest 3 Iceland C Reykjavik 4 Romania D Sofia E Tirana

65 PCM: example The European Survey on Language Competences (ESLC) Field Trial for English Reading Comprehension Complicated design: 36 booklets (test forms) The items are organized in tasks Suspicion: items from the same task are not conditionally independent The score is defined at the task level: The task has become the item The score is the number of correct responses on the questions of the task The model is the PCM See the file ER.par (WOPPLOT)

66 Discrimination (1) different discrimination; same difficulty 1 probability theta

67 Discrimination (2) 1 different discrimination and difficulty probability theta

68 Discrimination and difficulty exp[ ai( θ βi)] fi( θ) = P( X = i 1 θ) =, ( ai>0) 1 + exp[ a ( θ β )] i i Rasch model: a = a = L = a 1 2 k Metaphore: θ: How much do I own? f i β : What is the price of commodity i a: What is the unit of currency ( θ ): Probability of buying i

69 Equivalent models θ * = θ / c * * * * βi = βi / c ai( θ βi) = ai( θ βi ) * ai = c ai

70 The model OPLM The discriminations are Considered known (as a hypothesis) Integer valued (1, 2, ) If the discrimination is known, only one parameter per item is to be estimated Hence the name: One Parameter Logistic Model Big advantage: CML estimators do exist.

71 Integer valued discriminations x 2 x 3 x Geometric mean: = 1.16

72 The program (W)OPSUG Gives a suggestion about the discriminations. The user has to specify the geometric mean of the a s Depends a bit on the sample size Suggestion (for n = 1000): 2.5 à 3.5 Analyze the data in the file example4.dat First WOPIN Then WOPSUG

73 Thank you!

Comparison between conditional and marginal maximum likelihood for a class of item response models

(1/24) Comparison between conditional and marginal maximum likelihood for a class of item response models Francesco Bartolucci, University of Perugia (IT) Silvia Bacci, University of Perugia (IT) Claudia