Nonparametric Bounds Analysis for NEAT Equating
Jorge González (a, c), Ernesto San Martín (a, b, c)
(a) Department of Statistics, P. Universidad Católica de Chile, Chile
(b) The Economics School of Louvain, Université Catholique de Louvain, Belgium
(c) LIES, Laboratorio Interdisciplinario de Estadística Social
Seminario de Estadística Educacional, September 1, 2017, PUC, Chile
Outline 1 The equating statistical problem 2 3 4
Model specification and general formulation

General formulation of the equating problem
Test forms X and Y are administered to $n_x$ and $n_y$ test takers, respectively.
The scores X and Y are assumed to be random variables.
X and Y are defined on (score) sample spaces $\mathcal{X}$ and $\mathcal{Y}$, respectively.
$x_1, \dots, x_{n_x} \sim F_X(x)$ and $y_1, \dots, y_{n_y} \sim F_Y(y)$.
Statistical problem: modeling the relationship between scores to make them comparable.
Equating transformation

General formulation: equating transformation
Definition. Let $\mathcal{X}$ and $\mathcal{Y}$ be two sample spaces. A function $\varphi : \mathcal{X} \to \mathcal{Y}$ will be called an equating transformation.
The equating transformation maps the scores on the scale of one test form into the scale of the other.
The equating transformation is to be estimated by an estimator $\varphi_n$ based on samples $x_1, \dots, x_{n_x} \sim F_X$ and $y_1, \dots, y_{n_y} \sim F_Y$, respectively.
Equipercentile transformation
[Figure: cumulative distribution functions F(x) and F(y) plotted against scores, vertical axis from 0 to 1; a score x is carried through the common probability p to the score y.]
$y = \varphi(x) = F_Y^{-1}(F_X(x))$
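A minimal sketch of the equipercentile transformation on a discrete score scale may help fix ideas. The function names, the toy samples, and the use of the generalized inverse with ">=" are assumptions made for this illustration; the slide itself does not say how $F_Y^{-1}$ is computed for discrete scores.

```python
import numpy as np

def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    sample = np.sort(np.asarray(sample))
    return np.searchsorted(sample, grid, side="right") / sample.size

def quantile(cdf_values, grid, p):
    """Generalized inverse (assumed convention): smallest grid point with CDF >= p."""
    idx = np.searchsorted(cdf_values, p, side="left")
    return grid[min(idx, len(grid) - 1)]

def equipercentile(x, x_sample, y_sample, grid):
    """phi(x) = F_Y^{-1}(F_X(x)), with both CDFs replaced by empirical CDFs."""
    p = ecdf(x_sample, np.array([x]))[0]
    F_y = ecdf(y_sample, grid)
    return quantile(F_y, grid, p)

# Hypothetical scores on a 0..5 scale (not data from the talk).
grid = np.arange(0, 6)
x_sample = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]
y_sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]

equated = [int(equipercentile(x, x_sample, y_sample, grid)) for x in grid]
print(equated)
```

In this toy setting form Y looks easier by about one point, so each X score is mapped to a Y score roughly one point higher, capped at the maximum score.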
NEAT design: the current view

Population   Sample   X     Y     A
P            1        yes   no    yes
Q            2        no    yes   yes

$T = w_P P + w_Q Q$
$f_{XT}(x) = w_P f_{XP}(x) + w_Q f_{XQ}(x)$
$f_{YT}(y) = w_P f_{YP}(y) + w_Q f_{YQ}(y)$
Additional assumptions are needed to estimate $f_{XT}(x)$ and $f_{YT}(y)$.
NEAT design
Most common assumption: $f_{XP}(x \mid a) = f_{XQ}(x \mid a)$ and $f_{YP}(y \mid a) = f_{YQ}(y \mid a)$.
Using these assumptions we get
$f_{XT}(x) = w_P f_{XP}(x) + w_Q \sum_a f_{XP}(x \mid a) f_{AQ}(a)$
$f_{YT}(y) = w_P \sum_a f_{YQ}(y \mid a) f_{AP}(a) + w_Q f_{YQ}(y)$
and from here
$\varphi_T(x) = F_{YT}^{-1}(F_{XT}(x))$
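The synthetic distribution for X under the equal-conditionals assumption can be sketched as follows. The joint table, the anchor distribution in Q, and the function names are hypothetical values chosen for illustration, and the sketch assumes every anchor score has positive probability in P.

```python
import numpy as np

def conditional_given_anchor(joint):
    """Turn a joint table f(x, a) into f(x | a), one column per anchor score."""
    col_totals = joint.sum(axis=0, keepdims=True)  # f_A(a); assumed > 0
    return joint / col_totals

def synthetic_x_distribution(joint_xa_P, f_a_Q, w_P):
    """f_XT(x) = w_P f_XP(x) + w_Q sum_a f_XP(x | a) f_AQ(a), with w_Q = 1 - w_P."""
    f_x_P = joint_xa_P.sum(axis=1)                    # marginal of X in P
    f_x_given_a = conditional_given_anchor(joint_xa_P)
    f_x_Q = f_x_given_a @ f_a_Q                       # imputed marginal of X in Q
    return w_P * f_x_P + (1.0 - w_P) * f_x_Q

# Hypothetical joint P(X = x, A = a) in population P: rows x = 0, 1, 2; columns a = 0, 1.
joint_xa_P = np.array([[0.30, 0.05],
                       [0.15, 0.20],
                       [0.05, 0.25]])
f_a_Q = np.array([0.3, 0.7])  # hypothetical anchor distribution in Q

f_xt = synthetic_x_distribution(joint_xa_P, f_a_Q, w_P=0.5)
print(f_xt)
```

The only place where population Q enters is through its anchor distribution, which is exactly what the assumption $f_{XP}(x \mid a) = f_{XQ}(x \mid a)$ buys.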
NEAT design: an alternative view

Conditional score distributions with no assumptions
Let Z denote the population group, so that $Z = 1$ if the test taker is administered X and $Z = 0$ if the test taker is administered Y.
Then, by the Law of Total Probability (LTP),
$P(X \le x \mid A) = P(X \le x \mid A, Z = 1) P(Z = 1 \mid A) + P(X \le x \mid A, Z = 0) P(Z = 0 \mid A)$   (1)
$P(X \le x \mid A)$ is not identified, because X is never observed when $Z = 0$.
Partially identified probability distributions
From (1) and the fact that $P(X \le x \mid A, Z = 0)$ is bounded between 0 and 1, it follows that
$L_x \le P(X \le x \mid A) \le U_x$   (2)
$L_x = P(X \le x \mid A, Z = 1) P(Z = 1 \mid A)$
$U_x = P(X \le x \mid A, Z = 1) P(Z = 1 \mid A) + P(Z = 0 \mid A)$
Analogously for Y we have
$L_y \le P(Y \le y \mid A) \le U_y$   (3)
$L_y = P(Y \le y \mid A, Z = 0) P(Z = 0 \mid A)$
$U_y = P(Y \le y \mid A, Z = 0) P(Z = 0 \mid A) + P(Z = 1 \mid A)$
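As a rough illustration, the bounds in (2) can be estimated by plugging in sample proportions. The function name, the argument layout, and the toy scores below are assumptions for this sketch; in particular, estimating $P(Z = 1 \mid A = a)$ by the pooled sample proportion presumes that the two samples are drawn in proportion to the population weights.

```python
import numpy as np

def conditional_bounds(x_obs, a_obs_z1, a_obs_z0, a, x):
    """Plug-in bounds (L_x, U_x) for P(X <= x | A = a) as in (2).

    x_obs, a_obs_z1: form-X and anchor scores of test takers with Z = 1.
    a_obs_z0: anchor scores of test takers with Z = 0 (X is unobserved for them).
    """
    x_obs = np.asarray(x_obs)
    a_obs_z1 = np.asarray(a_obs_z1)
    a_obs_z0 = np.asarray(a_obs_z0)
    n1 = int(np.sum(a_obs_z1 == a))
    n0 = int(np.sum(a_obs_z0 == a))
    if n1 == 0:
        raise ValueError("no Z = 1 test takers with this anchor score")
    p_z1 = n1 / (n1 + n0)                               # estimate of P(Z = 1 | A = a)
    F_obs = float(np.mean(x_obs[a_obs_z1 == a] <= x))   # P(X <= x | A = a, Z = 1)
    lower = F_obs * p_z1                                # unobserved term set to 0
    upper = lower + (1.0 - p_z1)                        # unobserved term set to 1
    return lower, upper

# Hypothetical scores (not data from the talk).
x_obs = [0, 1, 2, 3, 4, 1]
a_obs_z1 = [2, 2, 2, 2, 1, 1]
a_obs_z0 = [2, 2, 2, 2, 0, 1]

lower, upper = conditional_bounds(x_obs, a_obs_z1, a_obs_z0, a=2, x=1)
print(lower, upper)
```

The width of the interval is always the estimated $P(Z = 0 \mid A = a)$, whatever the observed scores are.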
Graphical illustration
[Figure: bounds for $P(X \le x \mid A = 2)$ (black line) and bounds for $P(Y \le y \mid A = 2)$ (red line); vertical axis: probability, 0 to 1; horizontal axis: score, 0 to 5.]
Target distributions with no assumptions
Marginalizing over A,
$P(X \le x \mid A) = P(X \le x \mid A, Z = 1) P(Z = 1 \mid A) + P(X \le x \mid A, Z = 0) P(Z = 0 \mid A)$   (4)
becomes
$P(X \le x) = P(X \le x \mid Z = 1) P(Z = 1) + P(X \le x \mid Z = 0) P(Z = 0)$
or
$F_X(x) = w_P F_{XP}(x) + w_Q F_{XQ}(x)$
Problem: $F_{XQ}(x) = P(X \le x \mid Z = 0)$ is still not identified!
Partially identified target distributions
$P(X \le x)$ can also be bounded. In fact,
$L_x \le P(X \le x) \le U_x$,
$L_x = P(X \le x \mid Z = 1) P(Z = 1)$
$U_x = P(X \le x \mid Z = 1) P(Z = 1) + P(Z = 0)$
How does $F_X(x)$ compare to $F_{XT}(x)$?
Bounds illustrations

Score   F_XT    [L_x ; U_x]        F_YT    [L_y ; U_y]
0       0.100   [0.050 ; 0.550]    0.105   [0.040 ; 0.540]
1       0.250   [0.125 ; 0.625]    0.320   [0.140 ; 0.640]
2       0.500   [0.250 ; 0.750]    0.530   [0.250 ; 0.750]
3       0.750   [0.375 ; 0.875]    0.755   [0.375 ; 0.875]
4       0.900   [0.450 ; 0.950]    0.900   [0.450 ; 0.950]
5       1.000   [0.500 ; 1.000]    1.000   [0.500 ; 1.000]
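The X bounds in the table are numerically consistent with the formulas $L_x = F \cdot P(Z = 1)$ and $U_x = L_x + P(Z = 0)$ if one takes $P(Z = 1) = P(Z = 0) = 0.5$ and plugs in the F_XT column as the identified CDF. Both choices are inferences from the printed numbers, not something the slide states, so the sketch below is only a plausibility check; the function name is hypothetical.

```python
import numpy as np

def marginal_bounds(F_identified, p_observed):
    """Bounds on an unconditional CDF when it is observed only with probability p_observed."""
    F = np.asarray(F_identified, dtype=float)
    lower = F * p_observed               # unobserved group contributes 0
    upper = lower + (1.0 - p_observed)   # unobserved group contributes 1
    return lower, upper

# F_XT column of the table, used here in place of P(X <= x | Z = 1) (assumption).
F_col = [0.100, 0.250, 0.500, 0.750, 0.900, 1.000]
L, U = marginal_bounds(F_col, p_observed=0.5)  # P(Z = 1) = 0.5 is assumed

for score, (l, u) in enumerate(zip(L, U)):
    print(f"{score}  [{l:.3f} ; {u:.3f}]")
```

Every interval has width 0.5, the assumed share of test takers for whom X is not observed, which is why the bounds stay wide even at the extremes of the score scale.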
Graphical illustration
[Figure: F_T(x) and F_T(y) with the bounds for F(x) and the bounds for F(y); vertical axis: probability, 0 to 1; horizontal axis: scores, 0 to 5.]
Bounded quantiles
Let $\alpha \in (0, 1)$ and $q_\alpha(X) \doteq \inf\{t : P(X \le t) > \alpha\}$. Define the following quantiles:
$r_\alpha(X) \doteq \inf\{t : P(X \le t \mid Z = 1) P(Z = 1) + P(Z = 0) > \alpha\}$
$= \inf\left\{t : P(X \le t \mid Z = 1) > \frac{\alpha - P(Z = 0)}{P(Z = 1)}\right\}$
$= q_{\frac{\alpha - P(Z = 0)}{P(Z = 1)}}(X \mid Z = 1)$
and
$s_\alpha(X) \doteq \inf\{t : P(X \le t \mid Z = 1) P(Z = 1) > \alpha\}$
$= \inf\left\{t : P(X \le t \mid Z = 1) > \frac{\alpha}{P(Z = 1)}\right\}$
$= q_{\frac{\alpha}{P(Z = 1)}}(X \mid Z = 1)$
Bounded quantiles
In the NEAT design, the quantiles of the partially identified probabilities $P(X \le t)$ and $P(Y \le t)$ are also partially identified by the following intervals:
(i) $r_\alpha(X) \le q_\alpha(X) \le s_\alpha(X)$;
(ii) $r_\alpha(Y) \le q_\alpha(Y) \le s_\alpha(Y)$.   (5)
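The quantile bounds $r_\alpha(X)$ and $s_\alpha(X)$ can be sketched on a finite score grid as below. The function names and the toy CDF are hypothetical. As a convention assumed for this sketch, a quantile level below 0 returns the smallest score and a level of 1 or more returns the largest score, that is, the trivial bound on a bounded score scale.

```python
import numpy as np

def q(level, grid, cdf):
    """q_level = inf{t : F(t) > level} on a finite grid (strict inequality, as in the definition)."""
    idx = np.searchsorted(cdf, level, side="right")  # first index with cdf > level
    return grid[min(idx, len(grid) - 1)]             # clamp to the largest score

def bounded_quantile(alpha, grid, cdf_obs, p_obs):
    """Return (r_alpha, s_alpha) given the observed-group CDF and p_obs = P(Z = 1)."""
    r = q((alpha - (1.0 - p_obs)) / p_obs, grid, cdf_obs)  # quantile of the upper CDF bound
    s = q(alpha / p_obs, grid, cdf_obs)                    # quantile of the lower CDF bound
    return r, s

# Hypothetical observed-group CDF P(X <= t | Z = 1) on scores 0..5.
grid = np.arange(0, 6)
cdf_obs = np.array([0.10, 0.25, 0.50, 0.75, 0.90, 1.00])

for alpha in (0.3, 0.7):
    r, s = bounded_quantile(alpha, grid, cdf_obs, p_obs=0.5)
    print(alpha, int(r), int(s))
```

With $P(Z = 1) = 0.5$, the lower bound is trivial for $\alpha = 0.3$ and the upper bound is trivial for $\alpha = 0.7$, which illustrates why the bounds are not informative at every level.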
Bounded quantile equating
Main idea

α      [r_α(X), s_α(X)]   [r_α(Y), s_α(Y)]
...
0.1    [1 ; 3]            [2 ; 4]
0.2    [2 ; 4]            [3 ; 5]
...
Informative bounded quantiles
The lower and the upper bounds are not always informative:
$0 \le \frac{\alpha - P(Z = 0)}{P(Z = 1)} \le 1$, for all $\alpha \in [P(Z = 0), 1]$;
$0 \le \frac{\alpha}{P(Z = 1)} \le 1$, for all $\alpha \in [0, P(Z = 1)]$.
Bounded equipercentile equating
Given that
$F_{XP}(t)\, w \le P(X \le t) \le F_{XP}(t)\, w + (1 - w)$
then
$F_Y^{-1}(F_{XP}(t)\, w) \le \varphi(t) \le F_Y^{-1}(F_{XP}(t)\, w + (1 - w))$
moreover
$u_Y(\alpha) = \inf\{t : F_Y(t) > F_{YQ}(t)(1 - w) > \alpha\}$
and
$l_Y(\alpha) = \inf\{t : F_{YQ}(t)(1 - w) + w > F_Y(t) > \alpha\}$
Example
Data from Kolen and Brennan (2014).
Two 36-item test forms.
Form X was administered to 1,655 examinees and form Y was administered to 1,638 examinees.
Also, 12 of the 36 items are common to both test forms (items 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, and 36).
$w_P = 1$ and $w_Q = 0$
Graphical illustration
[Figure: score distributions in P and Q, showing F_XP and F_YQ, with F_XT and F_YT overlaid in turn for (w1, w2) = (1, 0), (0.7, 0.3), (0.5, 0.5), (0.3, 0.7) and (0, 1); vertical axis: 0 to 1; horizontal axis: scores, 0 to 35.]
Graphical illustration
[Figure: score distributions F_XT (w1=1) and F_YT (w2=0) with the bounds Ux, Lx, Uy and Ly; vertical axis: 0 to 1; horizontal axis: scores, 0 to 35.]
Graphical illustration
[Figure: quantile bounds r_x, s_x and r_y, s_y plotted against alpha; vertical axis: quantiles on the score scale, 5 to 35; horizontal axis: alpha, 0 to 1.]
Summary and discussion
The equality of conditional distributions assumption is actually an identification restriction.
We offer an alternative to the conditional distributions restriction: partially identified probability distributions.
The derived bounds show that there is huge uncertainty about the probability distributions that are to be used for equating.
Future work
Equiquantile vs equipercentile: which one?
Incorporate additional information in order to obtain tighter bounds.
Thank you for your attention!
Jorge González
Department of Statistics, Faculty of Mathematics, Pontificia Universidad Católica de Chile
Av. Vicuña Mackenna 4860, Macul, Santiago, Chile
E-mail: jorge.gonzalez@mat.uc.cl
www.mat.uc.cl/~jorge.gonzalez
July 15-19 Santiago, Chile
This research is supported by Grant Fondecyt 1150233