2.1.3 The Testing Problem and Neave's Step Method


we can guarantee (1) that the (unknown) true parameter vector θ_t ∈ Θ is an interior point of Θ, and (2) that ρ_{θ_t}(R) > 0 for any R ∈ 2^Q. These are two of Birch's regularity conditions that were critical under the entire BLIM space.

Remarks

1. In the monograph by Doignon and Falmagne (1999), the general BLIM, described by the entire parameter space, is tested for goodness of fit. The reader should note, given the preceding remarks, that under this null model the asymptotic chi-square approximation need not hold in general.

2. Presumably,[6] even the restricted parameter space Θ does not fulfill all of Birch's regularity conditions. More concretely, in general, restrictions must be imposed on the BLIM to attain identifiability (to be discussed later).

3. Despite these caveats, we will use the fictitious example, with the corresponding concrete results, discussed in Doignon and Falmagne (1999) for the general BLIM (i.e., for the entire parameter space); but we will speak of the restricted parameter space Θ. Strictly speaking, then, the results listed in the following need not be the correct ones if the space Θ were actually fitted to the data. However, the example is fictitious, and the exposition is only meant to illustrate the principal concepts of the analysis.

2.1.3 The Testing Problem and Neave's Step Method

How are the goodness-of-fit statistics X² and G² used to test the fit of the BLIM described by Θ? More precisely, how can we test the null hypothesis

    H₀: For an appropriate θ ∈ Θ, (ρ_t(R))_{R ∈ 2^Q} = (ρ_θ(R))_{R ∈ 2^Q}

versus the (complementary) alternative hypothesis

    H₁: There is no θ ∈ Θ for which (ρ_t(R))_{R ∈ 2^Q} = (ρ_θ(R))_{R ∈ 2^Q}?

Following the general procedure of classical significance testing, we proceed in several steps:[7]

1. Specify justifiable distributional assumptions underlying the empirical phenomenon from which the data are observed.

2. Based on 1, formulate the empirical problem of interest in terms of H₀ and H₁.
3. Based on 1 and 2, choose a test statistic T(x) (x the data) and, viewing it as a random variable T(X), derive its probability distribution under H₀.

4. Based on 3, choose a critical region (or rejection region) to the significance level α ∈ ]0, 1[ (e.g., α = 0.05). A critical region C should consist of values of T which point only weakly to H₀ but rather most strongly to H₁.

[6] This should be investigated!
[7] This is the step method proposed by Neave (1976).

Ideally, we should have[8]

    P_{H₀}(T ∈ C) ≤ α.

5. Based on 4, propose the decision rule: if T(x) ∈ C, we reject H₀ in favour of H₁; if T(x) ∉ C, we do not reject H₀ in favour of H₁ (which, in general, should not be confused with accepting H₀).

2.1.4 Step 1: Data and Multinomial Distribution

The data x are constituted by the observed absolute counts of the response patterns R ∈ 2^Q; in other words, x = (N(R))_{R ∈ 2^Q} (see Table 1). We assume that different subjects give their response patterns independently of each other. The (unknown) true probability of occurrence, ρ_t(R), of any response pattern R ∈ 2^Q is assumed to stay constant across the N subjects and to be strictly larger than 0. Then, in the series of N examinees responding to the items, the probability of observing N(R) subjects giving the response pattern R ∈ 2^Q (R varying over the entire 2^Q) is given by (m := |Q|, the number of items)

    N! / (∏_{R ∈ 2^Q} N(R)!) · ∏_{R ∈ 2^Q} ρ_t(R)^{N(R)}.

If we introduce random variables X_R (R ∈ 2^Q) that represent the respective observed absolute cell counts N(R), then we can consider x := (N(R))_{R ∈ 2^Q} as a realization of the random vector X := (X_R)_{R ∈ 2^Q}. In particular, we can recap the previous probability in the notation

    P(X = x) := P(X_∅ = N(∅), ..., X_Q = N(Q)) = N! / (∏_{R ∈ 2^Q} N(R)!) · ∏_{R ∈ 2^Q} ρ_t(R)^{N(R)}.

Here, ρ_t(R) > 0 for any R ∈ 2^Q, and Σ_{R ∈ 2^Q} ρ_t(R) = 1. Further, N(R) ∈ ℕ ∪ {0} with 0 ≤ N(R) ≤ N for any R ∈ 2^Q, and Σ_{R ∈ 2^Q} N(R) = N. In other words, X = (X_R)_{R ∈ 2^Q} is a multinomial random vector.[9]

Definition 2.1. Let t, n ∈ ℕ, and let p = (p_1, p_2, ..., p_t) ∈ ℝ^t with p_i > 0 (1 ≤ i ≤ t) and Σ_{i=1}^t p_i = 1. Then, a random vector X = (X_1, X_2, ..., X_t) is called multinomial with parameters n and probability vector p if

    P(X = x) := P(X_1 = x_1, X_2 = x_2, ..., X_t = x_t) = n! / (∏_{i=1}^t x_i!) · ∏_{i=1}^t p_i^{x_i}

[8] For a test based on an asymptotic H₀-distribution, this condition may only be satisfied approximately, for (finite) large sample sizes.
[9] Exercise 2.2 points out that the concept of a multinomial distribution generalizes the known concept of a binomial distribution.

for any x = (x_1, x_2, ..., x_t) with x_i ∈ ℕ ∪ {0}, 0 ≤ x_i ≤ n, and Σ_{i=1}^t x_i = n. In this case, we briefly write X ~ M_t(n, p).

Let us summarize. The distributional assumption we make is that the cell counts over the collection of all response patterns follow a multinomial probability distribution. More precisely,

    X = (X_R)_{R ∈ 2^Q} ~ M_{2^m}(N, p = (ρ_t(R))_{R ∈ 2^Q}).

2.1.5 Step 2: Null and Alternative Hypotheses

The empirical problem we aim at is to test the fit of the BLIM described by Θ. In other words, the empirical problem of interest is whether the (unknown) true cell probabilities ρ_t(R) (R ∈ 2^Q) of the manifest multinomial distribution of X = (X_R)_{R ∈ 2^Q} (Step 1) can be expressed, and so explained, by this BLIM as an underlying latent psychological model. This concern is already formulated in terms of H₀ and H₁:

    H₀: For an appropriate θ ∈ Θ, (ρ_t(R))_{R ∈ 2^Q} = (ρ_θ(R))_{R ∈ 2^Q};
    H₁: There is no θ ∈ Θ for which (ρ_t(R))_{R ∈ 2^Q} = (ρ_θ(R))_{R ∈ 2^Q}.

2.1.6 Step 3: Test Statistics X² and G²

Based on Steps 1 and 2, let us first motivate the choice of the test statistic T(x), depending on the data x = (N(R))_{R ∈ 2^Q}, and then, viewing T as a random variable, derive its (asymptotic) probability distribution assuming H₀ holds.

Limiting χ²-Distribution of X² and G²

As already indicated, we want to judge the appropriateness of a hypothesized model by making predictions using the model. For instance, Pearson's X² and the log-likelihood ratio statistic G² are based on the discrepancies between the expected frequencies Nρ_θ(R), derived under a specific model θ ∈ Θ, and the observed absolute frequencies N(R) (R ∈ 2^Q). However, in the case of the null model H₀ considered here, no specific model θ ∈ Θ (and thus no (ρ_θ(R))_{R ∈ 2^Q}) is specified. Assuming H₀ holds, the only thing we know is that the true specific model describing the response data is an element of the parameter space Θ; its value is not known. In particular, we cannot make predictions based on this specific model.
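For a fully specified θ, by contrast, such predictions are straightforward. The following minimal Python sketch shows how a concrete BLIM parameter vector yields the pattern probabilities ρ_θ(R); the two-item knowledge structure and all numerical values here are invented purely for illustration (they are not the standard example).

```python
from itertools import chain, combinations

# Hypothetical toy setting: items Q = {a, b}, knowledge structure
# H = {∅, {a}, {a, b}} with state probabilities p(K), and per-item
# careless-error (beta) and lucky-guess (eta) rates.
Q = ["a", "b"]
H = [frozenset(), frozenset("a"), frozenset("ab")]
p = {H[0]: 0.3, H[1]: 0.3, H[2]: 0.4}
beta = {"a": 0.1, "b": 0.2}   # careless error probabilities
eta = {"a": 0.1, "b": 0.05}   # lucky guess probabilities

def rho(R):
    """BLIM probability rho_theta(R) of observing response pattern R."""
    total = 0.0
    for K in H:
        prod = p[K]
        for q in Q:
            if q in K and q in R:
                prod *= 1 - beta[q]      # mastered and solved
            elif q in K and q not in R:
                prod *= beta[q]          # mastered, careless error
            elif q not in K and q in R:
                prod *= eta[q]           # not mastered, lucky guess
            else:
                prod *= 1 - eta[q]       # not mastered, not solved
        total += prod
    return total

# All response patterns R in 2^Q, and their model probabilities.
patterns = [frozenset(s) for s in chain.from_iterable(
    combinations(Q, r) for r in range(len(Q) + 1))]
probs = {R: rho(R) for R in patterns}
print({tuple(sorted(R)): round(probs[R], 4) for R in patterns})
```

The ρ_θ(R) computed this way sum to 1 over all R ∈ 2^Q, as a probability vector must.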
This complication is resolved as follows. Assume H₀ is true, i.e., the BLIM based on Θ holds, so that we are left with selecting a most plausible vector in Θ. It is then reasonable to approximate, i.e., estimate, the true parameter vector θ_t ∈ Θ by choosing a value θ̂ ∈ Θ which is most consistent with the observed data, in the sense that it optimizes a certain measure of discrepancy quantifying the consistency between the observed data and the predictions

made under a specific model. Concentrating on X² and G² (for θ ∈ Θ, data x = (N(R))_{R ∈ 2^Q}, and sample size N),

    X²(θ; x, N) := Σ_{R ∈ 2^Q} (N(R) − Nρ_θ(R))² / (Nρ_θ(R)),

    G²(θ; x, N) := 2 Σ_{R ∈ 2^Q} N(R) ln(N(R) / (Nρ_θ(R))),

we could choose estimates θ̂_X² resp. θ̂_G² optimizing X² resp. G², in the sense that they satisfy the optimization problems

    X²(θ̂_X²; x, N) = inf_{θ ∈ Θ} X²(θ; x, N),
    G²(θ̂_G²; x, N) = inf_{θ ∈ Θ} G²(θ; x, N).

Assuming H₀ holds, under Birch's regularity conditions (Birch, 1964) sketched in Section 2.2, these optimization problems have a (generalized) solution θ̂_X² resp. θ̂_G² (in a generalized parameter space containing the initial space Θ).[10] These estimates θ̂_X²(x; N) and θ̂_G²(x; N) for the (unknown) true parameter vector θ_t ∈ Θ depend on the data x. They become random variables θ̂_X²(X; N) and θ̂_G²(X; N) when the data x are replaced with the random vector X (cp. Step 1); these are called estimators for the (unknown) true parameter vector θ_t ∈ Θ. If we suppose that θ̂_X²(x; N) and θ̂_G²(x; N) belong to the initial parameter space Θ,[11] we can calculate the expected frequencies Nρ_{θ̂_X²}(R) resp. Nρ_{θ̂_G²}(R) (R ∈ 2^Q), and the discrepancies X²(θ̂_X²; x, N) resp. G²(θ̂_G²; x, N). (Intuitive motivation: it seems plausible that we may expect small values for these statistics in case the null model really holds; larger values rather speak against the null model.) Indeed, we choose these test statistics in Step 3 of Neave's step method:

    T_X²(x; N) := X²(θ̂_X²; x, N),
    T_G²(x; N) := G²(θ̂_G²; x, N).

Next, in Step 3, we require the distributions of these test statistics under the null model H₀ to be known. This is achieved by the following main result.[12]

[10] This is the theory of generalized minimum distance estimation for multinomial models, e.g., discussed in Bishop et al. (1975) for maximum likelihood estimation, and in Read & Cressie (1988) for the general power-divergence family (see Section 2.2).
The generalized parameter space is the closure of the initial parameter space, plus possibly a point at infinity, depending on whether the initial parameter space is bounded or not.

[11] In large samples, under the conditions described above, the probability that θ̂_X²(x; N) resp. θ̂_G²(x; N) does not belong to Θ goes to zero as N → ∞.

[12] This result will be generalized to the power-divergence family of Read-Cressie statistics, of which the two statistics X² and G² are special members.
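Given any candidate θ, and hence model probabilities ρ_θ(R), the two discrepancy measures defined above are elementary to evaluate. A minimal Python sketch, with invented counts and cell probabilities (not the data of Table 1):

```python
import numpy as np

# Hypothetical observed counts N(R) over the cells (response patterns),
# and model probabilities rho_theta(R) for some fixed parameter vector theta.
counts = np.array([30, 50, 15, 5])              # N(R); here N = 100
rho_theta = np.array([0.25, 0.55, 0.15, 0.05])  # rho_theta(R), sums to 1
N = counts.sum()
expected = N * rho_theta                        # expected frequencies N * rho_theta(R)

# Pearson's X^2 and the log-likelihood ratio statistic G^2:
X2 = np.sum((counts - expected) ** 2 / expected)
G2 = 2 * np.sum(counts * np.log(counts / expected))

print(round(X2, 3), round(G2, 3))
```

Under H₀ these values would then be minimized over θ ∈ Θ to obtain the estimates θ̂_X² and θ̂_G². (Note that the two statistics take similar but not identical values, in line with the asymptotic equivalence discussed below Theorem 2.2.)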

Theorem 2.2 (Main Result). If H₀ holds, s < 2^|Q| − 1 (s ∈ ℕ, the number of (unknown) independent parameters of the model), and Birch's regularity conditions are satisfied, then both statistics T_X²(X; N) and T_G²(X; N) have a limiting chi-square distribution with degrees of freedom df = (2^m − 1) − s. If we denote the chi-square distribution with d := (2^m − 1) − s degrees of freedom by χ²_d, this result can be recapped as

    T_X²(X; N), T_G²(X; N) →_Distr χ²_d  as N → ∞,

where the symbol →_Distr stands for convergence in distribution.[13]

Proof. See Bishop et al. (1975), or Read & Cressie (1988).

Asymptotic Equivalence, Critical Values, and Finite Sample Approximations

Remarks. Throughout the following remarks, let the prerequisites be as in Theorem 2.2.

1. Since X² and G² have the same asymptotic distribution χ²_d, they are called asymptotically equivalent. Authors sometimes state that the two statistics X²(θ̂_X²; X, N) and G²(θ̂_G²; X, N) yield nearly the same value if N is large and the observed table of cell counts is not sparse. This is suggested by

    X²(θ̂_X²; X, N) = G²(θ̂_G²; X, N) + o_p(1),

with o_p(1) representing a stochastic sequence, i.e., a sequence of random variables, which converges in probability to the constant 0 as N → ∞.[14] Informally, if N is sufficiently large, then with high probability the two statistics will yield nearly the same value. Even more, it holds that

    X²(y; X, N) = G²(y′; X, N) + o_p(1)

for any combination of y, y′ ∈ {θ̂_X², θ̂_G²}. In other words, if N is sufficiently large, then with high probability any of these values will be nearly equal. This can be reviewed in Read & Cressie (1988). In particular, using Theorem 2.2,

    X²(y; X, N), G²(y′; X, N) →_Distr χ²_d  as N → ∞,

for any combination of y, y′ ∈ {θ̂_X², θ̂_G²}. In other words, we can use each of the estimators θ̂_X²(X; N) and θ̂_G²(X; N) in either of the statistics X²(·; X, N) and G²(·; X, N) and obtain asymptotically the same limiting χ²_d-distribution.
[13] In Exercise 2.3, the concept of convergence in distribution is reviewed.
[14] In Exercise 2.3, you are asked to review the concept of convergence in probability.

Thus,

e.g., if only the maximum likelihood estimate θ̂_G²(x; N) is available,[15] we can use it in the computation of Pearson's X²(·; x, N), and then test the null model H₀ using this statistic. All these remarks hold because, under the conditions of Theorem 2.2 (which, recall, are assumed throughout this Remarks paragraph), the estimators θ̂_X²(X; N) and θ̂_G²(X; N) are best asymptotically normal (BAN; see Read & Cressie, 1988). In particular,

    θ̂_X²(X; N) = θ̂_G²(X; N) + o_p(1).

In other words, if N is sufficiently large, then with high probability both estimators will yield nearly the same value.

2. Theorem 2.2 implies, for any c ∈ ℝ,

    lim_{N→∞} P_{H₀}(T_X²(X; N) > c) = P(χ²_d > c) = lim_{N→∞} P_{H₀}(T_G²(X; N) > c).

This is because (without loss of generality, consider only T_X²; c ∈ ℝ)

    lim_{N→∞} P_{H₀}(T_X²(X; N) > c)
        = lim_{N→∞} {1 − P_{H₀}(T_X²(X; N) ≤ c)}
    (i) = lim_{N→∞} {1 − F_{T_X²(X;N)}(c)}
        = 1 − lim_{N→∞} F_{T_X²(X;N)}(c)
        = 1 − F_{χ²_d}(c)          [Theorem 2.2]
        = 1 − P(χ²_d ≤ c)
        = P(χ²_d > c).

Ad (i): F_{T_X²(X;N)} : ℝ → [0, 1], x ↦ F_{T_X²(X;N)}(x) := P(T_X²(X; N) ≤ x), is the cumulative distribution function of the random variable T_X²(X; N) (cp. Exercise 2.3). In particular, if χ²_{d,α} denotes the upper critical value to the significance level α ∈ ]0, 1[, in other words, P(χ²_d > χ²_{d,α}) = α, then, for c := χ²_{d,α},

    lim_{N→∞} P_{H₀}(T_X²(X; N) > χ²_{d,α}) = lim_{N→∞} P_{H₀}(T_G²(X; N) > χ²_{d,α}) = P(χ²_d > χ²_{d,α}) = α.

Similar expressions hold for any combination of the estimators θ̂_X²(X; N) resp. θ̂_G²(X; N) and the test statistics X²(·; X, N) resp. G²(·; X, N) (cp. 1).

[15] As we will demonstrate later, maximizing the likelihood function L(θ; x, N) is equivalent to minimizing the log-likelihood ratio statistic G²(θ; x, N) (θ ∈ Θ; fixed data x).

3. Based on 2, for N sufficiently large,

    P_{H₀}(T_X²(X; N) > χ²_{d,α}) ≈ α,
    P_{H₀}(T_G²(X; N) > χ²_{d,α}) ≈ α.

Similar expressions hold for any combination of the estimators θ̂_X²(X; N) resp. θ̂_G²(X; N) and the test statistics X²(·; X, N) resp. G²(·; X, N) (cp. 1).

4. From a practical point of view, the question "How large is sufficiently large?" is crucial for the applicability of these asymptotic results, for in practice only finite sample sizes are available. Let us reflect a bit on this issue. For a finite-sample (i.e., N < ∞) approximation to be acceptable, N should be sufficiently large. Since the number of cells of the multinomial, i.e., the number of response patterns, 2^|Q| = 2^m, is fixed, this suggests that the expected frequencies Nρ_{θ̂_X²}(R) resp. Nρ_{θ̂_G²}(R) (R ∈ 2^Q) should be rather large. In other words, informally, large expected frequencies for the response patterns seem to be a plausible necessity for a good finite-sample approximation. Indeed, it is known that the finite-sample χ²-approximation for X² relies on the expected frequencies in each cell of the multinomial being large. Cochran (1952, 1954) provides a complete bibliography of the early discussions regarding this point. In the early literature, there were a variety of recommendations, rules of thumb, regarding the minimum expected cell frequency required for the χ²-approximation to be reasonably accurate: values from 1 to 20 were suggested, generally based on individual experiences. Good et al. (1970) provide an overview of the historical recommendations. With the advancement of computers, computer-intensive statistical methods have become available, and exact and simulation studies have contributed to a deeper understanding of the χ²-approximation for X² and G² in small samples. The interested reader is referred to Read & Cressie (1988) regarding these issues.
They provide a very nice chapter on historical perspectives concerning the two classical goodness-of-fit measures X² and G². Finally, let us review two rules of thumb for judging the appropriateness of the χ²-approximation in finite samples. A conservative rule of thumb is this: the χ²-approximation is accepted if, for any R ∈ 2^Q, the expected frequency Nρ_{θ̂_X²}(R) (resp. Nρ_{θ̂_G²}(R)) of response pattern R is greater than 5, i.e.,

    Nρ_{θ̂_X²}(R) (resp. Nρ_{θ̂_G²}(R)) > 5.

If the number of response patterns (i.e., the number of cells of the multinomial) is large, the observed data table (see Table 1) tends to be rather sparse, i.e., to have small observed absolute cell frequencies. In this case, the previous criterion may not be satisfied. A less demanding criterion is then (see Fienberg, 1980) that, for any R ∈ 2^Q, the expected frequency Nρ_{θ̂_X²}(R) (resp. Nρ_{θ̂_G²}(R)) of response

pattern R is greater than or equal to 1, i.e.,

    Nρ_{θ̂_X²}(R) (resp. Nρ_{θ̂_G²}(R)) ≥ 1.

It should be noted that a general conclusion from a number of comparative studies involving X² and G² is that, under H₀, in finite samples, the limiting χ²-distribution approximates the exact (finite-sample) distribution of X² more closely than the exact distribution of G² (see Read & Cressie, 1988). A last word: what to do if the finite-sample χ²-approximation to X² and G² is considered poor? How might the approximation be improved? There are a variety of suggestions available regarding this point. For instance, Doignon & Falmagne (1999) suggest grouping cells with low expected frequencies if the less demanding criterion (minimum expected cell frequency ≥ 1) fails. Other suggestions range from second-order terms, corrected χ²-approximations, moment corrections, adding positive constants to cells, and matching tails, to log-normal approximations, et cetera. Read & Cressie (1988) provide an extensive list of references regarding this important point.

5. The criterion "minimum expected cell frequency ≥ 1" (see 4) will be used in all the analyses that follow.

Parameter Estimates and Standard Example

We return to the standard example, enriched with the fictitious data in Table 1. In this example, the number s of (unknown) independent model parameters is (m := |Q|, the number of items)

    s (i)= (|H| − 1) + 2m = (9 − 1) + 2·5 = 18.

Ad (i): this is the general formula for the number of (unknown) independent model parameters of any general BLIM without further restrictions. Because of Σ_{K ∈ H} p(K) = 1, there are only |H| − 1 independent state probabilities; furthermore, for any item q ∈ Q, there are two item parameters β_q, η_q. In particular,

    s = 18 < 31 = 2⁵ − 1 = 2^m − 1.

Under the null model H₀, i.e., assuming the BLIM based on the parameter space Θ is correct, (we assume that) Birch's regularity conditions are satisfied (cp. Remark 2 preceding Section 2.1.3).
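The parameter and degrees-of-freedom count just derived is easy to verify in a few lines (with |H| = 9 states and m = 5 items, as in the standard example):

```python
# Independent parameters and degrees of freedom for the standard example.
H_size, m = 9, 5                 # |H| knowledge states, m items
s = (H_size - 1) + 2 * m         # (|H| - 1) state prob. + (beta_q, eta_q) per item
cells = 2 ** m                   # number of response patterns (multinomial cells)
df = (cells - 1) - s             # degrees of freedom of the limiting chi-square
print(s, cells - 1, df)          # 18 31 13
```

The condition s < 2^m − 1 of Theorem 2.2 is satisfied (18 < 31), and d = 13 is the value used in Step 4 below.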
The generalized parameter space containing the initial parameter space Θ is given by the closure of Θ in ℝ^{|H|+2m} = ℝ^19 (|H| = 9, m = 5), i.e., by [0, 1]^19. The (generalized) minimum chi-square estimate θ̂_X²(x; N = 1,000), obtained by optimizing Pearson's X²(θ; x, N = 1,000) (in the sense of the optimization problems above) for the data x = (N(R))_{R ∈ 2^Q} in Table 1, is given in Table 2.[16]

[16] The abbreviation "prob." stands for probabilities; "CE" resp. "LG" stand for careless error resp. lucky guess.

Table 2
(Generalized) minimum chi-square estimate θ̂_X²(x; N = 1,000)

    CE prob.      LG prob.      State prob.         State prob.
    β_a = 0.17    η_a = 0.00    p(∅) = 0.05         p({a, b, c}) = 0.04
    β_b = 0.17    η_b = 0.09    p({a}) = 0.11       p({a, b, d}) = 0.19
    β_c = 0.20    η_c = 0.00    p({b}) = 0.08       p({a, b, c, d}) = 0.19
    β_d = 0.46    η_d = 0.00    p({a, b}) = 0.00    p({a, b, c, e}) = 0.03
    β_e = 0.20    η_e = 0.03    p(Q) = 0.31

    X²(θ̂_X²; x, N = 1,000) [:= inf_{θ ∈ Θ} X²(θ; x, N = 1,000)] = 14.7

Remarks

1. Note that θ̂_X²(x; N = 1,000) lies in the generalized parameter space [0, 1]^19, but not in the initial parameter space Θ; for instance, p({a, b}) = η_a = 0 (i.e., boundary values).

2. From a computational point of view, the optimization problems above are by no means trivial. In general, solutions to these problems cannot be obtained by analytical methods in closed form; numerical optimization algorithms are used instead, and the solutions obtained are approximate. According to Doignon & Falmagne (1999), the (generalized) minimum chi-square estimate reported above is obtained using a conjugate gradient search algorithm called PRAXIS,[17] by Brent (1973), a modification of Powell's (1964) direction-set method. This algorithm allows for the optimization of a function of several variables without calculating derivatives. It is implemented as a C function by Gegenfurtner (1992); the C code can be downloaded from the Web site praxis/.html. Other numerical methods, mainly for the computation of maximum likelihood estimates (in particular, the minimum log-likelihood ratio G² estimate), are given by iterative procedures, such as iterative proportional fitting (Goodman, 1974; Bishop et al., 1975; Fienberg, 1980), Fisher's method of scoring (Rao, 1965), the general Expectation-Maximization (EM) algorithm (Dempster et al., 1977; McLachlan & Krishnan, 1997), and methods of the Newton-Raphson type (Haberman, 1974, 1978, 1979). Denteneer & Verbeek (1986) provide a series of efficiency comparisons for various implementations of these algorithms.
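As a small illustration of such derivative-free minimum chi-square fitting (this is scipy's implementation of Powell's direction-set method, not Brent's PRAXIS, and the model is a toy one-parameter trinomial, not the BLIM; all data are invented):

```python
import numpy as np
from scipy.optimize import minimize

# Toy one-parameter multinomial model: cell probabilities
# ((1-t)^2, 2t(1-t), t^2) for t in (0, 1) (a Hardy-Weinberg-type trinomial).
counts = np.array([380, 470, 150])   # invented observed counts, N = 1000
N = counts.sum()

def pearson_X2(theta):
    """Pearson's X^2 as a function of the model parameter."""
    t = theta[0]
    rho = np.array([(1 - t) ** 2, 2 * t * (1 - t), t ** 2])
    expected = N * rho
    return np.sum((counts - expected) ** 2 / expected)

# Powell's method is derivative-free, like the PRAXIS routine mentioned above.
res = minimize(pearson_X2, x0=[0.5], method="Powell",
               bounds=[(1e-6, 1 - 1e-6)])
theta_hat = res.x[0]
print(round(theta_hat, 4), round(res.fun, 4))
```

The minimum chi-square estimate lands close to the maximum likelihood value (470 + 2·150)/2000 ≈ 0.385, illustrating the asymptotic equivalence of the estimators noted earlier.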
The (generalized) minimum G² estimate, i.e., the (generalized) maximum likelihood estimate θ̂_G²(x; N = 1,000), obtained by optimizing G²(θ; x, N = 1,000) for the data x = (N(R))_{R ∈ 2^Q} in Table 1, is given in Table 3.[18]

[17] PRAXIS stands for PRincipal AXIS (Brent, 1973).
[18] This is done using Brent's (1973) PRAXIS method (cp. Remark 2 above). "ML" stands for maximum likelihood; other abbreviations are defined as in Table 2.

Table 3
(Generalized) minimum G²/ML estimate θ̂_G²(x; N = 1,000)

    CE prob.      LG prob.      State prob.         State prob.
    β_a = 0.16    η_a = 0.04    p(∅) = 0.05         p({a, b, c}) = 0.08
    β_b = 0.16    η_b = 0.10    p({a}) = 0.10       p({a, b, d}) = 0.15
    β_c = 0.19    η_c = 0.00    p({b}) = 0.08       p({a, b, c, d}) = 0.16
    β_d = 0.29    η_d = 0.00    p({a, b}) = 0.04    p({a, b, c, e}) = 0.10
    β_e = 0.14    η_e = 0.02    p(Q) = 0.21

    G²(θ̂_G²; x, N = 1,000) [:= inf_{θ ∈ Θ} G²(θ; x, N = 1,000)] = 12.6

Summary. In this standard example, the test statistics chosen in Step 3 of Neave's step method take the values

    T_X²(x; N) := X²(θ̂_X²; x, N) = 14.7,
    T_G²(x; N) := G²(θ̂_G²; x, N) = 12.6.

2.1.7 Step 4: Critical Region and Standard Example

Next, we deal with Step 4 of Neave's step method (see Section 2.1.3). Under the conditions described in Theorem 2.2, both statistics T_X²(X; N) and T_G²(X; N) have a limiting chi-square distribution with degrees of freedom d = (2^m − 1) − s. In this example, under H₀, we have d = (2⁵ − 1) − 18 = 13. The choice of a critical region to the significance level α ∈ ]0, 1[ (e.g., α = 0.05) is straightforward. As already mentioned, it seems plausible that we may expect small values of these statistics in case the null model really holds; larger values rather speak against the null model. This can be quantified using the upper critical value χ²_{13,α} (d = 13) of the chi-square distribution χ²_13 to the significance level α. Values of the test statistics T_X²(X; N) and T_G²(X; N) greater than χ²_{13,α} are then viewed as critical values pointing to H₁ rather than H₀. (In this way, we take account of the principle that a critical region should consist of values of a test statistic which point only weakly to H₀ but rather most strongly to H₁.) More precisely, the critical/rejection region C to the significance level α is chosen to be the following interval in ℝ:

    C := ]χ²_{13,α}, +∞[.

Then, for N sufficiently large, the probability of rejecting H₀ given that H₀ is correct[19] is quantified by the significance level α (cp.
Remark 3 above):

    P_{H₀}(T_X²(X; N) ∈ C) ≈ α,
    P_{H₀}(T_G²(X; N) ∈ C) ≈ α.

[19] This error is called the error of the first type, or Type I error. There is also the error of the second type, or Type II error, committed by accepting H₀ when in fact H₀ is false.
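The upper critical value χ²_{13,0.05}, and the resulting comparisons for the standard example's statistics, can be reproduced numerically, e.g. with scipy:

```python
from scipy.stats import chi2

alpha, d = 0.05, 13
crit = chi2.ppf(1 - alpha, d)      # upper critical value chi^2_{13, 0.05}
print(round(crit, 2))              # 22.36

# Observed test statistics from the standard example:
T_X2, T_G2 = 14.7, 12.6
print(T_X2 > crit, T_G2 > crit)    # neither exceeds the critical value
```

Since neither statistic falls in the critical region ]χ²_{13,0.05}, +∞[, the null model is not rejected at level 0.05.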

2.1.8 Step 5: Decision Rule and Standard Example

Finally, we come to the last step, Step 5, of Neave's step method. The decision rule is this: if T_X²(x; N) resp. T_G²(x; N) is a value in C, then we reject H₀ in favour of H₁; otherwise, i.e., if T_X²(x; N) resp. T_G²(x; N) is not in C, we do not reject H₀ in favour of H₁ (which, in general, should not be confused with accepting H₀).[20] For instance, if α = 0.05,

    C := ]χ²_{13,0.05}, +∞[ = ]22.36, +∞[.

The values T_X²(x; N) = 14.7 and T_G²(x; N) = 12.6 do not belong to the critical region C. Therefore, H₀ is not rejected based on either test statistic.

2.2 Power-Divergence Family

In this Section 2.2, we review the so-called power-divergence family of goodness-of-fit statistics introduced by Cressie & Read (1984). This family includes the traditional X² and G² statistics described in Section 2.1, and it contains various other goodness-of-fit statistics proposed in the literature (e.g., the Freeman-Tukey statistic, the modified log-likelihood ratio statistic, the Neyman-modified X² statistic). The power-divergence family thus provides a unification of these various common goodness-of-fit statistics. Based on this unification, general results concerning the behavior, similarities, and differences of these statistics can be derived, and valuable alternatives to them suggested (see Read & Cressie, 1988). Throughout Section 2.2, we use the following notation.

Notation: Multinomial Models

Let X = (X_1, X_2, ..., X_t) (t ∈ ℕ) be a t-dimensional random vector with the multinomial distribution X ~ M_t(n, π), where n ∈ ℕ, and π = (π_1, π_2, ..., π_t) is the vector of true cell probabilities π_i > 0 (1 ≤ i ≤ t) with Σ_{i=1}^t π_i = 1; see Definition 2.1. Thus, for any x = (x_1, x_2, ..., x_t) with x_i ∈ ℕ ∪ {0}, 0 ≤ x_i ≤ n, and Σ_{i=1}^t x_i = n, the probability P(X = x) of the vector x of cell counts is

    P(X = x) := P(X_1 = x_1, X_2 = x_2, ..., X_t = x_t) = n! / (∏_{i=1}^t x_i!) · ∏_{i=1}^t π_i^{x_i}.

Remarks

[20] A value T_X²(x; N) resp.
T_G²(x; N) leading to a rejection is called significant.
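As a preview of the family just introduced: scipy implements the Cressie-Read power-divergence statistics directly, with a tuning parameter λ that recovers Pearson's X² (λ = 1), G² (the limit λ → 0, accepted directly as lambda_=0), and the Freeman-Tukey statistic (λ = −1/2). A sketch with invented counts and null cell probabilities:

```python
import numpy as np
from scipy.stats import power_divergence

# Invented counts and null probabilities for a t = 4 cell multinomial.
x = np.array([30, 50, 15, 5])
pi = np.array([0.25, 0.55, 0.15, 0.05])
f_exp = x.sum() * pi                      # expected frequencies n * pi_i

X2, _ = power_divergence(x, f_exp=f_exp, lambda_=1)     # Pearson's X^2
G2, _ = power_divergence(x, f_exp=f_exp, lambda_=0)     # log-likelihood ratio G^2
FT, _ = power_divergence(x, f_exp=f_exp, lambda_=-0.5)  # Freeman-Tukey
print(round(X2, 3), round(G2, 3), round(FT, 3))
```

For a fully specified null (no estimated parameters), each member of the family is referred to a χ² distribution with t − 1 degrees of freedom; the family-wide asymptotics are developed in the remainder of Section 2.2.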


More information

Testing and Model Selection

Testing and Model Selection Testing and Model Selection This is another digression on general statistics: see PE App C.8.4. The EViews output for least squares, probit and logit includes some statistics relevant to testing hypotheses

More information

Composite Hypotheses and Generalized Likelihood Ratio Tests

Composite Hypotheses and Generalized Likelihood Ratio Tests Composite Hypotheses and Generalized Likelihood Ratio Tests Rebecca Willett, 06 In many real world problems, it is difficult to precisely specify probability distributions. Our models for data may involve

More information

Institute of Actuaries of India

Institute of Actuaries of India Institute of Actuaries of India Subject CT3 Probability & Mathematical Statistics May 2011 Examinations INDICATIVE SOLUTION Introduction The indicative solution has been written by the Examiners with the

More information

f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models

f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models IEEE Transactions on Information Theory, vol.58, no.2, pp.708 720, 2012. 1 f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models Takafumi Kanamori Nagoya University,

More information

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics Mathematics Curriculum A. DESCRIPTION This is a full year courses designed to introduce students to the basic elements of statistics and probability. Emphasis is placed on understanding terminology and

More information

Hypothesis Testing - Frequentist

Hypothesis Testing - Frequentist Frequentist Hypothesis Testing - Frequentist Compare two hypotheses to see which one better explains the data. Or, alternatively, what is the best way to separate events into two classes, those originating

More information

LECTURE 10: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING. The last equality is provided so this can look like a more familiar parametric test.

LECTURE 10: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING. The last equality is provided so this can look like a more familiar parametric test. Economics 52 Econometrics Professor N.M. Kiefer LECTURE 1: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING NEYMAN-PEARSON LEMMA: Lesson: Good tests are based on the likelihood ratio. The proof is easy in the

More information

The Chi-Square Distributions

The Chi-Square Distributions MATH 183 The Chi-Square Distributions Dr. Neal, WKU The chi-square distributions can be used in statistics to analyze the standard deviation σ of a normally distributed measurement and to test the goodness

More information

The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions.

The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions. The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions. A common problem of this type is concerned with determining

More information

ME3620. Theory of Engineering Experimentation. Spring Chapter IV. Decision Making for a Single Sample. Chapter IV

ME3620. Theory of Engineering Experimentation. Spring Chapter IV. Decision Making for a Single Sample. Chapter IV Theory of Engineering Experimentation Chapter IV. Decision Making for a Single Sample Chapter IV 1 4 1 Statistical Inference The field of statistical inference consists of those methods used to make decisions

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

http://www.math.uah.edu/stat/hypothesis/.xhtml 1 of 5 7/29/2009 3:14 PM Virtual Laboratories > 9. Hy pothesis Testing > 1 2 3 4 5 6 7 1. The Basic Statistical Model As usual, our starting point is a random

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments

Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments We consider two kinds of random variables: discrete and continuous random variables. For discrete random

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

280 CHAPTER 9 TESTS OF HYPOTHESES FOR A SINGLE SAMPLE Tests of Statistical Hypotheses

280 CHAPTER 9 TESTS OF HYPOTHESES FOR A SINGLE SAMPLE Tests of Statistical Hypotheses 280 CHAPTER 9 TESTS OF HYPOTHESES FOR A SINGLE SAMPLE 9-1.2 Tests of Statistical Hypotheses To illustrate the general concepts, consider the propellant burning rate problem introduced earlier. The null

More information

Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistical Data Analysis Stat 3: p-values, parameter estimation Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,

More information

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Temperature fluctuations Variance at multipole l (angle ~180o/l) C. Porciani Estimation

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

Economics 520. Lecture Note 19: Hypothesis Testing via the Neyman-Pearson Lemma CB 8.1,

Economics 520. Lecture Note 19: Hypothesis Testing via the Neyman-Pearson Lemma CB 8.1, Economics 520 Lecture Note 9: Hypothesis Testing via the Neyman-Pearson Lemma CB 8., 8.3.-8.3.3 Uniformly Most Powerful Tests and the Neyman-Pearson Lemma Let s return to the hypothesis testing problem

More information

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function

More information

Review. December 4 th, Review

Review. December 4 th, Review December 4 th, 2017 Att. Final exam: Course evaluation Friday, 12/14/2018, 10:30am 12:30pm Gore Hall 115 Overview Week 2 Week 4 Week 7 Week 10 Week 12 Chapter 6: Statistics and Sampling Distributions Chapter

More information

Open Problems in Mixed Models

Open Problems in Mixed Models xxiii Determining how to deal with a not positive definite covariance matrix of random effects, D during maximum likelihood estimation algorithms. Several strategies are discussed in Section 2.15. For

More information

1 Descriptive statistics. 2 Scores and probability distributions. 3 Hypothesis testing and one-sample t-test. 4 More on t-tests

1 Descriptive statistics. 2 Scores and probability distributions. 3 Hypothesis testing and one-sample t-test. 4 More on t-tests Overall Overview INFOWO Statistics lecture S3: Hypothesis testing Peter de Waal Department of Information and Computing Sciences Faculty of Science, Universiteit Utrecht 1 Descriptive statistics 2 Scores

More information

Lecture 21. Hypothesis Testing II

Lecture 21. Hypothesis Testing II Lecture 21. Hypothesis Testing II December 7, 2011 In the previous lecture, we dened a few key concepts of hypothesis testing and introduced the framework for parametric hypothesis testing. In the parametric

More information

exp{ (x i) 2 i=1 n i=1 (x i a) 2 (x i ) 2 = exp{ i=1 n i=1 n 2ax i a 2 i=1

exp{ (x i) 2 i=1 n i=1 (x i a) 2 (x i ) 2 = exp{ i=1 n i=1 n 2ax i a 2 i=1 4 Hypothesis testing 4. Simple hypotheses A computer tries to distinguish between two sources of signals. Both sources emit independent signals with normally distributed intensity, the signals of the first

More information

The outline for Unit 3

The outline for Unit 3 The outline for Unit 3 Unit 1. Introduction: The regression model. Unit 2. Estimation principles. Unit 3: Hypothesis testing principles. 3.1 Wald test. 3.2 Lagrange Multiplier. 3.3 Likelihood Ratio Test.

More information

INTERVAL ESTIMATION AND HYPOTHESES TESTING

INTERVAL ESTIMATION AND HYPOTHESES TESTING INTERVAL ESTIMATION AND HYPOTHESES TESTING 1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Hypothesis testing. Anna Wegloop Niels Landwehr/Tobias Scheffer

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Hypothesis testing. Anna Wegloop Niels Landwehr/Tobias Scheffer Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Hypothesis testing Anna Wegloop iels Landwehr/Tobias Scheffer Why do a statistical test? input computer model output Outlook ull-hypothesis

More information

Chapter 3 : Likelihood function and inference

Chapter 3 : Likelihood function and inference Chapter 3 : Likelihood function and inference 4 Likelihood function and inference The likelihood Information and curvature Sufficiency and ancilarity Maximum likelihood estimation Non-regular models EM

More information

Cherry Blossom run (1) The credit union Cherry Blossom Run is a 10 mile race that takes place every year in D.C. In 2009 there were participants

Cherry Blossom run (1) The credit union Cherry Blossom Run is a 10 mile race that takes place every year in D.C. In 2009 there were participants 18.650 Statistics for Applications Chapter 5: Parametric hypothesis testing 1/37 Cherry Blossom run (1) The credit union Cherry Blossom Run is a 10 mile race that takes place every year in D.C. In 2009

More information

Applied Mathematics Research Report 07-08

Applied Mathematics Research Report 07-08 Estimate-based Goodness-of-Fit Test for Large Sparse Multinomial Distributions by Sung-Ho Kim, Heymi Choi, and Sangjin Lee Applied Mathematics Research Report 0-0 November, 00 DEPARTMENT OF MATHEMATICAL

More information

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging

More information

Lecture 10: Generalized likelihood ratio test

Lecture 10: Generalized likelihood ratio test Stat 200: Introduction to Statistical Inference Autumn 2018/19 Lecture 10: Generalized likelihood ratio test Lecturer: Art B. Owen October 25 Disclaimer: These notes have not been subjected to the usual

More information

The Multinomial Model

The Multinomial Model The Multinomial Model STA 312: Fall 2012 Contents 1 Multinomial Coefficients 1 2 Multinomial Distribution 2 3 Estimation 4 4 Hypothesis tests 8 5 Power 17 1 Multinomial Coefficients Multinomial coefficient

More information

1 Hypothesis Testing and Model Selection

1 Hypothesis Testing and Model Selection A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection

More information

Hypothesis testing:power, test statistic CMS:

Hypothesis testing:power, test statistic CMS: Hypothesis testing:power, test statistic The more sensitive the test, the better it can discriminate between the null and the alternative hypothesis, quantitatively, maximal power In order to achieve this

More information

Topic 15: Simple Hypotheses

Topic 15: Simple Hypotheses Topic 15: November 10, 2009 In the simplest set-up for a statistical hypothesis, we consider two values θ 0, θ 1 in the parameter space. We write the test as H 0 : θ = θ 0 versus H 1 : θ = θ 1. H 0 is

More information

Testing Independence

Testing Independence Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1

More information

MATH 240. Chapter 8 Outlines of Hypothesis Tests

MATH 240. Chapter 8 Outlines of Hypothesis Tests MATH 4 Chapter 8 Outlines of Hypothesis Tests Test for Population Proportion p Specify the null and alternative hypotheses, ie, choose one of the three, where p is some specified number: () H : p H : p

More information

HYPOTHESIS TESTING: FREQUENTIST APPROACH.

HYPOTHESIS TESTING: FREQUENTIST APPROACH. HYPOTHESIS TESTING: FREQUENTIST APPROACH. These notes summarize the lectures on (the frequentist approach to) hypothesis testing. You should be familiar with the standard hypothesis testing from previous

More information

Homework 7: Solutions. P3.1 from Lehmann, Romano, Testing Statistical Hypotheses.

Homework 7: Solutions. P3.1 from Lehmann, Romano, Testing Statistical Hypotheses. Stat 300A Theory of Statistics Homework 7: Solutions Nikos Ignatiadis Due on November 28, 208 Solutions should be complete and concisely written. Please, use a separate sheet or set of sheets for each

More information

Application of Variance Homogeneity Tests Under Violation of Normality Assumption

Application of Variance Homogeneity Tests Under Violation of Normality Assumption Application of Variance Homogeneity Tests Under Violation of Normality Assumption Alisa A. Gorbunova, Boris Yu. Lemeshko Novosibirsk State Technical University Novosibirsk, Russia e-mail: gorbunova.alisa@gmail.com

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

Information measures in simple coding problems

Information measures in simple coding problems Part I Information measures in simple coding problems in this web service in this web service Source coding and hypothesis testing; information measures A(discrete)source is a sequence {X i } i= of random

More information

EC2001 Econometrics 1 Dr. Jose Olmo Room D309

EC2001 Econometrics 1 Dr. Jose Olmo Room D309 EC2001 Econometrics 1 Dr. Jose Olmo Room D309 J.Olmo@City.ac.uk 1 Revision of Statistical Inference 1.1 Sample, observations, population A sample is a number of observations drawn from a population. Population:

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

The Chi-Square Distributions

The Chi-Square Distributions MATH 03 The Chi-Square Distributions Dr. Neal, Spring 009 The chi-square distributions can be used in statistics to analyze the standard deviation of a normally distributed measurement and to test the

More information

P Values and Nuisance Parameters

P Values and Nuisance Parameters P Values and Nuisance Parameters Luc Demortier The Rockefeller University PHYSTAT-LHC Workshop on Statistical Issues for LHC Physics CERN, Geneva, June 27 29, 2007 Definition and interpretation of p values;

More information

4 Hypothesis testing. 4.1 Types of hypothesis and types of error 4 HYPOTHESIS TESTING 49

4 Hypothesis testing. 4.1 Types of hypothesis and types of error 4 HYPOTHESIS TESTING 49 4 HYPOTHESIS TESTING 49 4 Hypothesis testing In sections 2 and 3 we considered the problem of estimating a single parameter of interest, θ. In this section we consider the related problem of testing whether

More information

Chapter 7 Comparison of two independent samples

Chapter 7 Comparison of two independent samples Chapter 7 Comparison of two independent samples 7.1 Introduction Population 1 µ σ 1 1 N 1 Sample 1 y s 1 1 n 1 Population µ σ N Sample y s n 1, : population means 1, : population standard deviations N

More information

Part III. A Decision-Theoretic Approach and Bayesian testing

Part III. A Decision-Theoretic Approach and Bayesian testing Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to

More information

Hypothesis Testing. BS2 Statistical Inference, Lecture 11 Michaelmas Term Steffen Lauritzen, University of Oxford; November 15, 2004

Hypothesis Testing. BS2 Statistical Inference, Lecture 11 Michaelmas Term Steffen Lauritzen, University of Oxford; November 15, 2004 Hypothesis Testing BS2 Statistical Inference, Lecture 11 Michaelmas Term 2004 Steffen Lauritzen, University of Oxford; November 15, 2004 Hypothesis testing We consider a family of densities F = {f(x; θ),

More information

Introduction Large Sample Testing Composite Hypotheses. Hypothesis Testing. Daniel Schmierer Econ 312. March 30, 2007

Introduction Large Sample Testing Composite Hypotheses. Hypothesis Testing. Daniel Schmierer Econ 312. March 30, 2007 Hypothesis Testing Daniel Schmierer Econ 312 March 30, 2007 Basics Parameter of interest: θ Θ Structure of the test: H 0 : θ Θ 0 H 1 : θ Θ 1 for some sets Θ 0, Θ 1 Θ where Θ 0 Θ 1 = (often Θ 1 = Θ Θ 0

More information

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or

More information

Review and continuation from last week Properties of MLEs

Review and continuation from last week Properties of MLEs Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that

More information

2.3 Analysis of Categorical Data

2.3 Analysis of Categorical Data 90 CHAPTER 2. ESTIMATION AND HYPOTHESIS TESTING 2.3 Analysis of Categorical Data 2.3.1 The Multinomial Probability Distribution A mulinomial random variable is a generalization of the binomial rv. It results

More information

Topic 19 Extensions on the Likelihood Ratio

Topic 19 Extensions on the Likelihood Ratio Topic 19 Extensions on the Likelihood Ratio Two-Sided Tests 1 / 12 Outline Overview Normal Observations Power Analysis 2 / 12 Overview The likelihood ratio test is a popular choice for composite hypothesis

More information

Statistical Estimation

Statistical Estimation Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from

More information

Goodness of Fit Goodness of fit - 2 classes

Goodness of Fit Goodness of fit - 2 classes Goodness of Fit Goodness of fit - 2 classes A B 78 22 Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A = 0.75! Exact p-value Exact confidence

More information

Detection theory. H 0 : x[n] = w[n]

Detection theory. H 0 : x[n] = w[n] Detection Theory Detection theory A the last topic of the course, we will briefly consider detection theory. The methods are based on estimation theory and attempt to answer questions such as Is a signal

More information

HYPOTHESIS TESTING. Hypothesis Testing

HYPOTHESIS TESTING. Hypothesis Testing MBA 605 Business Analytics Don Conant, PhD. HYPOTHESIS TESTING Hypothesis testing involves making inferences about the nature of the population on the basis of observations of a sample drawn from the population.

More information

Lecture 8: Information Theory and Statistics

Lecture 8: Information Theory and Statistics Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang

More information

Quantitative Biology II Lecture 4: Variational Methods

Quantitative Biology II Lecture 4: Variational Methods 10 th March 2015 Quantitative Biology II Lecture 4: Variational Methods Gurinder Singh Mickey Atwal Center for Quantitative Biology Cold Spring Harbor Laboratory Image credit: Mike West Summary Approximate

More information

Lecture 12 November 3

Lecture 12 November 3 STATS 300A: Theory of Statistics Fall 2015 Lecture 12 November 3 Lecturer: Lester Mackey Scribe: Jae Hyuck Park, Christian Fong Warning: These notes may contain factual and/or typographic errors. 12.1

More information

Part 1.) We know that the probability of any specific x only given p ij = p i p j is just multinomial(n, p) where p k1 k 2

Part 1.) We know that the probability of any specific x only given p ij = p i p j is just multinomial(n, p) where p k1 k 2 Problem.) I will break this into two parts: () Proving w (m) = p( x (m) X i = x i, X j = x j, p ij = p i p j ). In other words, the probability of a specific table in T x given the row and column counts

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

Statistics 135 Fall 2008 Final Exam

Statistics 135 Fall 2008 Final Exam Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations

More information

Topic 17: Simple Hypotheses

Topic 17: Simple Hypotheses Topic 17: November, 2011 1 Overview and Terminology Statistical hypothesis testing is designed to address the question: Do the data provide sufficient evidence to conclude that we must depart from our

More information

Nominal Data. Parametric Statistics. Nonparametric Statistics. Parametric vs Nonparametric Tests. Greg C Elvers

Nominal Data. Parametric Statistics. Nonparametric Statistics. Parametric vs Nonparametric Tests. Greg C Elvers Nominal Data Greg C Elvers 1 Parametric Statistics The inferential statistics that we have discussed, such as t and ANOVA, are parametric statistics A parametric statistic is a statistic that makes certain

More information