An Overview of Limited Information Goodness-of-Fit Testing in Multidimensional Contingency Tables


New Trends in Psychometrics

Alberto Maydeu-Olivares (1) and Harry Joe (2)
(1) Faculty of Psychology, University of Barcelona, P. Valle de Hebrón 171, 08035 Barcelona, Spain
(2) Department of Statistics, University of British Columbia, Vancouver, BC, Canada V6T 1Z2

Abstract

We provide an overview of goodness-of-fit testing in categorical data analysis with applications to item response theory modeling. A promising line of research is the use of limited information statistics. These are quadratic form statistics in marginal residuals such as univariate and bivariate residuals. We describe two approaches to obtain asymptotic p-values for these statistics: (a) matching the asymptotic moments of the statistic with those of a chi-square distribution, (b) using g-inverses. Also, we discuss statistics for piecewise assessment of model fit (i.e., for items, pairs of items, etc.).

1. Introduction

Until recently, researchers interested in modeling multivariate categorical data faced the problem that most often no procedure existed to assess goodness-of-fit of the fitted models that yielded trustworthy p-values except for very small models. Fortunately, this situation has recently changed, and it is now possible to reliably assess the fit of multivariate categorical data models. This breakthrough is based on the principle that for goodness-of-fit assessment one should not use all the data available. Rather, by using only a handful of the information at hand (i.e., by using limited information) researchers can obtain goodness-of-fit statistics that yield asymptotic p-values that are accurate even in large models and small samples. Furthermore, the power of such statistics can be larger than that of full information statistics (i.e., statistics that use all the data available).
The purpose of this article is to provide an overview of the new developments in limited information goodness-of-fit assessment of categorical data models; see also Bartholomew and Tzamourani (1999), Cai et al. (2006), Mavridis et al. (2007), Maydeu-Olivares and Joe (2005, 2006), and Reiser (in press). Although the exposition focuses on psychometric models (and in particular on item response theory models), the results provided here are completely general and can be applied to any multidimensional categorical data model.

2. The Challenge of Testing Goodness-of-Fit in Multivariate Categorical Data Analysis

Consider modeling $N$ independent and identically distributed observations on $n$ discrete random variables whose categories have been labeled $0, 1, \ldots, K-1$. For notational ease we assume that all observed variables consist of the same number of categories $K$. This leads to an $n$-dimensional contingency table with $C = K^n$ cells. However, the theory applies also for variables with different numbers of categories. We assume a parametric model for $\pi$, the $C$-dimensional vector of cell probabilities, writing $\pi(\theta)$, where $\theta$ is a $q$-dimensional parameter vector to be estimated from the data. The null and alternative hypotheses are $H_0 : \pi = \pi(\theta)$ for some $\theta$ versus $H_1 : \pi \neq \pi(\theta)$ for any $\theta$.

pp. 253-262 © 2008 The Organizing Committee of the International Meeting of the Psychometrics Society

The two most commonly used statistics for testing the overall fit of the model are Pearson's $X^2 = N \sum_{c=1}^{C} (p_c - \pi_c)^2 / \pi_c$ and the likelihood ratio statistic $G^2 = 2N \sum_{c=1}^{C} p_c \ln(p_c / \pi_c)$, where $p_c$ is the observed proportion and $\pi_c = \pi_c(\hat\theta)$ the fitted probability of cell $c$. When the model holds and maximum likelihood estimation is used, the two statistics are asymptotically equivalent, $G^2 \stackrel{a}{=} X^2 \stackrel{d}{\to} \chi^2_{C-q-1}$. However, in sparse tables the empirical Type I error rates of the $X^2$ and $G^2$ test statistics do not match their expected rates under their asymptotic distribution. Of the two statistics, $X^2$ is less adversely affected by the sparseness of the contingency table than $G^2$.

One reason for the poor empirical performance of $X^2$ is that the empirical variance of $X^2$ and its variance under its reference asymptotic distribution differ by a term that depends on the inverse of the cell probabilities. When the cell probabilities become small, the discrepancy between the empirical and asymptotic variances of $X^2$ can be large. Thus, the accuracy of the Type I errors will depend on the model being fitted to the table (as it determines the cell probabilities), but also on the size of the contingency table. This is because when the size of the contingency table is large, the cell probabilities must be small. However, for $C$ and $\pi(\theta)$ fixed, the accuracy of the asymptotic p-values for $X^2$ depends also on sample size, $N$. As $N$ becomes smaller, some of the cell proportions become increasingly poorly estimated (their estimates will be zero) and the empirical Type I errors of $X^2$ will become inaccurate. The degree of sparseness $N/C$ summarizes the relationship between sample size and model size. Thus, the accuracy of the asymptotic p-values for $X^2$ depends on the model and the degree of sparseness of the contingency table. Three alternative strategies have been proposed to obtain accurate p-values:

(a) Pooling cells. If cells are pooled before the model is fitted and if the estimation is based on the $C'$ pooled categories and not the $C$ original categories, then the approximate null distribution of $X^2$ is $\chi^2_{C'-1-q'}$, where $q'$ denotes the number of parameters used after pooling. However, if estimation is based on the original categories, and pooling is based on the results of the analysis, then the resulting $X^2$ is stochastically larger than $\chi^2_{C'-1-q}$, and hence using a $\chi^2_{C'-1-q}$ reference distribution could give an undue impression of poor fit; see Joe and Maydeu-Olivares (2007) for details. Most importantly, there is a limit to the amount of pooling that can be performed without distorting the purpose of the analysis.

(b) Resampling methods. P-values for goodness-of-fit statistics can be obtained by generating the empirical sampling distribution of goodness-of-fit statistics using a resampling method such as the parametric bootstrap; see Langeheine et al. (1996), Bartholomew and Tzamourani (1999), and Tollenaar and Mooijaart (2003). However, there is strong evidence that parametric bootstrap procedures do not yield accurate p-values; see Tollenaar and Mooijaart (2003) and Mavridis et al. (2007). Furthermore, resampling methods may be very time consuming if the researcher is interested in comparing the fit of several models.

(c) Limited information methods. Only the information contained in suitable summary statistics of the data, typically the low order marginals of the contingency table, is used to assess the model. This amounts to pooling cells a priori, in a systematic way, so that the resulting statistics have a known asymptotic null distribution. These procedures are computationally much more efficient than resampling methods.
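The two full information statistics defined at the start of this section, together with the degree of sparseness $N/C$, can be computed in a few lines. The following is a minimal sketch, not the authors' code: the function names and the toy counts are illustrative assumptions, and the fitted probabilities are simply supplied rather than estimated from a model.

```python
import math


def pearson_x2(counts, probs):
    """Pearson's X^2 = N * sum_c (p_c - pi_c)^2 / pi_c."""
    n_obs = sum(counts)
    return n_obs * sum((k / n_obs - pi) ** 2 / pi for k, pi in zip(counts, probs))


def likelihood_ratio_g2(counts, probs):
    """G^2 = 2N * sum_c p_c * ln(p_c / pi_c); empty cells contribute zero."""
    n_obs = sum(counts)
    total = 0.0
    for k, pi in zip(counts, probs):
        if k > 0:  # the limit of p * ln(p) as p -> 0 is 0
            p = k / n_obs
            total += p * math.log(p / pi)
    return 2.0 * n_obs * total


def sparseness(counts):
    """Degree of sparseness N / C: observations per cell."""
    return sum(counts) / len(counts)


# Hypothetical 4-cell table with fitted probabilities of 0.25 each.
counts = [10, 20, 30, 40]
probs = [0.25, 0.25, 0.25, 0.25]
x2 = pearson_x2(counts, probs)
g2 = likelihood_ratio_g2(counts, probs)
ratio = sparseness(counts)
```

In a sparse table many counts are zero; those cells drop out of $G^2$ but still contribute $N\pi_c$ each to $X^2$, which is one way to see why the two statistics can diverge.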

3. An Overview of Limited Information Methods for Goodness-of-Fit

In this section, we consider methods for testing the overall fit of the model, followed by methods for assessing the source of any misfit. Before proceeding, notice that one observation of the $i$th variable $Y_i$ has a Multinomial$(1; \pi_{i0}, \ldots, \pi_{i,K-1})$ distribution. Hence, the joint distribution of the random variables is multivariate multinomial (MVM). In the special case where $K = 2$, $Y_i$ has a Bernoulli distribution, and the joint distribution is multivariate Bernoulli (MVB).

The MVM can be represented by the $C$ vector of joint probabilities $\pi$, or equivalently by the $C - 1$ vector of marginal probabilities $\dot\pi$. The relationship between both representations is one-to-one and can be written as $\dot\pi = T\pi$, where $T$ is a $(K^n - 1) \times K^n$ matrix of 1s and 0s, of full row rank. $\dot\pi$ can be partitioned as $\dot\pi = (\dot\pi_1', \dot\pi_2', \ldots, \dot\pi_n')'$, where $\dot\pi_i$ is the $s_i = \binom{n}{i}(K-1)^i$ vector of $i$th way marginal probabilities, such that the marginal probabilities involving category 0 are excluded. Also, we write $\pi_r = (\dot\pi_1', \ldots, \dot\pi_r')'$ for the $s = \sum_{i=1}^{r} s_i$ vector of multivariate marginal probabilities up to order $r$ ($\le n$), so that $\pi_r = T_r\pi$, where $T_r$ consists of the first $s$ rows of $T$.

Now, let $p$ and $p_r$ be the vector of cell proportions and the vector of marginal proportions up to order $r$, respectively. Also, let $e = p - \pi$ and $e_r = p_r - \pi_r$ be, respectively, the vector of cell residuals and the vector of marginal residuals. Finally, we use $\hat e$ and $\hat e_r$ when these residual vectors depend on the estimated parameters.

To give a completely general result, we only assume that $\hat\theta$ is a $\sqrt{N}$-consistent and asymptotically normal estimator. Specifically, we assume that $\hat\theta$ satisfies

$$\sqrt{N}(\hat\theta - \theta) = H\sqrt{N}(p - \pi(\theta)) + o_p(1) \tag{1}$$

for some $q \times C$ matrix $H$. This includes minimum variance or best asymptotic normal (BAN) estimators such as the maximum likelihood estimator (MLE) or the minimum chi-square estimator. It also includes the limited information estimators for IRT models: those implemented in programs such as LISREL, EQS, MPLUS, or NOHARM, and those proposed by Christoffersson (1975) and Jöreskog and Moustaki (2001).
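The mapping from cell probabilities to marginal probabilities described above can be made concrete for binary variables ($K = 2$). The following is a minimal sketch under stated assumptions: cells are ordered lexicographically, rows are ordered by marginal order, the function names are hypothetical, and the cell probabilities are made-up numbers.

```python
from itertools import combinations, product


def cell_patterns(n):
    """All 2^n response patterns of n binary variables, in lexicographic order."""
    return list(product((0, 1), repeat=n))


def marginal_matrix(n, r):
    """0/1 matrix mapping cell probabilities to marginals up to order r.

    Each row corresponds to a subset of at most r variables; its entry is 1
    for every cell whose pattern equals 1 on all variables in the subset
    (marginals involving category 0 are excluded).
    """
    cells = cell_patterns(n)
    rows = []
    for order in range(1, r + 1):
        for subset in combinations(range(n), order):
            rows.append([1 if all(cell[i] == 1 for i in subset) else 0
                         for cell in cells])
    return rows


def mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Two binary variables; cells ordered (0,0), (0,1), (1,0), (1,1).
pi = [0.4, 0.1, 0.2, 0.3]
T2 = marginal_matrix(2, 2)
pi_2 = mat_vec(T2, pi)  # [P(Y1=1), P(Y2=1), P(Y1=1, Y2=1)]

# For five binary items the full matrix is (2^5 - 1) x 2^5.
T_full = marginal_matrix(5, 5)
```

Restricting `r` to 2 for five binary items keeps only the 15 univariate and bivariate rows, which is the limited information used by the second-order statistics discussed in the paper.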
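We have the following results for the cell residuals: $\sqrt{N}e \stackrel{d}{\to} N(0, \Gamma)$ and $\sqrt{N}\hat e \stackrel{d}{\to} N(0, \Sigma)$, where $\Gamma = D - \pi\pi'$ and $\Sigma = (I - \Delta H)\Gamma(I - \Delta H)'$. Here, $D = \mathrm{diag}(\pi)$, and $\Delta = \partial\pi(\theta)/\partial\theta'$, which is assumed to be of full rank so that the model is identified when using full information. For BAN estimators $H = \mathcal{I}^{-1}\Delta' D^{-1}$, where $\mathcal{I} = \Delta' D^{-1}\Delta$ is the Fisher information matrix. For the marginal residuals up to order $r$, $\sqrt{N}e_r \stackrel{d}{\to} N(0, \Xi_r)$ and $\sqrt{N}\hat e_r \stackrel{d}{\to} N(0, \Sigma_r)$, where $\Xi_r = T_r\Gamma T_r'$ and

$$\Sigma_r = T_r\Sigma T_r' = \Xi_r - \Delta_r H\Gamma T_r' - T_r\Gamma H'\Delta_r' + \Delta_r[H\Gamma H']\Delta_r'. \tag{2}$$

In equation (2), $H\Gamma H'$ is the asymptotic covariance matrix of $\sqrt{N}\hat\theta$, and $\Delta_r = \partial\pi_r(\theta)/\partial\theta' = T_r\Delta$ is an $s \times q$ matrix, where $s$ denotes the dimension of the vector of residuals considered. In the special case of BAN estimators such as the MLE, we have $\Sigma = \Gamma - \Delta\mathcal{I}^{-1}\Delta'$ and $\Sigma_r = \Xi_r - \Delta_r\mathcal{I}^{-1}\Delta_r'$, respectively.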
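The covariance matrices $\Gamma = D - \pi\pi'$ and $\Xi_r = T_r\Gamma T_r'$ involve no model derivatives, so they are easy to check numerically. The sketch below is illustrative only: it hard-codes the matrix $T_2$ for two binary variables with cells ordered (0,0), (0,1), (1,0), (1,1), and the cell probabilities are made-up values.

```python
def gamma_matrix(pi):
    """Gamma = diag(pi) - pi pi', the covariance of sqrt(N) times the cell proportions."""
    size = len(pi)
    return [[(pi[a] if a == b else 0.0) - pi[a] * pi[b] for b in range(size)]
            for a in range(size)]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def mat_mul(left, right):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*right)]
            for row in left]


# Rows: P(Y1=1), P(Y2=1), P(Y1=1, Y2=1).
T2 = [[0, 0, 1, 1],
      [0, 1, 0, 1],
      [0, 0, 0, 1]]
pi = [0.4, 0.1, 0.2, 0.3]

Gamma = gamma_matrix(pi)
Xi2 = mat_mul(mat_mul(T2, Gamma), transpose(T2))
```

Each diagonal element of `Xi2` has the Bernoulli form $\pi_j(1 - \pi_j)$ for the corresponding marginal probability, and the off-diagonal element for the two univariate margins is the covariance $P(Y_1 = 1, Y_2 = 1) - P(Y_1 = 1)P(Y_2 = 1)$.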

3.1. Testing the overall fit of the model

Two general strategies have been proposed to obtain goodness-of-fit statistics using limited information. Both are based on quadratic forms in marginal residuals. Suppose testing is to be performed using $\hat e_r$. We write

$$T_r = N\hat e_r'\hat W\hat e_r, \tag{3}$$

where $\hat W$ converges in probability to an $s \times s$ weight matrix $W$.

The first strategy consists in choosing $\hat W$ so that the quadratic form is easily computed. Two obvious choices are (a) $\hat W = I$, leading to $U_r = N\hat e_r'\hat e_r$, and (b) $\hat W = (\mathrm{diag}(\hat\Xi_r))^{-1}$, whose $j$th diagonal element is $1/[\hat\pi_j(1 - \hat\pi_j)]$ for the $j$th element $\hat\pi_j$ of $\hat\pi_r$, leading to $D_r = N\hat e_r'(\mathrm{diag}(\hat\Xi_r))^{-1}\hat e_r$. Quite generally, the asymptotic distribution of $T_r$ is a mixture of independent chi-square variates. P-values for $T_r$ are then obtained by matching the moments of $T_r$ with those of a central chi-square distribution. One, two, or three moments can be matched. The first three asymptotic moments (mean, variance and third central moment) of $T_r$ are $\mu_1(T_r) = \mathrm{tr}(W\Sigma_r)$, $\mu_2(T_r) = 2\,\mathrm{tr}((W\Sigma_r)^2)$, and $\mu_3(T_r) = 8\,\mathrm{tr}((W\Sigma_r)^3)$.

Let $A_\nu$ be a random variable with a $\chi^2_\nu$ distribution. To obtain a p-value using a two-moment adjustment, we assume that $T_r$ can be approximated by $bA_c$. Solving for the two unknown constants $b$ and $c$ using the first two asymptotic moments of $T_r$ yields $b = \mu_2(T_r)/(2\mu_1(T_r))$ and $c = \mu_1(T_r)/b$. For the three-moment adjustment, we assume that $T_r$ can be approximated by $a + bA_c$. Solving for the three unknown constants $a$, $b$, and $c$ using the first three asymptotic moments of $T_r$ yields $b = \mu_3(T_r)/(4\mu_2(T_r))$, $c = \mu_2(T_r)/(2b^2)$, and $a = \mu_1(T_r) - bc$. A p-value for the two-moment adjusted statistic is obtained using $P(A_c > T_r/b)$, and for the three-moment adjusted statistic using $P(A_c > (T_r - a)/b)$. For the one-moment approximation, we assume again that $T_r$ can be approximated by $bA_t$, where $t$ is the number of degrees of freedom available for testing. Heuristically, this can be taken to be $t = s - q$. Solving for $b$, we have $b = \mu_1(T_r)/t$, and the p-value for the first-moment adjusted statistic is given by $P(A_t > T_r/b)$.
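The moment-matching recipe above can be sketched in pure Python. Several things here are assumptions of the sketch rather than part of the paper: the eigenvalues of $W\Sigma_r$ are taken as given (so that $\mathrm{tr}((W\Sigma_r)^k)$ is the sum of their $k$th powers), the function names are hypothetical, and the chi-square tail probability is computed with a textbook series and continued-fraction evaluation of the incomplete gamma function so that fractional degrees of freedom are supported without third-party libraries.

```python
import math


def chi2_sf(x, df):
    """P(A_df > x) for a central chi-square; df may be fractional."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_front = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series for the lower regularized incomplete gamma function.
        term = total = 1.0 / a
        k = a
        for _ in range(10000):
            k += 1.0
            term *= z / k
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_front)
    # Continued fraction (modified Lentz method) for the upper function.
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_front)


def asymptotic_moments(eigenvalues):
    """Mean, variance and third central moment of a chi-square mixture."""
    mu1 = sum(eigenvalues)
    mu2 = 2.0 * sum(v ** 2 for v in eigenvalues)
    mu3 = 8.0 * sum(v ** 3 for v in eigenvalues)
    return mu1, mu2, mu3


def three_moment_constants(eigenvalues):
    """Constants (a, b, c) such that a + b * A_c matches three moments."""
    mu1, mu2, mu3 = asymptotic_moments(eigenvalues)
    b = mu3 / (4.0 * mu2)
    c = mu2 / (2.0 * b ** 2)
    return mu1 - b * c, b, c


def moment_adjusted_p_values(stat, eigenvalues, t):
    """One-, two- and three-moment adjusted p-values for a statistic T_r."""
    mu1, mu2, _ = asymptotic_moments(eigenvalues)
    b1 = mu1 / t
    b2 = mu2 / (2.0 * mu1)
    c2 = mu1 / b2
    a3, b3, c3 = three_moment_constants(eigenvalues)
    return {
        "one": chi2_sf(stat / b1, t),
        "two": chi2_sf(stat / b2, c2),
        "three": chi2_sf((stat - a3) / b3, c3),
    }


# Sanity check: with five unit eigenvalues the mixture is exactly chi-square(5).
p_values = moment_adjusted_p_values(11.94, [1.0] * 5, 5)
```

When all eigenvalues equal one, the three adjustments coincide with the plain chi-square p-value; unequal eigenvalues are what make the matched degrees of freedom $c$ fractional.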
Many different limited information statistics can be constructed in this way depending on the choice of (a) marginal residual in the quadratic form, (b) weight matrix, and (c) number of moments used to approximate the central chi-square distribution. Regarding (a), a typical choice is $\hat e_2$, the set of univariate and bivariate residuals that do not include category 0. This is a vector of dimension $s = n(K-1) + \binom{n}{2}(K-1)^2$. Another choice is the set of all bivariate residuals $\tilde e_2 = \tilde p_2 - \tilde\pi_2(\hat\theta)$, where $\tilde\pi_2$ is a $\binom{n}{2}K^2$ vector with elements $\pi^{(ij)}_{k_1k_2} = P(Y_i = k_1, Y_j = k_2)$ and sample counterparts $p^{(ij)}_{k_1k_2}$. A statistic based on $\tilde e_2$ is

$$X_2 = N\tilde e_2'\left(\mathrm{diag}(\tilde\pi_2(\hat\theta))\right)^{-1}\tilde e_2 = N\sum_{i<j}\sum_{k_1}\sum_{k_2}\frac{\left(p^{(ij)}_{k_1k_2} - \pi^{(ij)}_{k_1k_2}(\hat\theta)\right)^2}{\pi^{(ij)}_{k_1k_2}(\hat\theta)},$$

the sum of all $\binom{n}{2}$ bivariate $X^2$ statistics.

The fact that $\Sigma_r$, the asymptotic covariance matrix of the estimated marginal residuals, needs to be estimated in this approach results in some drawbacks. First, the estimation of $\Sigma_r$ can be computationally involved for some estimators, such as the MLE, when the model is large. Another drawback is that a different implementation is needed for each estimator under consideration, as $\Sigma_r$ depends on $H$, which depends on the estimator chosen. Thus, (a) formulae for moment adjusted statistics for testing IRT models for binary data estimated using the MLE were given by Cai et al. (2006); see also Bartholomew and Leung (2002), (b) formulae for testing IRT models estimated sequentially using tetrachoric/polychoric correlations were given by Maydeu-Olivares (2001a) for the binary case, and Maydeu-Olivares (2006) for the polytomous case, and (c) formulae for testing IRT models for binary data estimated using the NOHARM program were given by Maydeu-Olivares (2001b). Maydeu-Olivares (2001a, 2001b, 2006) considered one- and two-moment approximations to $U_2$. Cai et al. (2006) considered one- to three-moment approximations to $D_2$, and also to the analogous statistic based only on bivariate residuals. In any case, available evidence on the use of this approach suggests that (a) all in all this approach gives accurate p-values, except when only one moment is matched, in which case the approximation is generally poor, (b) there is little to choose between $U_2$ and $D_2$, and (c) statistics based on univariate and bivariate residuals are slightly more powerful than statistics based only on bivariate residuals.

The second strategy consists of choosing $\hat W$ so that the resulting quadratic form is asymptotically chi-square. This is the strategy followed by Reiser (1996) and Maydeu-Olivares and Joe (2005, 2006). Choosing $\hat W = \hat\Sigma_r^{-}$ ensures that $T_r \stackrel{d}{\to} \chi^2_t$, where $t$ equals the rank of $\Sigma_r$. A g-inverse (or alternatively a Moore-Penrose inverse $\hat\Sigma_r^{+}$) needs to be employed because $\Sigma_r$ is almost invariably of deficient rank. Reiser (1996) considered a quadratic form in $\hat e_2$ with $\hat W = \hat\Sigma_2^{+}$ for testing models for binary data. The use of $\hat\Sigma_r^{+}$ as a weight matrix has two drawbacks. The first drawback is that, as was the case with the moment-adjustment strategy, $\Sigma_r$ needs to be estimated. The second drawback stems from the fact that, almost invariably, the rank of $\Sigma_r$ cannot be determined a priori. In that case, one can determine $t$, the degrees of freedom, by inspecting the magnitude of the eigenvalues of $\hat\Sigma_r$.
However, this may be tricky, as this matrix often has some small eigenvalues, and $t$ (and the value of the statistic itself) will depend on which eigenvalues are judged to be zero. To overcome these difficulties, Maydeu-Olivares and Joe (2005, 2006) considered using instead a weight matrix $\hat W$ such that $\Sigma_r$ is a g-inverse of $W$, that is, $W = W\Sigma_rW$. More specifically, they proposed using

$$W = \Xi_r^{-1} - \Xi_r^{-1}\Delta_r\left(\Delta_r'\Xi_r^{-1}\Delta_r\right)^{-1}\Delta_r'\Xi_r^{-1} = \Delta_r^{(c)}\left(\Delta_r^{(c)\prime}\,\Xi_r\,\Delta_r^{(c)}\right)^{-1}\Delta_r^{(c)\prime}, \tag{4}$$

evaluated at $\hat\theta$, as the weight matrix in equation (3). Here, $\Delta_r^{(c)}$ is the $s \times (s - q)$ orthogonal complement of $\Delta_r = T_r\Delta$ (i.e., it satisfies $\Delta_r^{(c)\prime}\Delta_r = 0$). One advantage of using this weight matrix is that it does not require an estimate of $\Sigma_r$, but of the more easily computable $\Xi_r$. Another advantage is that, by construction, if the model is identified from the marginal probabilities up to order $r$, the degrees of freedom $t$ can be determined a priori: $T_r \stackrel{d}{\to} \chi^2_{s-q}$. Yet another advantage is that the result holds for any estimator satisfying (1), and hence a single implementation suits all estimators. Maydeu-Olivares and Joe (2005, 2006) considered the full class of statistics with (4) (referred to as $M_r$ statistics) and they showed that Pearson's $X^2$ is a special case of the family when the MLE is used and all marginal residuals are used.

3.2. Assessing the source of the misfit

Limited information methods are also useful to identify the source of misfit in poorly fitting models. The inspection of standardized cell residuals is often not very useful to this aim. It is difficult to find trends in inspecting these residuals, and even for moderate $n$ the number of residuals to be inspected is too large.
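As an aside on Section 3.1, the first form of the weight matrix in equation (4) is compact enough to sketch directly. This is an illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, $\Xi_r$ and $\Delta_r$ are passed in as plain nested lists already evaluated at the estimate, and the small Gauss-Jordan inverse stands in for whatever linear algebra routine one would normally use.

```python
def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def mat_mul(left, right):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*right)]
            for row in left]


def inverse(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for k in range(n):
            if k != col:
                factor = aug[k][col]
                aug[k] = [v - factor * w for v, w in zip(aug[k], aug[col])]
    return [row[n:] for row in aug]


def m_weight(xi, delta):
    """W = Xi^-1 - Xi^-1 D (D' Xi^-1 D)^-1 D' Xi^-1, with D = delta (s x q)."""
    xi_inv = inverse(xi)
    xi_inv_delta = mat_mul(xi_inv, delta)
    middle = inverse(mat_mul(transpose(delta), xi_inv_delta))
    # Xi^-1 is symmetric, so transpose(xi_inv_delta) equals D' Xi^-1.
    correction = mat_mul(mat_mul(xi_inv_delta, middle), transpose(xi_inv_delta))
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(xi_inv, correction)]


def m_statistic(n_obs, residuals, xi, delta):
    """Return (M_r, df) with M_r = N e' W e and df = s - q."""
    weight = m_weight(xi, delta)
    quad = sum(residuals[i] * weight[i][j] * residuals[j]
               for i in range(len(residuals)) for j in range(len(residuals)))
    return n_obs * quad, len(residuals) - len(delta[0])


# Toy inputs: s = 3 marginal residuals, q = 1 parameter.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
m_value, m_df = m_statistic(100, [0.5, 0.1, 0.2], identity, [[1.0], [0.0], [0.0]])
```

Because $W\Delta_r = 0$, any component of the residual vector lying in the column space of $\Delta_r$ is ignored by the statistic; in the toy call above the first residual contributes nothing, which is why only $s - q$ degrees of freedom remain.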

Perhaps most importantly, Bartholomew and Tzamourani (1999) point out that because the cell frequencies are integers and the expected frequencies in large tables must be very small, the resulting standardized residuals will be either very small or very large. Yet, dividing a marginal residual by its asymptotic standard error we obtain a standardized marginal residual that is asymptotically standard normal. To identify the source of the misfit, these residuals (univariate, bivariate, or trivariate) can be inspected. However, when the observed variables are not binary, the number of marginal residuals grows very rapidly as the number of categories and variables increases, and it may be difficult to draw useful information by inspecting individual marginal residuals.

For polytomous data models, a more fruitful avenue is to assess how well the model fits single variables, variable pairs, etc. (i.e., subtables). Note that this is like multiple testing after a jointly significant result. If the model for an $r$-variate subtable is identified (with $t > 0$ degrees of freedom), Maydeu-Olivares and Joe's $M_r$ statistic can be used to assess the fit to the subtable, where $M_r \stackrel{d}{\to} \chi^2_t$. In contrast, when applied to an identified subtable, the asymptotic distribution of $X^2$ is stochastically larger than $\chi^2_t$, because the parameters in the subtable have been estimated using the full table; see Maydeu-Olivares and Joe (2006).

4. Numerical Examples

To illustrate the discussion we consider two numerical examples. The first one is the well-known LSAT 7 dataset; see Bock and Lieberman (1970). It consists of 1000 observations on five binary variables; thus, $C = 2^5 = 32$. A two-parameter logistic IRT model is fitted to these data. The second dataset consists of 551 young women responding to the five items of the Positive Problem Orientation (PPO) scale of the Social Problem-Solving Inventory-Revised; see D'Zurilla et al. (2002). These Likert-type items consist of five categories. For this analysis the two lowest and the two highest categories were merged; thus $C = 3^5 = 243$.
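The table sizes just quoted, and the degrees of freedom reported for these examples, follow from simple counting with $C = K^n$, $s = \sum_{i \le r}\binom{n}{i}(K-1)^i$, $C - q - 1$ for the full information statistics and $s - q$ for $M_r$. The sketch below is a bookkeeping aid only. It assumes the usual parameter counts, which the text does not state explicitly: two parameters per item for the two-parameter logistic model and $K$ parameters per item (one slope plus $K - 1$ thresholds) for the graded model. The function names are hypothetical.

```python
from math import comb


def n_cells(n, k):
    """C = K^n cells for n variables with K categories each."""
    return k ** n


def n_marginals(n, k, r):
    """s = sum_{i<=r} C(n, i) (K-1)^i marginals that exclude category 0."""
    return sum(comb(n, i) * (k - 1) ** i for i in range(1, r + 1))


def df_full(n, k, q):
    """Degrees of freedom of X^2 and G^2: C - q - 1."""
    return n_cells(n, k) - q - 1


def df_limited(n, k, r, q):
    """Degrees of freedom of M_r: s - q."""
    return n_marginals(n, k, r) - q


# Assumed parameter counts (not stated in the text):
q_lsat = 5 * 2   # five binary items, slope + intercept each
q_ppo = 5 * 3    # five 3-category items, slope + 2 thresholds each

lsat = {"C": n_cells(5, 2), "X2": df_full(5, 2, q_lsat),
        "M2": df_limited(5, 2, 2, q_lsat), "M3": df_limited(5, 2, 3, q_lsat)}
ppo = {"C": n_cells(5, 3), "X2": df_full(5, 3, q_ppo),
       "M2": df_limited(5, 3, 2, q_ppo)}

# Subtables of the PPO model: single items, pairs and triplets of items.
ppo_subtable_df = [df_limited(m, 3, m, 3 * m) for m in (1, 2, 3)]
```

Under these assumptions the counts agree with the degrees of freedom reported for the two examples, including the negative value for single PPO items, which is why item-level testing is not available for that model.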
Samejima's (1969) graded model is fitted to the PPO data. Maximum likelihood estimation was used in both examples.

4.1. LSAT 7 data

Table 1 provides the results obtained with $X^2$ and $G^2$. Because the data are not sparse, both statistics yield similar results. The model cannot be rejected at the 5% significance level. We have also included in this table the results obtained with three limited information test statistics: $M_2$, $M_3$, and $D_2$. Univariate and bivariate residuals are used in $M_2$ and $D_2$. Up to trivariate residuals are used in $M_3$. Also, from Maydeu-Olivares and Joe (2005), $X^2 = M_5$ because ML estimation was used. One-, two-, and three-moment adjustments were used to obtain p-values for $D_2$. They are labeled $D_2^{(1)}$, $D_2^{(2)}$, and $D_2^{(3)}$ in Table 1. We see in this table that, for this example, the same approximate p-values for $D_2$ are obtained regardless of the number of moments used. Even a one-moment adjustment gives good results. We also see in Table 1 that when data are not sparse, limited information statistics yield p-values similar to those of full information statistics. However, they do so at the expense of fewer degrees of freedom. It is interesting to compare the statistic/df ratios for the members of the $M_r$ family of statistics. These ratios are 2.39, 1.77 and 1.55 for $M_2$, $M_3$ and $M_5$, respectively. Joe and Maydeu-Olivares (2007) have theory that relates larger ratios with smaller degrees of freedom to test statistics that have more power for reasonable directional alternatives.

Table 1. Goodness-of-Fit Results for the LSAT 7 Data

stat          value    df     p-value
$X^2$         32.48    21     0.052
$G^2$         31.70    21     0.063
$M_2$         11.94    5      0.036
$M_3$         26.48    15     0.033
$D_2^{(1)}$   11.33    5      0.045
$D_2^{(2)}$   11.36    5.0    0.045
$D_2^{(3)}$   10.90    4.7    0.045

We now consider using $R_2 = N\hat e_2'\hat\Sigma_2^{+}\hat e_2$, as in Reiser (1996). There are 15 univariate and bivariate residuals in $\hat e_2$. Table 2 provides the value of the 6 smallest eigenvalues of $\hat\Sigma_2$, the value of $R_2$ for $j = 1, \ldots, 6$ if the $j$th eigenvalue and those smaller are judged to be zero, and the resulting df and p-values. The results illustrate how the p-value is affected by how many eigenvalues are judged to be zero. Also, notice that a larger range of p-values is obtained than when moment corrections for $D_2$ are used. Nevertheless, simulation results by Mavridis et al. (2007) reveal this statistic also works adequately. Also, determining the degrees of freedom in Reiser's approach is more numerically stable when fitting models that do not require numerical integration to obtain probabilities (such as loglinear models).

Table 2. Range of P-values Obtainable for $R_2$

eigenvalue              $R_2$    df    p-value
$2.22 \times 10^{-5}$   17.82    9     0.037
$3.83 \times 10^{-6}$   18.29    10    0.050
$3.95 \times 10^{-9}$   19.42    11    0.069
$2.90 \times 10^{-11}$  19.78    12    0.079
$1.03 \times 10^{-13}$  19.78    13    0.101
$1.02 \times 10^{-15}$  19.78    14    0.137

Finally, consider obtaining a better fitting model by dropping one item. The standardized cell residuals are not very helpful to this end. There are only two standardized cell residuals significant at the 5% level, those for patterns (0, 1, 0, 0, 0) and (1, 0, 0, 0, 0). The inspection of univariate, bivariate and trivariate residuals is more helpful. The significant standardized residuals for up to trivariate margins are for margins (1, 3), (1, 4), (1, 5), and (2, 3). They indicate that item 1 is the best selection if an item is to be dropped to fit the model. This is indeed the case, as shown in Table 3.

Table 3. $X^2$ Obtained When Dropping One Item at a Time (df = 7)

item dropped   1       2       3       4        5
$X^2$          5.01    9.52    8.59    18.68    9.86
p-value        0.66    0.22    0.28    0.01     0.20

4.2. PPO data

With polytomous data, the contingency table often becomes sparse and $X^2$ and $G^2$ sometimes yield conflicting results, indicating that both p-values are incorrect.
In those cases, $G^2$ gives an overly optimistic p-value (often 1), and $X^2$ generally gives a p-value of 0. Because in this example the data are not sparse, the discrepancy between both statistics is not large, as shown in Table 4. The table also includes the results obtained with $M_2$, and the results of using $D_2$ and $X_2$ (using 1- to 3-moment adjustments to obtain their approximate p-values). In this case, the p-values for $D_2$ appear inflated, but those for $X_2$ appear reasonable.

Table 4. Goodness-of-Fit Results for the PPO Data

stat          value     df      p-value
$X^2$         304.89    227     0.0004
$G^2$         271.09    227     0.024
$M_2$         55.01     35      0.017
$D_2^{(1)}$   42.58     35      0.177
$D_2^{(2)}$   22.21     18.3    0.236
$D_2^{(3)}$   14.62     11.5    0.230
$X_2^{(1)}$   52.13     35      0.031
$X_2^{(2)}$   45.19     30.3    0.041
$X_2^{(3)}$   34.11     21.6    0.042

We also applied $R_2$ to these data. For this example, $\hat\Sigma_2$ is of full rank (its smallest eigenvalue is $8.42 \times 10^{-5}$), yielding a statistic of 66.65 on 50 df, $p = 0.058$. Thus, for this example, based on the same residuals, this statistic has 15 more df than $M_2$ and yields a slightly larger p-value.

Table 5. Bivariate $X^2$ Statistics (Above the Diagonal) and $M_2$ Statistics (Below the Diagonal)

item   1       2       3       4        5
1              5.92    4.89    6.97*    3.99
2      0.69            4.99    7.47*    4.53
3      3.36    2.94            0.41     7.89*
4      2.70    2.29    0.15             2.97
5      1.90    3.45    4.34    1.57

We next consider obtaining a better fitting model by dropping one item. To do so, we shall assess how well the model fits different subtables. The degrees of freedom for testing the fitted model one item at a time are negative. Thus, item level testing is not possible for this model. There are 2 df for testing the model for pairs of items, and the model for the subtable is identified. Table 5 provides the $M_2$ statistics for each pair of variables, and for comparison the $X^2$ statistics. Starred statistics are significant at $\alpha = 0.05$, based on a $\chi^2_2$. Notice that the values of the $X^2$ statistics are larger than the values of the $M_2$ statistics. Also, they yield a misleading impression of poor fit because the asymptotic distribution of $X^2$ is stochastically larger than $\chi^2_2$.
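The pairwise comparison in Table 5 can be reproduced by referring each statistic to a $\chi^2_2$ distribution, whose upper tail probability has the closed form $\exp(-x/2)$. The sketch below simply re-keys the table by item pair; the dictionary layout and function names are illustrative choices, and, as the text notes, the nominal $\chi^2_2$ reference is only appropriate for the $M_2$ values.

```python
import math


def chi2_sf_2df(x):
    """Upper tail probability of a chi-square variate with 2 df."""
    return math.exp(-x / 2.0)


# Table 5, keyed by item pair (i, j) with i < j.
x2_pairs = {(1, 2): 5.92, (1, 3): 4.89, (1, 4): 6.97, (1, 5): 3.99,
            (2, 3): 4.99, (2, 4): 7.47, (2, 5): 4.53,
            (3, 4): 0.41, (3, 5): 7.89, (4, 5): 2.97}
m2_pairs = {(1, 2): 0.69, (1, 3): 3.36, (1, 4): 2.70, (1, 5): 1.90,
            (2, 3): 2.94, (2, 4): 2.29, (2, 5): 3.45,
            (3, 4): 0.15, (3, 5): 4.34, (4, 5): 1.57}


def significant_pairs(stats, alpha=0.05):
    """Pairs whose nominal chi-square(2) p-value falls below alpha."""
    return sorted(pair for pair, value in stats.items()
                  if chi2_sf_2df(value) < alpha)


flagged_x2 = significant_pairs(x2_pairs)
flagged_m2 = significant_pairs(m2_pairs)
```

Under the nominal reference, item 4 appears in two of the three pairs flagged by $X^2$, while no $M_2$ value is flagged.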
The inspection of the values of $X^2$ suggests the model misfits at the bivariate level, and it suggests that fit would improve the most by dropping item 4. In contrast, the inspection of the $M_2$ values does not reveal any model misfit at the bivariate level. These statistics are not useful to locate the source of the misfit. In this case, we can inspect the value of $M_3$ for triplets of items. With 17 degrees of freedom, there are two $M_3$ statistics significant at $\alpha = 0.05$, for triplets (2, 3, 5) and (2, 4, 5), which suggests that either item 2 or 5 should be dropped. Table 6 reveals that item 2 should be dropped to get a better fit.

Table 6. $X^2$ Obtained When Dropping One Item at a Time (df = 68)

item dropped   1        2        3        4         5
$X^2$          82.92    66.61    86.97    106.42    89.17
p-value        0.105    0.525    0.060    0.002     0.044

5. Discussion

Limited information statistics appear to be a promising avenue to overcome the decades-long problem of assessing the goodness-of-fit in multivariate categorical data analysis. Using these statistics, it is possible to obtain p-values for extremely large models. Extant results suggest their asymptotic p-values yield accurate results even in samples of 300 observations. However, so far the behavior of these statistics has only been investigated for a handful of models containing up to 20 variables. More research is needed to investigate the behavior of the statistics in the extremely large models that are common in Social Science applications, and for alternative models.

A critical limitation of these methods is that the model must be identified from the statistics used for testing. This needs to be verified numerically application by application, both for the statistics that assess the overall fit and for the statistics used for subtables. It may be that the model is identified from the statistic of interest but that in a given application it is nearly non-identified. In that case, the statistic will become numerically unstable and yield unreliable results.

Here, we have focused on limited information statistics based on low order marginal residuals. A limitation of these statistics is that if testing is performed using up to, say, $r$-variate information, then they have no power to detect model misfit if it is only present in $r + 1$ and higher associations. In our view, this is not a serious limitation.
A researcher interested in detecting a specific model misfit can construct a limited information statistic (not necessarily based on residual moments) so that it achieves good power with respect to the specific alternatives of interest (Joe and Maydeu-Olivares, 2007), and the resulting statistic is more powerful than full information statistics.

Acknowledgments

This research has been supported by grant SEJ2006-08204/PSIC of the Spanish Ministry of Science and Technology, and an NSERC Canada grant.

References

Bartholomew, D. J. & Leung, S. O. (2002). A goodness of fit test for sparse $2^p$ contingency tables. British Journal of Mathematical and Statistical Psychology, 55, 1-15.
Bartholomew, D. J. & Tzamourani, P. (1999). The goodness-of-fit of latent trait models in attitude measurement. Sociological Methods and Research, 27, 525-546.
Bock, R. D. & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179-197.
Cai, L., Maydeu-Olivares, A., Coffman, D. L., & Thissen, D. (2006). Limited information goodness of fit testing of item response theory models for sparse $2^p$ tables. British Journal of Mathematical and Statistical Psychology, 59, 173-194.
Christoffersson, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40, 5-32.

D'Zurilla, T. J., Nezu, A. M. & Maydeu-Olivares, A. (2002). Manual of the Social Problem-Solving Inventory-Revised. North Tonawanda, NY: Multi-Health Systems, Inc.
Joe, H. & Maydeu-Olivares, A. (2007). Constructing chi-square goodness-of-fit tests for multinomial data that are more powerful than Pearson's $X^2$. Under review.
Jöreskog, K. G. & Moustaki, I. (2001). Factor analysis of ordinal variables: A comparison of three approaches. Multivariate Behavioral Research, 36, 347-387.
Langeheine, R., Pannekoek, J., & van de Pol, F. (1996). Bootstrapping goodness-of-fit measures in categorical data analysis. Sociological Methods and Research, 24, 492-516.
Mavridis, D., Moustaki, I., & Knott, M. (2007). Goodness-of-fit measures for latent variable models for binary data. In S.-Y. Lee (Ed.), Handbook of Latent Variable and Related Models (pp. 135-162).
Maydeu-Olivares, A. (2001a). Limited information estimation and testing of Thurstonian models for paired comparison data under multiple judgment sampling. Psychometrika, 66, 209-228.
Maydeu-Olivares, A. (2001b). Multidimensional item response theory modeling of binary data: Large sample properties of NOHARM estimates. Journal of Educational and Behavioral Statistics, 26, 49-69.
Maydeu-Olivares, A. (2006). Limited information estimation and testing of discretized multivariate normal structural models. Psychometrika, 71, 57-77.
Maydeu-Olivares, A. & Joe, H. (2005). Limited and full information estimation and goodness-of-fit testing in $2^n$ contingency tables: A unified framework. Journal of the American Statistical Association, 100, 1009-1020.
Maydeu-Olivares, A. & Joe, H. (2006). Limited information goodness-of-fit testing in multidimensional contingency tables. Psychometrika, 71, 713-732.
Reiser, M. (1996). Analysis of residuals for the multinomial item response model. Psychometrika, 61, 509-528.
Reiser, M. (in press). Goodness-of-fit testing using components based on marginal frequencies of multinomial data. British Journal of Mathematical and Statistical Psychology.
Samejima, F. (1969). Calibration of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, No. 17.
Tollenaar, N. & Mooijaart, A. (2003). Type I errors and power of the parametric bootstrap goodness-of-fit test: Full and limited information. British Journal of Mathematical and Statistical Psychology, 56, 271-288.