Applied Mathematics Research Report 07-08


Estimate-based Goodness-of-Fit Test for Large Sparse Multinomial Distributions, by Sung-Ho Kim, Hyemi Choi, and Sangjin Lee. Applied Mathematics Research Report 07-08, November 2007. DEPARTMENT OF MATHEMATICAL SCIENCES

Estimate-based Goodness-of-Fit Test for Large Sparse Multinomial Distributions

Sung-Ho Kim a, Hyemi Choi b, Sangjin Lee a

a Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
b Department of Statistical Informatics, Chonbuk National University, Chonju, South Korea

Abstract

The Pearson chi-squared statistic (X²) does not in general follow a chi-square distribution when it is used for goodness-of-fit testing of a multinomial distribution based on sparse contingency table data. We explore properties of Zelterman's (1987) D² statistic and compare them with those of X², and we also compare these two statistics with the statistic L_r proposed by Maydeu-Olivares and Joe (2005), in terms of the power of the goodness-of-fit test when the given contingency table is very sparse. We show that the variance of D² is not larger than the variance of X² under null hypotheses in which all the cell probabilities are positive, that the distribution of D² becomes more skewed as the multinomial distribution becomes more asymmetric and sparse, and that, as for the L_r statistic, the power of the goodness-of-fit test depends on which models are selected for the testing. A simulation experiment strongly supports using both D² and L_r for goodness-of-fit testing with large sparse contingency table data.

Key words: Test power, P-value, Sample-cell ratio, Model discrepancy, Normal approximation, Skewness

1 Introduction

A variety of statistics have been derived for goodness-of-fit testing, such as the Pearson chi-squared statistic X² (Pearson, 1900), the log likelihood ratio statistic G², the Freeman-Tukey statistic, the Neyman modified X², and the modified G².

Corresponding author. Email addresses: shkim@amath.kaist.ac.kr (Sung-Ho Kim), hchoi@chonbuk.ac.kr (Hyemi Choi), Lee.Sang.jin@kaist.ac.kr (Sangjin Lee).

Preprint submitted to Elsevier

Cressie and Read (1984) proposed a family of power divergence statistics in which all of the above statistics are embedded. All of these statistics are asymptotically chi-squared distributed with appropriate degrees of freedom (e.g., Fienberg; Horn; Lancaster; Moore; Watson). The two most popular statistics among them are X² and G². Cochran (1952) gives a nice review of the early development of X² and concludes that there is little to distinguish it from G². The chi-squared approximation to X² and G² was studied by Read (1984), and the accuracy of these approximations was examined by Koehler and Larntz (1980) and Koehler (1986). Koehler (1986) demonstrated asymptotic normality of G² for testing the fit of log-linear models with closed-form maximum likelihood estimates in sparse contingency tables and showed that the normal approximation can be much more accurate than the chi-squared approximation for G², but he acknowledged that the bias of the estimated moments is a potential problem for very sparse tables. A good review of goodness-of-fit tests for sparse contingency tables is provided in Koehler (1986).

Zelterman (1987) proposed a statistic D² and compared it with X² for testing goodness of fit (GOF) to sparse multinomial distributions, where

D² = X² − Σ_i n_i/e_i,

n_i is the count in the ith cell, e_i is the estimate of the mean of the ith cell, and the summation runs over all cells with e_i > 0. The D² statistic is not a member of Cressie and Read's family of power divergence statistics.

The goal of this paper is threefold: to explore properties of the goodness-of-fit test using D² (the D² test, for short), in comparison with the goodness-of-fit test using X² (the X² test), beyond the asymptotic properties derived in Zelterman (1987); to carry out a more rigorous comparative study of the goodness-of-fit tests based on D², X², and L_r in terms of power; and to propose an approach to goodness-of-fit testing for a very sparse multinomial distribution.
This paper consists of six sections. Section 2 describes what happens to X² when the data size n is smaller than the number of cells k of a given contingency table, and summarizes earlier work on asymptotic properties of X², with particular attention to the case n < k. Section 3 compares D² and X² in terms of variance; it is shown that the variance of D² is not larger than that of X² under any null multinomial hypothesis. In Section 4 we carry out a Monte Carlo study to compare the goodness-of-fit tests based on D², X², and L_r in terms of test power. The Monte Carlo results strongly suggest that the D² test is at least as good as the X² test when n < k and is comparable with the L_r test. We then investigate the normal approximation to the standardized D² in Section 5. Section 6 concludes the paper with some further remarks on goodness-of-fit testing for sparse contingency tables.
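Before reviewing X² in detail, the two statistics can be sketched in code (a minimal illustration, not from the paper; the uniform and asymmetric nulls below are hypothetical and chosen only for demonstration). The simulation also checks numerically that, with the true null cell means plugged in, D² has mean close to −1 and a variance no larger than that of X²:

```python
import numpy as np

def x2_d2(counts, expected):
    """Pearson X^2 and Zelterman's D^2 = X^2 - sum(n_i/e_i), summing
    over cells with positive expected counts."""
    counts = np.asarray(counts, float)
    expected = np.asarray(expected, float)
    pos = expected > 0
    x2 = np.sum((counts[pos] - expected[pos]) ** 2 / expected[pos])
    return x2, x2 - np.sum(counts[pos] / expected[pos])

# toy sparse table: k = 8 cells, n = 4 observations, uniform null (e_i = 0.5)
x2, d2 = x2_d2([2, 1, 1, 0, 0, 0, 0, 0], [0.5] * 8)

# Monte Carlo under a hypothetical asymmetric null on k = 6 cells, n = 20
rng = np.random.default_rng(0)
p0 = np.array([0.3, 0.3, 0.1, 0.1, 0.1, 0.1])
n = 20
stats = np.array([x2_d2(c, n * p0) for c in rng.multinomial(n, p0, size=50000)])
x2_var, d2_var = stats[:, 0].var(), stats[:, 1].var()
```

In the simulation the empirical variance of D² comes out below that of X², and the empirical mean of D² sits near −1, consistent with the exact moments derived later in the paper.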

2 A BRIEF REVIEW ON GOODNESS-OF-FIT TESTS USING X² AND SOME RELATED STATISTICS

Suppose that we have a contingency table with k cells and n observations, and consider a goodness-of-fit test of a symmetric null hypothesis, i.e., one under which all the cell probabilities are the same. Consider, first of all, the two extreme cases n = 1 and n = 2. Under the symmetric null hypothesis, X² = k − 1 when n = 1; and when n = 2, X² equals either 2(k − 1) or k − 2 according to whether some cell frequency equals 2 or not. It is interesting to note that when n < k, X² is strictly positive under any hypothesis for the contingency table, since at least k − n cells must be empty. This simply shows that the asymptotic distribution of X² cannot be a chi-square distribution when n < k, and it is one of the reasons that a normal approximation was sought in the literature for this case.

While the literature on the asymptotic properties of X² abounds as regards contingency tables with large enough data (e.g., Agresti (1990); Bishop et al. (1975)), a unifying result is yet to be found for goodness-of-fit testing in sparse contingency tables. Results from the literature concerning Pearson's statistic are summarized below with regard to sparse multinomial problems.

(i) The X² test has some optimal local power properties in the symmetric case when the number of cells is moderately large. Hence the X² test based on the traditional chi-squared approximation is preferred for the test of symmetry. For the null hypothesis of symmetry, the chi-squared approximation for X² is quite adequate at the 0.05 and 0.01 nominal levels for expected frequencies as low as 0.25, provided k ≥ 3, n ≥ 10, and n²/k ≥ 10 (Koehler and Larntz, 1980).
(ii) The chi-squared approximation for X² produces inflated rejection levels for asymmetric null hypotheses that contain many small expected frequencies (Koehler and Larntz, 1980).
(iii) The adverse effects of small expected frequencies on the chi-squared approximation are generally less severe for X² than for other commonly used goodness-of-fit statistics (Koehler, 1986).
(iv) The asymptotic power of the D² test is moderate and at least as great as that of the normalized X² (Zelterman, 1987).
(v) For the symmetric null hypothesis, choosing the value of the family parameter λ of Cressie and Read's (1984) power divergence family of statistics in [1/3, 3/2] results in a test statistic whose size can be approximated accurately by a chi-squared distribution, provided that k and n are not too small. For λ outside this range, the corrected chi-square approximation F_C is recommended, where F_C is the chi-square distribution whose mean and variance agree to second order with those of the power divergence statistic, since it gives reasonably accurate approximate levels and is easy to compute. When both n and k are large, the normal approximation should be used (Read, 1984).
(vi) Berry and Mielke (1988) proposed non-asymptotic chi-square tests based on X² and D² that use a Pearson Type III distribution approximation for testing large sparse contingency tables.
(vii) Maydeu-Olivares and Joe (2005) proposed a statistic L_r for goodness-of-fit testing with 2^J contingency table data, given in a quadratic form based on multivariate moments up to order r, and showed that X² belongs to this family. They showed that when r is small (e.g., 2 or 3), L_r has better small-sample properties and is asymptotically more powerful than X² for some multivariate binary models.

3 COMPARISON OF VARIANCES

The favorable comparison (Zelterman, 1987) of D² over the normalized version of X² in the context of asymptotic power of goodness-of-fit tests may extend to situations of small or moderate data sizes. To see this, we take a closer look at the two statistics and compare their exact means and variances across multinomial distributions, which gives insight into the power comparison. Suppose that the cell probability of the ith cell, p_i, equals p_0i under the null hypothesis (H_0), and let it equal p_ai under an alternative hypothesis (H_a). We denote the cell means in the same manner, i.e., by m_0i under H_0 and m_ai under H_a. Throughout the rest of the paper we assume that p_0i > 0 for every i. We can then re-express X² and D² as

X² = Σ_{i=1}^k (n_i − m_0i)²/m_0i,    D² = X² − Σ_i n_i/m_0i.

We list a couple of simple facts, writing q_i = 1 − p_i for the true cell probabilities p_i:

E(X²) = Σ_{i=1}^k [n p_i q_i + (n p_i − m_0i)²]/m_0i,   (1)
E(D²) = E(X²) − Σ_{i=1}^k p_i/p_0i.   (2)

The difference between the variance of D² and that of X² is given in the following theorem.

Theorem 1. Under a multinomial distribution with sample size n and a contingency table of k cells, where p_i denotes the true cell probability of the ith cell and m_0i = n p_0i, we have

V(D²) = V(X²) + Σ_{i=1}^k [4n(n−1)p_i³ − n(4n−5)p_i² − n p_i]/m_0i² + Σ_{i≠j} n p_i p_j [1 + 4(n−1)p_i]/(m_0i m_0j).

Proof: See Appendix A.

A couple of corollaries follow from this theorem.

Corollary 1. Under H_0, V(D²) simplifies to

V(D²) = V(X²) + (k² − Σ_{i=1}^k 1/p_0i)/n ≤ V(X²),

where the equality holds if and only if H_0 is symmetric.

Corollary 2. Suppose that p_ah = 1 for some h, 1 ≤ h ≤ k, under an H_a. Then under that H_a we have

V(D²) = V(X²).   (3)

Let δ₁ = E(D²) − E(X²) and δ₂ = V(D²) − V(X²). Corollary 1 implies that δ₂ < 0 in a near neighborhood of (p_01, ..., p_0k) when H_0 is not symmetric. And judging from Corollary 2, δ₂ is bounded on the boundary of the (k−1)-dimensional simplex of (p_1, ..., p_k), since δ₂ is continuous in (p_1, ..., p_k).

4 SIMULATION EXPERIMENTS

To compare the X² and D² tests, we conducted simulation experiments with small samples, using a model of 10 binary variables whose model structure is represented as a directed acyclic graph (DAG). We will call this model Model 0 and denote it by M_0. Ten alternative models, determined by adding or deleting edges from Model 0, are considered for the goodness-of-fit test. They are displayed

in Figure 1 along with the true model, M_0, where each vertex represents a binary variable. A brief explanation of DAG models is given in Appendix B. For each model, sample sizes n = 50, 100, 200 are selected so that n/k is close to 1/20, 1/10, 1/5. For each of the ten alternative models, say M, we obtain an approximate p-value of the goodness-of-fit test as follows:

(i) Generate a sample {(n_1^(0), ..., n_k^(0)) : Σ_{i=1}^k n_i^(0) = n} of size n from M_0.
(ii) Estimate the cell probabilities of model M based on the sample values n_1^(0), ..., n_k^(0).
(iii) Generate a sample of size n from model M based on the estimates of the cell probabilities obtained in (ii), and compute X² and D² using these sample values and the estimates obtained in (ii).
(iv) Repeat (iii) until t X² values and t D² values are obtained; arrange the t X² values in increasing order as x_(1), ..., x_(t), and similarly the D² values as d_(1), ..., d_(t).
(v) Compute the values of X² and D² using the sample values n_1^(0), ..., n_k^(0) and the estimates obtained in (ii), and then obtain the values of p_x(X²) and p_d(D²), which are defined below.

Fig. 1. Graphs of DAG models, Model 0 (M_0) and Models 1 (M_1) through 10 (M_10). Models M_i, i = 1, ..., 10, are constructed by removing (dashed line) or adding (dot-dashed line) arrows to M_0.
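The resampling scheme in steps (i)-(v) can be sketched end to end for a toy case. Here the selected model M is a hypothetical independence model for two binary variables (a 4-cell table), the true model has dependence, and t = 500; all numbers are illustrative only, and the tie-breaking constants in pseudo_p follow the pseudo p-value definition given below in the text:

```python
import numpy as np

rng = np.random.default_rng(1)

def d2(counts, expected):
    """Zelterman's D^2 over cells with positive expected counts."""
    pos = expected > 0
    dev = np.sum((counts[pos] - expected[pos]) ** 2 / expected[pos])
    return dev - np.sum(counts[pos] / expected[pos])

def fit_independence(counts):
    """Step (ii) for the hypothetical model M: independence of two binary
    variables laid out as a flat 2x2 = 4-cell table."""
    t = counts.reshape(2, 2) / counts.sum()
    return np.outer(t.sum(axis=1), t.sum(axis=0)).ravel()

def pseudo_p(stat, ref):
    """Step (v): rank the observed statistic among t sorted reference values."""
    t = len(ref)
    x = np.sort(ref)
    if stat >= x[-1]:
        return 1.0 / (t + 1)
    if stat < x[0]:
        return (t + 0.5) / (t + 1)
    i = np.searchsorted(x, stat, side='right')   # x[i-1] <= stat < x[i]
    return (t - i + 1.0) / (t + 1)

# step (i): one sample from a hypothetical "true" model with dependence
p_true = np.array([0.4, 0.1, 0.1, 0.4])
n = 40
obs = rng.multinomial(n, p_true)
p_hat = fit_independence(obs)                        # step (ii)
ref = np.array([d2(rng.multinomial(n, p_hat), n * p_hat)
                for _ in range(500)])                # steps (iii)-(iv), t = 500
p_d = pseudo_p(d2(obs, n * p_hat), ref)              # step (v)
```

Because the fitted independence model is far from the dependent truth, the resulting pseudo p-value is typically small here; with a well-fitting model it would be roughly uniform.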

We define a pseudo p-value of X², p_x(X²), as

p_x(X²) = 1/(t+1)               if X² ≥ x_(t),
          (t − i + 1)/(t+1)     if x_(i) ≤ X² < x_(i+1), 1 ≤ i < t,   (4)
          (t + 0.5)/(t+1)       if X² < x_(1).

A pseudo p-value of D², p_d(D²), is defined in the same way, with X² and x replaced by D² and d, respectively. As is apparent in (4), the pseudo p-value gets close to the actual p-value as t increases, where the actual p-value is that of the goodness-of-fit test of model M. If X² ≥ x_(1), we have 0 ≤ p_x − (the actual p-value) < 1/(t + 1), and |p_x − (the actual p-value)| < 0.5/(t + 1) otherwise. The same inequalities hold for p_d.

To compare the power of the goodness-of-fit tests based on D² and X², we repeated the above procedure, (i) through (v), 500 times to obtain as many values of each of p_x and p_d, with t = 500. Table 1 lists the proportions of p_x (p_d) values that are less than or equal to α ∈ {0.05, 0.1} for the three sample sizes n = 50, 100, 200. In the table we can see that the D² test is more powerful than the X² test. Model M_0 has 10 binary variables, which means a contingency table of 2^10 = 1,024 cells, so the sample-cell ratios n/k considered in the table are 50/1024 ≈ 0.05, 100/1024 ≈ 0.1, and 200/1024 ≈ 0.2. The power increases with the sample size for both the D² and the X² tests. It is interesting, however, that the power does not change noticeably with n for some models under the X² test, while the corresponding proportions for p_d increase with n; the same pattern appears at both values of α. We can see in the figures in Appendix C that p_x is less sensitive to the sample size than p_d.

We denote by π_x(α) the proportion of the 500 p_x values with p_x ≤ α, and similarly by π_d(α) for p_d. From the π values of the edge-deleted models we can see that the π values do not change monotonically in the number of edges deleted from model M_0. It is indicated in the table that the π value is sensitive to how the edge deletion affects the inter-relationships among the variables in the model.
For example, in one of the edge-deleted models four of the variables are independent of the other variables in the model, while no variable is isolated from the rest of the variables in two other edge-deleted models. In Table 1 we can also see that the X² test is less sensitive than the D² test to differences in model structure. For example, one edge-deleted model is smaller than another by three edges, and this difference in model structure is reflected in π_d(α) for both values of α, while the π_x(α) values remain more or less the same. One might expect that adding edges between an isolated set of variables and the set of the remaining variables would decrease the π value. But this is not the case, as the π values for the corresponding pairs of models indicate; we see almost no difference in the π values within each pair. It is interesting to note that connecting the two sets of variables by adding inappropriate edges, ones not found in the true model M_0, does not affect the π value.

The simulation result strongly favors the D² test over the X² test in the sense that the D² test is more sensitive to the sample size and to the difference in model structure between the true model and a selected model.

Table 1. The proportions of the p_x and p_d values that are less than or equal to α ∈ {0.05, 0.1}, out of 500 replications, for each model and for n = 50, 100, 200. The proportions are rounded to two decimals.

However, when the sample size is too small (e.g., n = 50) relative to the number of cells (k = 1,024), the D² test may not be very useful unless the models of interest are very different from the true model M_0, as some of the alternative models (including M_10) are. In such a small-sample situation it is desirable to use both the D² and the L_r tests. A comparison between these two tests follows.

Maydeu-Olivares and Joe (2005) suggest using L_r with small r (e.g., r = 2, 3) for large sparse 2^J tables. From an additional simulation experiment, in which we used D² and L_r for testing the goodness of fit of the set of models in Figure 1, we could see that the D² test is more powerful than the L_r test for some of the models in the figure and vice versa for the others. When the sample size was very small, the power of the L_r test became unstable, while this was not the case for the D² test. For instance, considering the structural discrepancies of three of the models from M_0 (see the first columns of Table 1), the proportions of small p-values from the D² test reflected those structural discrepancies better than the proportions from the L_r test did at the smallest sample size, and the proportions from the L_r test were zero or close to zero for almost all of the (M_i, n) combinations, i = 0, 1, ..., 10.

5 THE DISTRIBUTION OF THE STANDARDIZED D²

Although the D² test is preferred to the X² test when the sample-cell ratio n/k is less than 1, owing to its sensitivity to the sample size and to the discrepancy between model structures, it is important to investigate how its distribution depends on the sample-cell ratio and on the level of asymmetry (or non-uniformity) of the cell probabilities p_1, ..., p_k, which is measured by

λ² = Σ_{i=1}^k (p_i − 1/k)².   (5)
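The index λ² is easy to evaluate numerically. The sketch below computes it for the asymmetric (ε, g) family used below in the text (g cells of probability ε/k, the remaining k − g cells sharing the rest), together with closed forms for λ² and Σ_i 1/p_i that follow from straightforward algebra; the particular k, g, ε values are arbitrary illustrative choices:

```python
import numpy as np

def eps_family(k, g, eps):
    """g cells get probability eps/k; the other k - g cells share the rest."""
    p = np.empty(k)
    p[:g] = eps / k
    p[g:] = (k - eps * g) / (k * (k - g))
    return p

def asym_index(p):
    """lambda^2 = sum_i (p_i - 1/k)^2, the level of asymmetry."""
    k = len(p)
    return float(np.sum((p - 1.0 / k) ** 2))

k, g, eps = 100, 20, 0.1
p = eps_family(k, g, eps)
lam2_closed = g * (1 - eps) ** 2 / (k * (k - g))                  # lambda^2
sum_inv_closed = g * k / eps + k * (k - g) ** 2 / (k - eps * g)   # sum of 1/p_i
```

Making ε smaller or g larger drives both λ² and Σ 1/p_i up, which is exactly the regime in which the text reports a more skewed D².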
The first three moments of X² and D² under H_0 were obtained by Horn (1977) and Mielke and Berry (1988), respectively. The mean and variance of D² under H_0 are, respectively,

E_0(D²) = −1   and   V_0(D²) = 2(k − 1)(n − 1)/n,

which have nothing to do with the cell probabilities under H_0, while

V_0(X²) = 2(k − 1)(n − 1)/n + (Σ_{i=1}^k 1/p_0i − k²)/n

does (see the equation in Corollary 1). The skewnesses of X² and D² under H_0, however, do depend on the distribution under H_0 (e.g., Mielke and Berry, 1988); according to Mielke and Berry (1988), the skewness γ of D² under H_0 is a function of n, k, and Σ_{i=1}^k 1/p_0i.

To see how γ behaves under asymmetry, consider distributions of the p_i given, for 0 < ε < 1 and g < k, by

p_1 = ... = p_g = ε/k,   p_{g+1} = ... = p_k = (k − εg)/(k(k − g)).   (6)

Then

Σ_{i=1}^k 1/p_i = gk/ε + k(k − g)²/(k − εg),   (7)

and, for the cell probabilities in (6), the level of asymmetry is

λ² = g(1 − ε)²/(k(k − g)).   (8)

From (7) and (8) we can see that as g (g < k) increases and ε (0 < ε < 1) becomes smaller, both γ and λ² increase; as a matter of fact, γ increases as λ² increases. In particular, when s (s > 1) is chosen so that g = k/s is an integer, and k and n satisfy k = cn for some real c > 0, the expression for γ implies two things: as there are more small p_i (that is, as ε gets small with g close to k), the skewness γ of D² gets larger, so the normal approximation to D² gets worse in the tail areas, a phenomenon observed in Tables 2 and 3; and as the contingency table becomes sparser (i.e., as c = k/n gets larger), the distribution of D² gets more skewed. Note that when ε = 1, (6) reduces to the symmetric case p_1 = ... = p_k = 1/k.

Table 2 was obtained under symmetric multinomial distributions for several values of k with varying sample sizes n as given in the table. Under each symmetric distribution, 10,000 D² values were generated and percentiles were obtained from them; after repeating this 500 times, the means and the variances of the percentiles were computed. Table 3 was obtained from the same Monte Carlo experiment except that a variety of asymmetric multinomial distributions were used, each defined by its vector of cell probabilities (p_1, ..., p_k). Here each p_i was obtained as a ratio u_i / Σ_{j=1}^k u_j, where the random quantities u_j were drawn uniformly from an interval bounded away from zero, to avoid too small cell probabilities.

Table 2. Three upper-tail percentiles of the standardized D² under symmetric multinomials. d_α is the upper 100α percentile of the standardized D² distribution; 500 d_α values were generated for each pair of k and n, and their mean, standard deviation (sd), and skewness were computed. n/k = ∞ refers to the standard normal distribution, and γ is the skewness value discussed above.

Table 3. Three upper-tail percentiles of the standardized D² under asymmetric multinomials. d_α is the upper 100α percentile of the standardized D² distribution; 500 d_α values were generated for each pair of k and n, and their mean, standard deviation (sd), and skewness were computed. n/k = ∞ refers to the standard normal distribution.

We denote by d_α the upper 100α percentile of the distribution of the standardized D², and by z_α the corresponding percentile of the standard normal curve. The d_α values are listed in Tables 2 and 3 for three values of α. We can see in Table 2 that the d_α values lie, as expected, farther away from z_α as the skewness index γ increases, unless the sample size is too small relative to the table size k. For the largest k, the d_α values change abruptly as n comes down to the smallest sample sizes, and for the most extreme (k, n) pair all three listed percentiles are equal. This is due to an extreme sparseness of the contingency table: in the Monte Carlo experiment, every one of the 500 distributions generated under that sparseness yielded the same set of upper percentile points, the smaller value corresponding to tables in which every cell frequency is at most 1 and the larger to tables in which exactly one cell frequency is 2 and the others are at most 1. We will call such a phenomenon, arising from an extreme sparseness, a hyper-sparseness phenomenon.

The skewness index γ is not given in Table 3, since different sets of p_i may produce different index values for fixed k and n. However, we can see in the table that the d_α values are larger on average for asymmetric multinomials than for symmetric multinomials, a reflection of the dependence of the skewness on the cell probabilities. We also see another hyper-sparseness phenomenon in the sparsest row of Table 3.

6 CONCLUDING REMARKS

Large-scale modelling is not unusual nowadays in many research fields, such as the bio-sciences, cognitive science, and management science. In the simulation experiments we considered Bayesian network models, which are a useful tool for representing causal relationships among variables. When the variables are not causally related, log-linear modelling is an appropriate method for contingency table data. Whether it is a Bayesian network model or a log-linear model, we can express the model in a factorized form. Suppose that a model for a set of categorical random variables X_v, v ∈ V, is factorized as

P(x) ∝ ∏_{A ∈ 𝒜} P_A(x_A),   (9)

where 𝒜 is a set of subsets of V. We will denote the marginal table on the subset of variables X_v, v ∈ A, for A ∈ 𝒜, by τ_A and call it the configuration on A. Then we can say that the configurations τ_A, A ∈ 𝒜, are sufficient for the parameters of the model expressed in (9). This means that a large sparse contingency table need not cause trouble in parameter estimation as long as none of the configurations is sparse. In this respect, the statistic L_r, which is a function of multivariate moments of the variables X_v, v ∈ V, is a reasonable one for goodness-of-fit testing when the configurations in (9) satisfy |A| ≤ r. This is one of the reasons why the performance of the L_r test is subject to the models which are selected for goodness-of-fit testing with a given contingency table.

The key idea behind the D² method for goodness-of-fit testing is the same as for testing with the Pearson chi-square statistic X² and with the likelihood ratio statistic G².
The latter two statistics represent a measure of distance between the data and the model selected by a model builder, where the measure is standardized by a chi-squared distribution. In the proposed D² method, we construct a reference distribution, instead of using the chi-squared distribution, and from it obtain a distance measure analogous to the P-value of a test, such as the p_x and p_d values defined earlier. The simulation experiment strongly recommends using the D² test (i.e., the p_d values) rather than the X² test for goodness-of-fit testing with a large sparse contingency table, owing to its higher sensitivity to the sample size and to model discrepancy.
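The reference-distribution idea can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it simulates the null multinomial to build a reference distribution for a test statistic and returns a Monte Carlo p-value, shown here with Pearson's X² (the function names and defaults are our own):

```python
import numpy as np

def pearson_x2(counts, m):
    """Pearson chi-square statistic X^2 for expected cell counts m."""
    return float(np.sum((counts - m) ** 2 / m))

def monte_carlo_pvalue(observed, p, stat=pearson_x2, reps=2000, seed=0):
    """Approximate P(stat >= observed value) under the multinomial(n, p) null,
    using a simulated reference distribution in place of the chi-squared one."""
    rng = np.random.default_rng(seed)
    observed = np.asarray(observed)
    n = int(observed.sum())
    m = n * np.asarray(p, dtype=float)   # null expected cell counts
    t_obs = stat(observed, m)
    t_null = np.array([stat(rng.multinomial(n, p), m) for _ in range(reps)])
    # add-one correction keeps the Monte Carlo p-value strictly positive
    return float((np.sum(t_null >= t_obs) + 1) / (reps + 1))
```

Counts consistent with the null give a large p-value, while counts far from the null give a small one; the same routine would serve for D² once its formula is supplied as `stat`.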

This observation leads us to recommend using both the D² and the L_r tests when the contingency table is large and sparse. Not knowing the true model for a given contingency table, it is desirable to use both tests, since (i) the L_r test seems to be more powerful than the D² test when |A| ≤ r and the contingency table is of 2^J type and not too sparse, (ii) the D² test seems to be more powerful when |A| > r, and (iii) as the contingency table becomes sparser, the L_r test becomes unstable while the D² test remains stable.

APPENDIX A: PROOF OF THEOREM

The proof is a straightforward application of algebra. Writing D² = X² − Σ_i n_i/m_0i, we have

    V(D²) = V(X² − Σ_i n_i/m_0i)
          = V(X²) + V(Σ_i n_i/m_0i) − 2 Cov(Σ_i n_i/m_0i, X²).    (A.1)

After simple algebra, we have

    V(Σ_i n_i/m_0i) = n Σ_i p_i q_i/m_0i² − n Σ_{i≠j} p_i p_j/(m_0i m_0j),    (A.2)

where q_i = 1 − p_i, and

    E(X²) = n Σ_i p_i (q_i + n p_i)/m_0i − n.    (A.3)

As for the last term in (A.1), we have

    Cov(Σ_i n_i/m_0i, X²) = E(X² Σ_i n_i/m_0i) − E(X²) E(Σ_i n_i/m_0i),    (A.4)

where, by the factorial moments of the multinomial distribution,

    E(X² n_i) = n(n−1) Σ_{j≠i} p_i p_j/m_0j + n(n−1)(n−2) Σ_{j≠i} p_i p_j²/m_0j
              + [n p_i + 3n(n−1) p_i² + n(n−1)(n−2) p_i³]/m_0i − n² p_i.    (A.5)

Substituting (A.2)–(A.5) into (A.1) yields the desired result of the theorem.
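The variance decomposition used at the start of the proof, V(D²) = V(X²) + V(Σ_i n_i/m_0i) − 2 Cov(Σ_i n_i/m_0i, X²), holds exactly for sample moments as well, so it can be checked numerically. The sketch below assumes the Zelterman-type form D² = X² − Σ_i n_i/m_0i and a symmetric multinomial; the parameter choices are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
k, n, reps = 6, 50, 4000
p = np.full(k, 1.0 / k)          # symmetric multinomial (illustrative choice)
m = n * p                        # null expected cell counts m_0i

counts = rng.multinomial(n, p, size=reps)    # reps x k array of samples
x2 = ((counts - m) ** 2 / m).sum(axis=1)     # Pearson X^2 per sample
s = (counts / m).sum(axis=1)                 # sum_i n_i / m_0i per sample
d2 = x2 - s                                  # D^2 = X^2 - sum_i n_i / m_0i

# V(X^2 - S) = V(X^2) + V(S) - 2 Cov(S, X^2) holds exactly for sample moments
lhs = d2.var(ddof=1)
rhs = x2.var(ddof=1) + s.var(ddof=1) - 2.0 * np.cov(s, x2, ddof=1)[0, 1]
```

The two sides agree up to floating-point error, since the identity is algebraic rather than asymptotic.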

APPENDIX B: DAG MODEL OF CATEGORICAL VARIABLES

A DAG model of a set of variables, X_v, v ∈ V, is a probability model whose model structure can be represented by a directed acyclic graph, G say, which consists of vertices and directed edges (arrows) between vertices. For a subset A of V, X_A = (X_v)_{v∈A}, and for two subsets A and B of V, we denote the conditional probability P(X_B = x_B | X_A = x_A) by P_{B|A}(x_B | x_A). If there is an arrow from vertex u to vertex v, we call u a parent vertex of v. Since a DAG is acyclic, every vertex in a DAG has a unique set of parent vertices. We denote by pa(v) the set of the parent vertices of v. Then we can represent the joint probability of (X_v)_{v∈V} as the product of conditional probabilities, as in

    P(X_V = x_V) = ∏_{v∈V} P_{v|pa(v)}(x_v | x_{pa(v)}),

where pa(v) = ∅ if vertex v has no parent vertex.
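The product formula above is straightforward to evaluate directly. The sketch below (all names hypothetical, not from the report) stores one conditional probability table per vertex, keyed by the parent configuration, and multiplies the factors over the vertex set:

```python
def joint_probability(x, parents, cpt):
    """P(X_V = x_V) = prod over v of P(x_v | x_pa(v)) for a DAG model.

    x       : dict vertex -> observed category
    parents : dict vertex -> tuple of parent vertices (empty tuple if none)
    cpt     : dict vertex -> dict (parent config tuple) -> dict category -> prob
    """
    prob = 1.0
    for v, pa in parents.items():
        pa_config = tuple(x[u] for u in pa)   # () when v has no parents
        prob *= cpt[v][pa_config][x[v]]
    return prob

# Toy DAG with one arrow, A -> B, both variables binary.
parents = {"A": (), "B": ("A",)}
cpt = {
    "A": {(): {0: 0.6, 1: 0.4}},
    "B": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},
}
```

Summing `joint_probability` over all configurations of (A, B) returns 1, as the factorization requires.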

APPENDIX C: COMPARISON OF π_x AND π_d VALUES

Solid curves with circles (○) are for π_d and dashed curves with bullets (●) are for π_x; the dotted vertical line marks the reference value on the horizontal axis. [The figure panels, one per model and sample size n, are not recoverable in this copy.]

References

Agresti, A., 1990. Categorical Data Analysis. John Wiley & Sons, New York.

Berry, K.J., Mielke, P.W., Jr., 1988. Monte Carlo comparisons of the asymptotic chi-square and likelihood-ratio tests with the nonasymptotic chi-square test for sparse r × c tables. Psychological Bulletin 103, 256–264.

Bishop, Y.M.M., Fienberg, S.E., Holland, P.W., 1975. Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge.

Cochran, W.G., 1952. The χ² test of goodness of fit. Ann. Math. Statist. 23, 315–345.

Cressie, N., Read, T.R.C., 1984. Multinomial goodness-of-fit tests. J. Roy. Statist. Soc. Ser. B 46, 440–464.

Fienberg, S.E., 1979. The use of chi-squared statistics for categorical data problems. J. Roy. Statist. Soc. Ser. B 41, 54–64.

Horn, S.D., 1977. Goodness-of-fit tests for discrete data: a review and an application to a health impairment scale. Biometrics 33.

Johnson, N.L., Kotz, S., 1969. Distributions in Statistics: Discrete Distributions. Houghton Mifflin, New York.

Koehler, K.J., 1986. Goodness-of-fit tests for log-linear models in sparse contingency tables. J. Amer. Statist. Assoc. 81, 483–493.

Koehler, K.J., Larntz, K., 1980. An empirical investigation of goodness-of-fit statistics for sparse multinomials. J. Amer. Statist. Assoc. 75, 336–344.

Lancaster, H.O., 1969. The Chi-squared Distribution. Wiley, New York.

Maydeu-Olivares, A., Joe, H., 2005. Limited- and full-information estimation and goodness-of-fit testing in 2ⁿ contingency tables: a unified framework. J. Amer. Statist. Assoc. 100, 1009–1020.

Mielke, P.W., Jr., 1997. Some exact and nonasymptotic analyses of discrete goodness-of-fit and r-way contingency tables. In: Johnson, N.L., Balakrishnan, N. (Eds.), Advances in the Theory and Practice of Statistics: A Volume in Honor of Samuel Kotz. Wiley, New York.

Mielke, P.W., Jr., Berry, K.J. Non-asymptotic inferences based on the chi-square statistic for r by c contingency tables. J. Statist. Plann. Inference.

Mislevy, R.J., 1994. Evidence and inference in educational assessment. Psychometrika 59, 439–483.

Moore, D.S. Recent developments in chi-square tests for goodness of fit. Mimeo, Dept. of Statistics, Purdue University, Lafayette, IN.

Pearson, K., 1900. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, Series 5, 50, 157–175.

Read, T.R.C., 1984. Small-sample comparisons for the power divergence goodness-of-fit statistics. J. Amer. Statist. Assoc. 79, 929–935.

Watson, G.S., 1959. Some recent results in chi-square goodness-of-fit tests. Biometrics 15, 440–468.

Wermuth, N., Lauritzen, S.L., 1983. Graphical and recursive models for contingency tables. Biometrika 70, 537–552.

Zelterman, D., 1987. Goodness-of-fit tests for large sparse multinomial distributions. J. Amer. Statist. Assoc. 82, 624–629.