Applied Mathematics Research Report 07-08


Estimate-based Goodness-of-Fit Test for Large Sparse Multinomial Distributions

by Sung-Ho Kim, Hyemi Choi, and Sangjin Lee

Applied Mathematics Research Report 07-08

November 2007

DEPARTMENT OF MATHEMATICAL SCIENCES

Estimate-based Goodness-of-Fit Test for Large Sparse Multinomial Distributions

Sung-Ho Kim a,*, Hyemi Choi b, Sangjin Lee a

a Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
b Department of Statistical Informatics, Chonbuk National University, Chonju, South Korea

Abstract

The Pearson chi-squared statistic ($X^2$) does not in general follow a chi-square distribution when it is used for goodness-of-fit testing for a multinomial distribution based on sparse contingency table data. We explore properties of Zelterman's (1987) $D^2$ statistic and compare them with those of $X^2$, and we also compare these two statistics with the statistic $L_r$ proposed by Maydeu-Olivares and Joe (2005), in the context of the power of goodness-of-fit testing when the given contingency table is very sparse. We show that the variance of $D^2$ is not larger than the variance of $X^2$ under null hypotheses where all the cell probabilities are positive, that the distribution of $D^2$ becomes more skewed as the multinomial distribution becomes more asymmetric and sparse, and that, as for the $L_r$ statistic, the power of the goodness-of-fit test depends on the models which are selected for the testing. A simulation experiment strongly recommends using both $D^2$ and $L_r$ for goodness-of-fit testing with large sparse contingency table data.

Key words: Test power, P-value, Sample-cell ratio, Model discrepancy, Normal approximation, Skewness

1 Introduction

A variety of statistics have been derived for goodness-of-fit testing, such as the Pearson chi-squared statistic ($X^2$) (Pearson, 1900), the log likelihood ratio statistic ($G^2$), the Freeman-Tukey statistic, the Neyman modified $X^2$, and the modified $G^2$.

* Corresponding author. E-mail addresses: shkim@amath.kaist.ac.kr (Sung-Ho Kim), hchoi@chonbuk.ac.kr (Hyemi Choi), Lee.Sang.jin@kaist.ac.kr (Sangjin Lee).

Preprint submitted to Elsevier

Cressie and Read (1984) proposed a family of power divergence statistics in which all of the above statistics are embedded. All of these statistics are asymptotically chi-squared distributed with appropriate degrees of freedom (e.g., Fienberg, 1979; Horn, 1977; Lancaster, 1969; Moore; Watson, 1959). The two most popular statistics among them are $X^2$ and $G^2$. Cochran (1952) gives a nice review of the early development of $X^2$ and concludes that there is little to distinguish it from $G^2$. The chi-squared approximation to $X^2$ and $G^2$ was studied by Read (1984), and the accuracy of these approximations was examined by Koehler and Larntz (1980) and Koehler (1986). Koehler (1986) demonstrated asymptotic normality of $G^2$ for testing the fit of log-linear models with closed-form maximum likelihood estimates in sparse contingency tables and showed that the normal approximation can be much more accurate than the chi-squared approximation for $G^2$, but he admitted that the bias of the estimated moments is a potential problem for very sparse tables. A good review of goodness-of-fit tests for sparse contingency tables is provided in Koehler (1986).

Zelterman (1987) proposed a statistic $D^2$ and compared it with $X^2$ for testing goodness of fit (GOF) to sparse multinomial distributions, where

$$D^2 = X^2 - \sum_i \frac{n_i}{e_i},$$

$n_i$ is the count in the $i$th cell, $e_i$ is the estimate of the cell mean of the $i$th cell, and the summation is over all the cells such that $e_i > 0$. The $D^2$ statistic is not a member of the family of Cressie and Read's power divergence statistics.

The goal of this paper is to explore properties of the goodness-of-fit test using $D^2$ (the $D^2$ test, for short), in comparison with the goodness-of-fit test using $X^2$ (the $X^2$ test), beyond the asymptotic properties derived in Zelterman (1987); to carry out a more rigorous comparative study of the goodness-of-fit tests based on $D^2$, $X^2$, and $L_r$ in the context of power; and to propose an approach to goodness-of-fit testing for a very sparse multinomial distribution.

This paper consists of six sections. Section 2 describes what happens to $X^2$ when the data size $n$ is smaller than the number of cells ($k$) of a given contingency table and summarizes earlier work on asymptotic properties of $X^2$, with more attention paid to the case $n < k$. Section 3 compares $D^2$ and $X^2$ in the context of variance, and it is shown that the variance of $D^2$ is not larger than that of $X^2$ under any null hypothesis of multinomial distribution. We carry out a Monte Carlo study in Section 4 to compare the goodness-of-fit tests based on $D^2$, $X^2$, and $L_r$ in the context of test power. The Monte Carlo result strongly suggests that the $D^2$ test is at least as good as the $X^2$ test when $n < k$ and comparable with the $L_r$ test. We then investigate the normal approximation to the standardized $D^2$ in Section 5. Section 6 concludes the paper with some further remarks on goodness-of-fit testing for sparse contingency tables.
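For concreteness, here is a minimal sketch (not from the paper) of how the two statistics can be computed from cell counts $n_i$ and estimated cell means $e_i$; the uniform toy null and all names are illustrative.

```python
import numpy as np

def pearson_x2(counts, e):
    """Pearson chi-squared statistic X^2 = sum_i (n_i - e_i)^2 / e_i,
    summed over the cells with positive estimated mean e_i."""
    counts = np.asarray(counts, dtype=float)
    e = np.asarray(e, dtype=float)
    pos = e > 0
    return np.sum((counts[pos] - e[pos]) ** 2 / e[pos])

def zelterman_d2(counts, e):
    """Zelterman's D^2 = X^2 - sum_i n_i / e_i (same range of summation)."""
    counts = np.asarray(counts, dtype=float)
    e = np.asarray(e, dtype=float)
    pos = e > 0
    return pearson_x2(counts, e) - np.sum(counts[pos] / e[pos])

# toy example: k = 8 cells, n = 5 observations, uniform null hypothesis
rng = np.random.default_rng(0)
k, n = 8, 5
p0 = np.full(k, 1.0 / k)
counts = rng.multinomial(n, p0)
e = n * p0                       # estimated (here: known) cell means under the null
print(pearson_x2(counts, e), zelterman_d2(counts, e))
```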

2 A BRIEF REVIEW OF GOODNESS-OF-FIT TESTS USING $X^2$ AND SOME RELATED STATISTICS

Suppose that we have a contingency table with $k$ cells and $n$ observations, and consider goodness-of-fit testing of a symmetric null hypothesis, i.e., one under which all the cell probabilities are equal. We will, first of all, consider the two extreme cases $n = 1$ and $n = 2$. Under the symmetric null hypothesis, $X^2 = k - 1$ when $n = 1$; and when $n = 2$, it is either $2(k-1)$ or $k - 2$ according to whether a particular cell frequency equals 2 or not. It is interesting to note that when $n < k$, $X^2$ is strictly positive under any hypothesis for the contingency table. This simply shows that the asymptotic distribution of $X^2$ cannot be a chi-square distribution when $n < k$. This is one of the reasons that a normal approximation has been sought in the literature when $n < k$.

While the literature on the asymptotic properties of $X^2$ abounds in regard to contingency tables with large enough data (e.g., Agresti, 1990; Bishop et al., 1975), a unifying result is yet to be explored for goodness-of-fit testing with sparse contingency tables. Results from the literature concerning Pearson's statistic are summarized below in regard to sparse multinomial problems.

(i) The $X^2$ test has some optimal local power properties in the symmetrical case when the number of cells is moderately large. Hence the $X^2$ test based on the traditional chi-squared approximation is preferred for the test of symmetry. For the null hypothesis of symmetry, the chi-squared approximation for $X^2$ is quite adequate at the usual nominal levels for expected frequencies as low as 0.25, provided $k$, $n$, and $n^2/k$ are not too small (Koehler and Larntz, 1980).

(ii) The chi-squared approximation for $X^2$ produces inflated rejection levels for asymmetrical null hypotheses that contain many very small expected frequencies (Koehler and Larntz, 1980).

(iii) The adverse effects of small expected frequencies on the chi-squared approximation are generally less severe for $X^2$ than for other commonly used goodness-of-fit statistics (Koehler, 1986).

(iv) The asymptotic power of the $D^2$ test is moderate and at least as great as that of the normalized $X^2$ (Zelterman, 1987).

(v) For the symmetric null hypothesis, choosing the value of the family parameter $\lambda$ of Cressie and Read's (1984) power divergence family of statistics within a moderate interval results in a test statistic whose size can be approximated accurately by a chi-squared distribution, provided that $k$ and $n$ are not too small. For $\lambda$ outside this interval, the corrected chi-square approximation $F_C$ is recommended, where $F_C$ is the chi-square distribution corrected so that its mean and variance agree to second order with the mean and variance of the power divergence statistic; it gives reasonably accurate approximate levels and is easy to compute. When both $n$ and $k$ are large, the

normal approximation should be used (Read, 1984).

(vi) Berry and Mielke (1988) proposed non-asymptotic chi-square tests based on $X^2$ and $D^2$ that use a Pearson Type III distribution approximation for testing large sparse contingency tables.

(vii) Maydeu-Olivares and Joe (2005) proposed a statistic, $L_r$, for goodness-of-fit testing with $2^J$ contingency table data, which is given in a quadratic form based on multivariate moments up to order $r$, and showed that $X^2$ belongs to this family of $L_r$. They showed that when $r$ is small, $L_r$ has better small-sample properties and is asymptotically more powerful than $X^2$ for some multivariate binary models.

3 COMPARISON OF VARIANCES

The favorable comparison (Zelterman, 1987) of $D^2$ over the normalized version of $X^2$ in the context of asymptotic power in goodness-of-fit tests may be extended to situations of small or moderate data sizes. To see this, we will have a closer look at the two statistics and compare their exact means and variances across multinomial distributions, which will give us an insight into the power comparison.

Suppose that the cell probability of the $i$th cell, $p_i$, is equal to $p_{0i}$ under the null hypothesis ($H_0$) and let it be equal to $p_{ai}$ under an alternative hypothesis ($H_a$). We will denote the cell means in the same manner, i.e., by $m_{0i}$ under $H_0$ and by $m_{ai}$ under $H_a$. Throughout the rest of the paper, we will assume that $p_{0i} > 0$ for all $i$. We can then re-express $X^2$ and $D^2$ as

$$X^2 = \sum_{i=1}^{k} \frac{(n_i - m_{0i})^2}{m_{0i}}, \qquad D^2 = X^2 - \sum_{i} \frac{n_i}{m_{0i}}.$$

We list a couple of simple facts (with $q_i = 1 - p_i$):

$$E(X^2) = \sum_{i=1}^{k} \frac{n p_i q_i}{m_{0i}} + n \sum_{i=1}^{k} \frac{(p_i - p_{0i})^2}{p_{0i}}, \qquad (1)$$

$$E(D^2) = E(X^2) - \sum_{i=1}^{k} \frac{p_i}{p_{0i}}. \qquad (2)$$

The difference between the variance of $D^2$ and that of $X^2$ is given in the following theorem.

Theorem 1. Under a multinomial assumption with the sample size $n$ and a contingency table of $k$ cells, the difference $V(D^2) - V(X^2)$ equals

$$V\Bigl(\sum_{i=1}^{k} \frac{n_i}{m_{0i}}\Bigr) - 2\,\mathrm{Cov}\Bigl(X^2,\ \sum_{i=1}^{k} \frac{n_i}{m_{0i}}\Bigr),$$

which can be written explicitly in terms of $n$, the cell probabilities $p_i$, and the null cell means $m_{0i}$.

Proof: See Appendix A.

A couple of corollaries follow from this theorem.

Corollary 1. Under $H_0$, $V(D^2)$ simplifies to

$$V(D^2) = V(X^2) + \frac{1}{n}\Bigl(k^2 - \sum_{i=1}^{k} \frac{1}{p_{0i}}\Bigr) \le V(X^2),$$

where the equality holds if and only if $H_0$ is symmetrical.

Corollary 2. Suppose that $p_{ah} = 1$ for some $h \le k$ under an $H_a$. Then under this $H_a$ we have

$$V(D^2) = V(X^2). \qquad (3)$$

Let $\delta_1 = E(D^2) - E(X^2)$ and $\delta_2 = V(D^2) - V(X^2)$. Corollary 1 implies that $\delta_2 < 0$ in a near neighborhood of $(p_{01}, \ldots, p_{0k})$ when $H_0$ is not symmetrical. And judging from Corollary 2, we can see that $\delta_2$ is bounded on the boundary of the $(k-1)$-dimensional simplex of $(p_1, \ldots, p_k)$, since $\delta_2$ is continuous in $(p_1, \ldots, p_k)$.

4 SIMULATION EXPERIMENTS

To compare the $X^2$ and $D^2$ tests, we conducted simulation experiments with small samples, using a model of 10 binary variables whose model structure is represented as a directed acyclic graph (DAG). We will call this model Model 0 and denote it by $M_0$. Ten alternative models are considered for the goodness-of-fit test, which are determined by adding or deleting some edges from Model 0. They are displayed

in Figure 1 along with the true model, $M_0$, where each vertex represents a binary variable. A brief explanation of the DAG models is given in Appendix B. For each model, the sample sizes $n = 50$, $200$, and $500$ are selected so that $n/k$ is close to $1/20$, $1/5$, and $1/2$. For each of the ten alternative models, say $M$, we obtain an approximate value of the p-value of the goodness-of-fit test as follows:

(i) Generate a sample $\{(n_1^{(0)}, \ldots, n_k^{(0)}) : \sum_{i=1}^{k} n_i^{(0)} = n\}$ of size $n$ from $M_0$.
(ii) Estimate the cell probabilities for model $M$ based on the sample values $n_1^{(0)}, \ldots, n_k^{(0)}$.
(iii) Generate a sample of size $n$ from model $M$ based on the estimates of the cell probabilities obtained in (ii), and compute $X^2$ and $D^2$ using the sample values and the estimates obtained in (ii).
(iv) Repeat (iii) until $t$ values of $X^2$ and of $D^2$ are obtained, and arrange the $t$ $X^2$ values in an increasing sequence $x_{(1)}, \ldots, x_{(t)}$ and, similarly, the $D^2$ values into the sequence $d_{(1)}, \ldots, d_{(t)}$.
(v) Compute the values of $X^2$ and $D^2$ using the sample values $n_1^{(0)}, \ldots, n_k^{(0)}$ and the estimates obtained in (ii), and then obtain the values of $p_x(X^2)$ and $p_d(D^2)$, which are defined below in expression (4).

A computational sketch of steps (ii)-(v) is given below, after Figure 1.

Fig. 1. Graphs of the DAG models: Model 0 ($M_0$) and Models 1 ($M_1$) through 10 ($M_{10}$). Models $M_i$, $i = 1, \ldots, 10$, are constructed by removing (dashed lines) or adding (dot-dashed lines) arrows to $M_0$.
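The resampling in steps (i)-(v) amounts to a parametric bootstrap under the fitted model $M$. Below is a minimal sketch in Python; it is not the authors' code, the uniform "model" stands in for the fitted DAG models, and the empirical p-value used here is only an approximation to the pseudo p-values $p_x$ and $p_d$ defined in (4).

```python
import numpy as np

rng = np.random.default_rng(1)

def x2_d2(counts, e):
    """Return (X^2, D^2) for observed cell counts and estimated cell means e > 0."""
    pos = e > 0
    x2 = np.sum((counts[pos] - e[pos]) ** 2 / e[pos])
    return x2, x2 - np.sum(counts[pos] / e[pos])

def bootstrap_p_values(counts0, fit_model, n, t=200):
    """Steps (ii)-(v): fit model M, simulate t tables from the fitted cell
    probabilities, and return empirical p-values for X^2 and D^2."""
    p_hat = fit_model(counts0)           # step (ii): cell probabilities estimated under M
    e = n * p_hat                        # estimated cell means
    x2_obs, d2_obs = x2_d2(counts0, e)   # step (v): statistics for the observed table
    x2_ref = np.empty(t)
    d2_ref = np.empty(t)
    for b in range(t):                   # steps (iii)-(iv): reference values under M
        x2_ref[b], d2_ref[b] = x2_d2(rng.multinomial(n, p_hat), e)
    p_x = (1 + np.sum(x2_ref >= x2_obs)) / (t + 1.0)
    p_d = (1 + np.sum(d2_ref >= d2_obs)) / (t + 1.0)
    return p_x, p_d

# toy run with a saturated-uniform "model" in place of a fitted DAG model
k, n = 1024, 200
counts0 = rng.multinomial(n, rng.dirichlet(np.ones(k)))    # step (i): data from a "true" model
fit_uniform = lambda c: np.full(c.size, 1.0 / c.size)      # hypothetical stand-in for model fitting
print(bootstrap_p_values(counts0, fit_uniform, n))
```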

We define a pseudo p-value of $X^2$, $p_x(X^2)$, as

$$p_x(X^2) = \begin{cases} \dfrac{1}{t+1} & \text{if } X^2 \ge x_{(t)}, \\[4pt] 1 - \dfrac{i}{t+1} & \text{if } x_{(i)} \le X^2 < x_{(i+1)},\ 1 \le i < t, \\[4pt] 1 - \dfrac{0.5}{t+1} & \text{if } X^2 < x_{(1)}. \end{cases} \qquad (4)$$

A pseudo p-value of $D^2$, $p_d(D^2)$, is defined similarly, with $X^2$ and $x$ replaced by $D^2$ and $d$, respectively. As is apparent in (4), the pseudo p-value gets close to the actual p-value as $t$ increases, where the actual p-value is that of the goodness-of-fit test of model $M$; the two differ by less than $1/(t+1)$. The same holds for $p_d$.

To compare the power of the goodness-of-fit tests based on $D^2$ and $X^2$, we repeated the above procedure, (i) through (v), a large number of times to obtain as many values of each of $p_x$ and $p_d$. Table 1 lists the proportions of the $p_x$ ($p_d$) values that are less than or equal to each of two nominal levels for the three sample sizes $n = 50$, $200$, and $500$. In the table, we can see that the $D^2$ test is more powerful than the $X^2$ test. Model $M_0$ is a model of 10 binary variables, which means a contingency table of $2^{10} = 1{,}024$ cells, so the sample-cell ratios ($n/k$) considered in the table are about $0.05$, $0.2$, and $0.5$.

The power increases with the sample size for both the $D^2$ and the $X^2$ tests. However, it is interesting to see that the power hardly changes for some models with the $X^2$ test: for two of the alternative models the proportions for $p_x$ stay nearly flat across the sample sizes while those for $p_d$ increase markedly, and the same phenomenon appears at both nominal levels. We can also see in the figures in Appendix C that $p_x$ is less sensitive to the sample size than $p_d$.

We will denote by $\pi_x(\alpha)$ the proportion of the replicated $p_x$ values that are at most $\alpha$, and similarly for $\pi_d(\alpha)$. From the $\pi$ values of the edge-deleted models, we can see that the $\pi$ values do not change monotonically in the number of edges deleted from model $M_0$. The table indicates that the $\pi$ value is sensitive to how the edge deletion affects the inter-relationships among the variables in the model. For example, in one of the edge-deleted models four of the variables are independent of the other variables in the model, while no variable is isolated from the rest of the variables in the other two.

In Table 1, we can also see that the $X^2$ test is less sensitive than the $D^2$ test to differences in model structure. For example, one of the models is smaller than another by three

edges, and this difference in model structure is reflected in $\pi_d(\alpha)$ at both nominal levels, while the $\pi_x(\alpha)$ values remain more or less the same. One may expect that adding edges between the set of isolated variables of such a model and the set of the remaining variables would decrease the $\pi$ value. But this is not the case, as the $\pi$ values of the corresponding pair of models indicate: we see almost no difference in the $\pi$ values between the two models. It is interesting to note that connecting the two sets of variables by adding inappropriate edges, which are not found in the true model $M_0$, does not affect the $\pi$ value. The same phenomenon occurs for another pair of models.

The simulation result strongly recommends the $D^2$ test in comparison with the $X^2$ test in the sense that the $D^2$ test is more sensitive to the sample size and to the difference in model structure between the true model and a selected model.

Table 1
The proportions of the $p_x$ and $p_d$ values that are less than or equal to $\alpha$ for the two nominal levels, rounded to two decimals. For each alternative model the table records the numbers of edges added to and deleted from $M_0$, and the proportions for the $D^2$ and the $X^2$ tests at the sample sizes $n = 50$, $200$, and $500$.
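To summarize power, the whole procedure is replicated and one records how often the pseudo p-values fall below a nominal level $\alpha$; that proportion is the $\pi$ value read off in Table 1. A sketch, with the replication count and $\alpha$ values purely illustrative:

```python
import numpy as np

def pi_value(p_values, alpha):
    """pi(alpha): proportion of replicated pseudo p-values that are <= alpha."""
    return float(np.mean(np.asarray(p_values) <= alpha))

# illustrative use with 200 replicated p_d values for one (model, n) combination
rng = np.random.default_rng(2)
p_d_reps = rng.uniform(size=200)        # placeholder for p_d values from repeated runs
print(pi_value(p_d_reps, 0.05), pi_value(p_d_reps, 0.10))
```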

However, when the sample size is too small (e.g., $n = 50$) relative to the number of cells (e.g., $k = 1{,}024$), the $D^2$ test may not be very useful unless the models of interest are very different from the true model $M_0$, as a few of the models in Figure 1 (among them $M_{10}$) are. In such a small-sample situation, it is desirable to use both the $D^2$ and the $L_r$ tests. A comparison between these tests follows below.

Maydeu-Olivares and Joe (2005) suggest using $L_r$ with small $r$ for large and sparse $2^J$ tables. From an additional simulation experiment, in which we used $D^2$ and $L_r$ for testing the goodness of fit of the set of models in Figure 1, we could see that the $D^2$ test is more powerful than the $L_r$ test for some of the models in the figure, and vice versa for the others. When the sample size was very small, the power of the $L_r$ test became unstable, while this was not the case for the $D^2$ test. For instance, at the smallest sample size the proportions of small $L_r$ p-values for three of the models (among them $M_{10}$) did not reflect the structural discrepancies of these models from $M_0$ (see the first three columns of Table 1), whereas the corresponding proportions from the $D^2$ test did. Moreover, the proportions from the $L_r$ test were zero or close to zero for all the $(M_i, n)$ combinations, $i = 0, 1, \ldots, 10$, except for two combinations at the larger sample sizes.

5 THE DISTRIBUTION OF THE STANDARDIZED $D^2$

Although the $D^2$ test is preferred to the $X^2$ test when the sample-cell ratio ($n/k$) is less than 1, owing to its sensitivity to the sample size and to the discrepancy between model structures, it is important to investigate how its distribution depends on the sample-cell ratio and on the level of asymmetry (or non-uniformity) of the cell probabilities $p_1, \ldots, p_k$, which is defined by

$$\lambda = \sum_{i=1}^{k} \Bigl(p_i - \frac{1}{k}\Bigr)^2.$$

The first three moments of $X^2$ and $D^2$ under $H_0$ are obtained in Horn (1977) and in Mielke and Berry (1988), respectively. The mean and the variance of $D^2$ under $H_0$ are given, respectively, by

$$E_0(D^2) = -1 \quad \text{and} \quad V_0(D^2) = 2(k-1)\Bigl(1 - \frac{1}{n}\Bigr),$$

which have nothing to do with the cell probabilities under $H_0$, while

$$V_0(X^2) = 2(k-1)\Bigl(1 - \frac{1}{n}\Bigr) - \frac{k^2}{n} + \frac{1}{n}\sum_{i=1}^{k} \frac{1}{p_{0i}}$$

does (see the equation in Corollary 1). The skewnesses of $X^2$ and $D^2$ under $H_0$, however, do depend on the distribution under $H_0$ (e.g., Mielke and Berry, 1988). According to Mielke and Berry (1988), the skewness $\gamma$ of $D^2$ under $H_0$ is the sum of two terms: a term $A$ that involves only $n$ and $k$, and a term $B$ that in addition involves $\sum_{i=1}^{k} 1/p_i$. To study $B$, we consider asymmetric distributions of the $p_i$'s given, for $0 < \epsilon < 1$ and $g < k$, by

$$p_1 = \cdots = p_g = \frac{\epsilon}{k}, \qquad p_{g+1} = \cdots = p_k = \frac{k - \epsilon g}{k(k-g)}. \qquad (7)$$

Then we have

$$\sum_{i=1}^{k} \frac{1}{p_i} = \frac{gk}{\epsilon} + \frac{(k-g)^2}{1 - \epsilon g/k}. \qquad (8)$$

For the cell probabilities in (7), the level of asymmetry is given by

$$\lambda = \frac{g(1-\epsilon)^2}{k(k-g)}. \qquad (9)$$

From (7), (8), and (9), we can see that as $g$ ($g < k$) increases and $\epsilon$ ($0 < \epsilon < 1$) becomes smaller, both $\gamma$ and $\lambda$ increase; as a matter of fact, $\gamma$ increases as $\lambda$ increases. In particular, when $s$ ($s > 1$) is chosen so that $g = k/s$ is an integer and $k$ and $n$ satisfy $k = cn$ for some real $c$, the term $B$ becomes a function of $k$, $c$, $s$, and $\epsilon$ alone. Note that when $\epsilon = 1$, we have $p_1 = \cdots = p_k = 1/k$. Under the distribution as

given by (7), the skewness $\gamma$ of $D^2$ increases as there are more small $p_i$'s, that is, as $s$ ($s > 1$) becomes smaller together with a small $\epsilon$ value. In other words, as there are more small $p_i$'s, the normal approximation to $D^2$ gets worse in the tail areas. This phenomenon is observed in Tables 2 and 3. It is also noteworthy that as the contingency table becomes sparser (i.e., as $c = k/n$ gets larger), the distribution of $D^2$ gets more skewed.

Table 2 is obtained under symmetric multinomial distributions for two values of $k$ with varying sample sizes ($n$) as given in the table. Under each symmetric distribution, 10,000 $D^2$ values were generated and percentiles were obtained from them, and after repeating this many times, the means and the variances of the percentiles were obtained. Table 3 is obtained from the same Monte Carlo experiment as Table 2, except that a variety of multinomial distributions were used, where each distribution is defined by its vector of cell probabilities $(p_1, \ldots, p_k)$. Here, each $p_i$ was obtained as a ratio $u_i / \sum_{j=1}^{k} u_j$, where the random quantities $u_j$ were drawn uniformly from an interval bounded away from zero so as to avoid too small cell probabilities.

Table 2
Three upper tail percentiles of the standardized $D^2$ under symmetric multinomials. $d_\alpha$ is the upper $100\alpha$ percentile of the standardized $D^2$ distribution; for each pair of $k$ and $n$, replicated $d_\alpha$ values were generated and their mean, standard deviation (sd), and the skewness value $\gamma$ were computed. The row $n/k = \infty$ refers to the standard normal distribution.
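The Monte Carlo behind Tables 2 and 3 can be sketched as follows: simulate $D^2$ under a chosen multinomial, standardize it with the null moments $E_0(D^2) = -1$ and $V_0(D^2) = 2(k-1)(1-1/n)$ quoted above, and read off upper-tail percentiles. The asymmetric cell probabilities follow the scheme in (7); the particular $k$, $n$, $g$, and $\epsilon$ below are illustrative, not the values used in the paper.

```python
import numpy as np

def standardized_d2_percentiles(p, n, alphas=(0.10, 0.05, 0.01), reps=10_000, seed=0):
    """Upper 100*alpha percentiles of the standardized D^2 under Multinomial(n, p)."""
    rng = np.random.default_rng(seed)
    k = p.size
    m = n * p                                   # cell means (p plays the role of the null)
    counts = rng.multinomial(n, p, size=reps)
    x2 = np.sum((counts - m) ** 2 / m, axis=1)
    d2 = x2 - np.sum(counts / m, axis=1)
    z = (d2 + 1.0) / np.sqrt(2 * (k - 1) * (1 - 1 / n))   # standardize with E0 and V0
    return {a: float(np.quantile(z, 1 - a)) for a in alphas}

def asymmetric_p(k, g, eps):
    """Cell probabilities of (7): p_1 = ... = p_g = eps/k, the rest (k - eps*g)/(k(k-g))."""
    p = np.full(k, (k - eps * g) / (k * (k - g)))
    p[:g] = eps / k
    return p

k, n = 1024, 512
print(standardized_d2_percentiles(np.full(k, 1.0 / k), n))               # symmetric case
print(standardized_d2_percentiles(asymmetric_p(k, g=256, eps=0.2), n))   # asymmetric, sparser cells
```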

Table 3
Three upper tail percentiles of the standardized $D^2$ under asymmetric multinomials. $d_\alpha$ is the upper $100\alpha$ percentile of the standardized $D^2$ distribution; for each pair of $k$ and $n$, replicated $d_\alpha$ values were generated and their mean and standard deviation (sd) were computed. The row $n/k = \infty$ refers to the standard normal distribution.

We denote by $d_\alpha$ the upper $100\alpha$-percentile of the distribution of the standardized $D^2$ and by $z_\alpha$ that of the standard normal distribution. The $d_\alpha$ values are listed in Tables 2 and 3 for three values of $\alpha$. We can see in Table 2 that the $d_\alpha$ values move, as expected, farther away from $z_\alpha$ as the skewness index ($\gamma$) increases, unless the sample size is too small relative to the table size ($k$). For example, for the smaller $k$ in Table 2 the $d_\alpha$ values change abruptly once $n$ comes down below a certain point, and a similar phenomenon takes place for the larger $k$ as $n$ decreases further. In particular, for one extremely sparse $(k, n)$ combination the three tabulated $d_\alpha$ values are all equal. This is due to the extreme sparseness of the contingency table: in the Monte Carlo experiment, all the distributions that were generated under this sparseness yielded the same set of upper percentile points, one common value for the lower of these percentiles and another common value for the uppermost ones. The smaller value corresponds to the case in which the cell frequencies are all at most 1, and the larger value corresponds to the case in which exactly one cell frequency is 2 and the others are at most 1. We will call such a phenomenon, arising from an extreme sparseness, a hyper-sparseness phenomenon.

The skewness index ($\gamma$) is not given in Table 3 since different sets of $p_i$'s may

produce different index values for fixed values of $k$ and $n$. However, we can see in the table that the $d_\alpha$ values are larger on average for asymmetric multinomials than for symmetric multinomials. This phenomenon reflects the dependence of the skewness on the asymmetry of the cell probabilities discussed above. We also see another hyper-sparseness phenomenon in Table 3 at its sparsest $(k, n)$ row.

6 CONCLUDING REMARKS

Large scale modelling is not unusual nowadays in many research fields such as the bio-sciences, cognitive science, management science, etc. In the simulation experiments we considered Bayesian network models, which are a useful tool for representing causal relationships among variables. When the variables are not causally related, log-linear modelling is an appropriate method for contingency table data. Whether it is a Bayesian network model or a log-linear model, we can express the model in a factorized form. Suppose that a model for a set of categorical random variables $X_v$, $v \in V$, is factorized as

$$P(x) \propto \prod_{A \in \mathcal{V}} P_A(x_A), \qquad (12)$$

where $\mathcal{V}$ is a set of subsets of $V$. We will denote the marginal table on the subset of variables $X_v$, $v \in A$, for $A \in \mathcal{V}$, by $\tau_A$ and call it the configuration on $A$. Then we can say that the configurations $\tau_A$, $A \in \mathcal{V}$, are sufficient for the parameters of the model expressed in (12). This means that a large sparse contingency table need not cause trouble in parameter estimation as long as none of the configurations is sparse. In this respect, the statistic $L_r$, which is a function of multivariate moments of the variables $X_v$, $v \in V$, is a reasonable one for goodness-of-fit testing when the configurations in (12) satisfy $|A| \le r$. This is one of the reasons why the performance of the $L_r$ test depends on the models which are selected for goodness-of-fit testing with a given contingency table.

The key idea behind the $D^2$ method for goodness-of-fit testing is the same as that behind the goodness-of-fit test using the Pearson chi-square statistic $X^2$ and the test using the likelihood ratio statistic ($G^2$). The latter two statistics represent a measure of distance between the data and the model which is selected by a model builder, where the measure is standardized by a chi-squared distribution. In the proposed $D^2$ method, we construct a reference distribution, instead of using the chi-squared distribution, to create a distance measure which is similar to the p-value of a test, such as $p_x$ and $p_d$ in (4). The simulation experiment strongly recommends using the $D^2$ test (i.e., the $p_d$ values) rather than the $X^2$ test for goodness-of-fit testing with a large sparse contingency table, owing to its high sensitivity to the sample size and to model discrepancy.
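The sufficiency remark about the configurations $\tau_A$ can be made concrete: for a table of binary variables, $\tau_A$ is just the full table summed over the variables outside $A$, and such a margin stays well filled even when the full table is extremely sparse. A small sketch (the generating set $A$ and the dimensions are arbitrary choices for illustration):

```python
import numpy as np

def configuration(table, A):
    """Marginal table tau_A: sum the full table over every axis not in A."""
    axes_out = tuple(ax for ax in range(table.ndim) if ax not in A)
    return table.sum(axis=axes_out)

# a sparse 10-way binary table: n = 200 observations over 2^10 = 1024 cells
rng = np.random.default_rng(3)
full = rng.multinomial(200, np.full(1024, 1.0 / 1024)).reshape((2,) * 10)

tau_01 = configuration(full, A={0, 1})   # configuration on the first two variables
print(full.max(), tau_01)                # the 2x2 margin is dense although the full table is sparse
```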

This observation leads us to recommend using both the $D^2$ and the $L_r$ tests when the contingency table is large and sparse. Not knowing the true model for a given contingency table, it is desirable to use both of the tests, since (i) the $L_r$ test seems to be more powerful than the $D^2$ test when $|A| \le r$ and the contingency table is of $2^J$ type and not too sparse, (ii) the $D^2$ test seems to be more powerful when $|A| > r$, and (iii) as the contingency table becomes sparser, the $L_r$ test becomes unstable while the $D^2$ test remains stable.

APPENDIX A: PROOF OF THEOREM 1

The proof is a straightforward application of algebra. First,

$$V(D^2) = V\Bigl(X^2 - \sum_i \frac{n_i}{m_{0i}}\Bigr) = V(X^2) + V\Bigl(\sum_i \frac{n_i}{m_{0i}}\Bigr) - 2\,\mathrm{Cov}\Bigl(\sum_i \frac{n_i}{m_{0i}},\ X^2\Bigr).$$

Using the covariance structure of the multinomial distribution,

$$V\Bigl(\sum_i \frac{n_i}{m_{0i}}\Bigr) = \sum_{i=1}^{k} \frac{n p_i q_i}{m_{0i}^2} - \sum_{i \ne j} \frac{n p_i p_j}{m_{0i} m_{0j}}.$$

As for the covariance term, we have

$$\mathrm{Cov}\Bigl(\sum_i \frac{n_i}{m_{0i}},\ X^2\Bigr) = E\Bigl(X^2 \sum_i \frac{n_i}{m_{0i}}\Bigr) - E(X^2)\,E\Bigl(\sum_i \frac{n_i}{m_{0i}}\Bigr).$$

After simple algebra, we have

$$E(X^2) = \sum_{i=1}^{k} \frac{n p_i q_i + n^2 p_i^2}{m_{0i}} - n,$$

and $E(X^2 \sum_i n_i/m_{0i})$ expands in terms of the third-order multinomial moments $E(n_i^3)$ and $E(n_i^2 n_j)$, $i \ne j$. Substituting these expressions into the covariance, and the resulting covariance together with the variance of $\sum_i n_i/m_{0i}$ into the first display, yields the desired result of the theorem.
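The variance relation can also be checked numerically. The sketch below compares Monte Carlo variances of $X^2$ and $D^2$ under $H_0$ with the difference $(k^2 - \sum_i 1/p_{0i})/n$ stated in Corollary 1; it is a verification of the reconstruction above rather than code from the paper, and the null probabilities are arbitrary.

```python
import numpy as np

def mc_variances(p0, n, reps=200_000, seed=4):
    """Monte Carlo variances of X^2 and D^2 under H0: p = p0."""
    rng = np.random.default_rng(seed)
    m = n * p0
    counts = rng.multinomial(n, p0, size=reps)
    x2 = np.sum((counts - m) ** 2 / m, axis=1)
    d2 = x2 - np.sum(counts / m, axis=1)
    return x2.var(), d2.var()

k, n = 8, 40
p0 = np.array([0.30, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05, 0.05])
vx, vd = mc_variances(p0, n)
gap_theory = (k ** 2 - np.sum(1.0 / p0)) / n    # Corollary 1: V(D^2) - V(X^2)
print(vd - vx, gap_theory)                      # the two numbers should be close (negative here)
```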

APPENDIX B: DAG MODEL OF CATEGORICAL VARIABLES

A DAG model of a set of variables $X_v$, $v \in V$, is a probability model whose model structure can be represented by a directed acyclic graph, $\mathcal{G}$ say, which consists of vertices and directed edges (arrows) between vertices. For a subset $A$ of $V$, $X_A = (X_v)_{v \in A}$, and for two subsets $A$ and $B$ of $V$, we denote the conditional probability $P(X_B = x_B \mid X_A = x_A)$ by $P_{B|A}(x_B \mid x_A)$. If there is an arrow from vertex $u$ to vertex $v$, we call $u$ a parent vertex of $v$. Since a DAG is acyclic, every vertex in a DAG has a unique set of parent vertices. We denote by $\mathrm{pa}(v)$ the set of the parent vertices of $v$. Then we can represent the joint probability of $(X_v)_{v \in V}$ as the product of conditional probabilities

$$P(X_V = x_V) = \prod_{v \in V} P_{v \mid \mathrm{pa}(v)}(x_v \mid x_{\mathrm{pa}(v)}),$$

where $\mathrm{pa}(v) = \emptyset$ if vertex $v$ has no parent vertex.
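A small sketch of this factorization for binary variables; the graph, the parent sets, and the conditional probability tables are made up purely for illustration.

```python
import numpy as np

# parents[v] lists pa(v); cpt[v][x_pa] gives P(X_v = 1 | X_pa(v) = x_pa)
parents = {0: (), 1: (0,), 2: (0, 1)}
cpt = {
    0: {(): 0.3},
    1: {(0,): 0.2, (1,): 0.7},
    2: {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}

def joint_prob(x):
    """P(X_V = x) = prod_v P(x_v | x_pa(v)) for a DAG model of binary variables."""
    prob = 1.0
    for v, pa in parents.items():
        p1 = cpt[v][tuple(x[u] for u in pa)]
        prob *= p1 if x[v] == 1 else 1.0 - p1
    return prob

# the joint probabilities over all 2^3 configurations sum to one
print(sum(joint_prob(x) for x in np.ndindex(2, 2, 2)))
```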

APPENDIX C: COMPARISON OF $\pi_x$ AND $\pi_d$ VALUES

Solid curves with circles are for $\pi_d$ and dashed curves with bullets are for $\pi_x$; in each panel, $\pi(\alpha)$ is plotted against $\alpha$, and the dotted vertical line marks the nominal level. The panels show the $\pi_x(\alpha)$ and $\pi_d(\alpha)$ curves for each model $M_i$ and each sample size $n$.

References

Agresti, A., 1990. Categorical Data Analysis. John Wiley & Sons, New York.
Berry, K.J., Mielke, P.W., Jr., 1988. Monte Carlo comparisons of the asymptotic chi-square and likelihood-ratio tests with the nonasymptotic chi-square test for sparse r × c tables. Psychological Bulletin.
Bishop, Y.M.M., Fienberg, S.E., Holland, P.W., 1975. Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge.
Cochran, W.G., 1952. The χ² test of goodness of fit. Ann. Math. Statist. 23.
Cressie, N., Read, T.R.C., 1984. Multinomial goodness-of-fit tests. J. Roy. Statist. Soc. Ser. B 46.
Fienberg, S.E., 1979. The use of chi-squared statistics for categorical data problems. J. Roy. Statist. Soc. Ser. B 41.
Horn, S.D., 1977. Goodness-of-fit tests for discrete data: a review and an application to a health impairment scale. Biometrics 33.

Johnson, N.L., Kotz, S., 1969. Distributions in Statistics: Discrete Distributions. Houghton Mifflin Company, New York.
Koehler, K.J., 1986. Goodness-of-fit tests for log-linear models in sparse contingency tables. J. Amer. Statist. Assoc. 81.
Koehler, K.J., Larntz, K., 1980. An empirical investigation of goodness-of-fit statistics for sparse multinomials. J. Amer. Statist. Assoc. 75.
Lancaster, H.O., 1969. The Chi-squared Distribution. Wiley, New York.
Maydeu-Olivares, A., Joe, H., 2005. Limited- and full-information estimation and goodness-of-fit testing in 2ⁿ contingency tables: a unified framework. J. Amer. Statist. Assoc. 100.
Mielke, P.W., Jr., 1997. Some exact and nonasymptotic analyses of discrete goodness-of-fit and r-way contingency tables. In: Johnson, N.L., Balakrishnan, N. (Eds.), Advances in the Theory and Practice of Statistics: A Volume in Honor of Samuel Kotz. Wiley, New York.
Mielke, P.W., Jr., Berry, K.J., 1988. Non-asymptotic inferences based on the chi-square statistic for r by c contingency tables. J. Statist. Plann. Inference.
Mislevy, R.J., 1994. Evidence and inference in educational assessment. Psychometrika 59.
Moore, D.S. Recent developments in chi-square tests for goodness-of-fit. Mimeo, Department of Statistics, Purdue University, Lafayette, IN.
Pearson, K., 1900. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, Series 5, 50.
Read, T.R.C., 1984. Small-sample comparisons for the power divergence goodness-of-fit statistics. J. Amer. Statist. Assoc. 79.
Watson, G.S., 1959. Some recent results in chi-square goodness-of-fit tests. Biometrics 15.
Wermuth, N., Lauritzen, S.L., 1983. Graphical and recursive models for contingency tables. Biometrika 70.
Zelterman, D., 1987. Goodness-of-fit tests for large sparse multinomial distributions. J. Amer. Statist. Assoc. 82.


More information

Chi-Square. Heibatollah Baghi, and Mastee Badii

Chi-Square. Heibatollah Baghi, and Mastee Badii 1 Chi-Square Heibatollah Baghi, and Mastee Badii Different Scales, Different Measures of Association Scale of Both Variables Nominal Scale Measures of Association Pearson Chi-Square: χ 2 Ordinal Scale

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

Bootstrap Procedures for Testing Homogeneity Hypotheses

Bootstrap Procedures for Testing Homogeneity Hypotheses Journal of Statistical Theory and Applications Volume 11, Number 2, 2012, pp. 183-195 ISSN 1538-7887 Bootstrap Procedures for Testing Homogeneity Hypotheses Bimal Sinha 1, Arvind Shah 2, Dihua Xu 1, Jianxin

More information

Test Volume 11, Number 1. June 2002

Test Volume 11, Number 1. June 2002 Sociedad Española de Estadística e Investigación Operativa Test Volume 11, Number 1. June 2002 Optimal confidence sets for testing average bioequivalence Yu-Ling Tseng Department of Applied Math Dong Hwa

More information

Ling 289 Contingency Table Statistics

Ling 289 Contingency Table Statistics Ling 289 Contingency Table Statistics Roger Levy and Christopher Manning This is a summary of the material that we ve covered on contingency tables. Contingency tables: introduction Odds ratios Counting,

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

Statistics 3858 : Contingency Tables

Statistics 3858 : Contingency Tables Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson

More information

Estimation of parametric functions in Downton s bivariate exponential distribution

Estimation of parametric functions in Downton s bivariate exponential distribution Estimation of parametric functions in Downton s bivariate exponential distribution George Iliopoulos Department of Mathematics University of the Aegean 83200 Karlovasi, Samos, Greece e-mail: geh@aegean.gr

More information

Structure learning in human causal induction

Structure learning in human causal induction Structure learning in human causal induction Joshua B. Tenenbaum & Thomas L. Griffiths Department of Psychology Stanford University, Stanford, CA 94305 jbt,gruffydd @psych.stanford.edu Abstract We use

More information

MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II

MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II the Contents the the the Independence The independence between variables x and y can be tested using.

More information

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland Frederick James CERN, Switzerland Statistical Methods in Experimental Physics 2nd Edition r i Irr 1- r ri Ibn World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI CONTENTS

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Testing Some Covariance Structures under a Growth Curve Model in High Dimension

Testing Some Covariance Structures under a Growth Curve Model in High Dimension Department of Mathematics Testing Some Covariance Structures under a Growth Curve Model in High Dimension Muni S. Srivastava and Martin Singull LiTH-MAT-R--2015/03--SE Department of Mathematics Linköping

More information

Likelihood Ratio Criterion for Testing Sphericity from a Multivariate Normal Sample with 2-step Monotone Missing Data Pattern

Likelihood Ratio Criterion for Testing Sphericity from a Multivariate Normal Sample with 2-step Monotone Missing Data Pattern The Korean Communications in Statistics Vol. 12 No. 2, 2005 pp. 473-481 Likelihood Ratio Criterion for Testing Sphericity from a Multivariate Normal Sample with 2-step Monotone Missing Data Pattern Byungjin

More information

How likely is Simpson s paradox in path models?

How likely is Simpson s paradox in path models? How likely is Simpson s paradox in path models? Ned Kock Full reference: Kock, N. (2015). How likely is Simpson s paradox in path models? International Journal of e- Collaboration, 11(1), 1-7. Abstract

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

Research on Standard Errors of Equating Differences

Research on Standard Errors of Equating Differences Research Report Research on Standard Errors of Equating Differences Tim Moses Wenmin Zhang November 2010 ETS RR-10-25 Listening. Learning. Leading. Research on Standard Errors of Equating Differences Tim

More information

Generalized Cramér von Mises goodness-of-fit tests for multivariate distributions

Generalized Cramér von Mises goodness-of-fit tests for multivariate distributions Hong Kong Baptist University HKBU Institutional Repository HKBU Staff Publication 009 Generalized Cramér von Mises goodness-of-fit tests for multivariate distributions Sung Nok Chiu Hong Kong Baptist University,

More information

Quadratic forms in skew normal variates

Quadratic forms in skew normal variates J. Math. Anal. Appl. 73 (00) 558 564 www.academicpress.com Quadratic forms in skew normal variates Arjun K. Gupta a,,1 and Wen-Jang Huang b a Department of Mathematics and Statistics, Bowling Green State

More information