Applied Mathematics Research Report 07-08
Estimate-based Goodness-of-Fit Test for Large Sparse Multinomial Distributions

by Sung-Ho Kim, Hyemi Choi, and Sangjin Lee

Applied Mathematics Research Report 07-08
November 2007

DEPARTMENT OF MATHEMATICAL SCIENCES
Estimate-based Goodness-of-Fit Test for Large Sparse Multinomial Distributions

Sung-Ho Kim a,*, Hyemi Choi b, Sangjin Lee a

a Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
b Department of Statistical Informatics, Chonbuk National University, Chonju, South Korea

Abstract

Pearson's chi-squared statistic (X²) does not in general follow a chi-square distribution when it is used for goodness-of-fit testing of a multinomial distribution on the basis of sparse contingency table data. We explore properties of Zelterman's (1987) D² statistic and compare them with those of X², and we also compare these two statistics with the statistic L_r proposed by Maydeu-Olivares and Joe (2005), in the context of the power of goodness-of-fit testing when the given contingency table is very sparse. We show that the variance of D² is not larger than the variance of X² under null hypotheses where all the cell probabilities are positive, that the distribution of D² becomes more skewed as the multinomial distribution becomes more asymmetric and sparse, and that, as for the L_r statistic, the power of the goodness-of-fit test depends on the models selected for the testing. A simulation experiment strongly recommends using both D² and L_r for goodness-of-fit testing with large sparse contingency table data.

Key words: Test power, P-value, Sample-cell ratio, Model discrepancy, Normal approximation, Skewness

1. INTRODUCTION

A variety of statistics have been derived for goodness-of-fit testing, such as the Pearson chi-squared statistic X² (Pearson, 1900), the log likelihood ratio statistic G², the Freeman-Tukey statistic, the Neyman modified X², and the modified G².

* Corresponding author.
Email addresses: shkim@amath.kaist.ac.kr (Sung-Ho Kim), hchoi@chonbuk.ac.kr (Hyemi Choi), Lee.Sang.jin@kaist.ac.kr (Sangjin Lee).

Preprint submitted to Elsevier, November 2007
Cressie and Read (1984) proposed a family of power divergence statistics in which all of the above statistics are embedded. All of these statistics are asymptotically chi-squared distributed with appropriate degrees of freedom (e.g., Fienberg, 1979; Horn, 1977; Lancaster, 1969; Moore, 1976; Watson, 1959). The two most popular statistics among them are X² and G². Cochran (1952) gives a nice review of the early development of X² and concludes that there is little to distinguish it from G². The chi-squared approximation to X² and G² was studied by Read (1984), and the accuracy of these approximations was examined by Koehler and Larntz (1980) and Koehler (1986). Koehler (1986) demonstrated asymptotic normality of G² for testing the fit of log-linear models with closed-form maximum likelihood estimates in sparse contingency tables, and showed that the normal approximation can be much more accurate than the chi-squared approximation for G²; he admitted, however, that the bias of the estimated moments is a potential problem for very sparse tables. A good review of goodness-of-fit tests for sparse contingency tables is provided in Koehler (1986).

Zelterman (1987) proposed a statistic D² and compared it with X² for testing goodness of fit (GOF) to sparse multinomial distributions, where

D² = X² − Σ′ n_i/ê_i,

n_i is the count in the ith cell, ê_i is the estimate of the cell mean of the ith cell, and Σ′ denotes summation over all the cells such that ê_i > 0. The D² statistic is not a member of the family of Cressie and Read's power divergence statistics.

The goal of this paper is to explore properties of the goodness-of-fit test using D² (the D² test, for short), in comparison with the goodness-of-fit test using X² (the X² test), beyond the asymptotic properties derived in Zelterman (1987); to carry out a more rigorous comparative study of the goodness-of-fit tests based on D², X², and L_r in the context of power; and to propose an approach to goodness-of-fit testing for a very sparse multinomial distribution.
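To make the definitions concrete, here is a minimal sketch (our own illustrative Python, not code from the paper) that computes X² and D² from a vector of counts and estimated cell means, restricting both sums to cells with ê_i > 0, and then uses Monte Carlo draws under an assumed asymmetric null to check the variance comparison claimed above:

```python
import numpy as np

rng = np.random.default_rng(0)

def x2_d2(counts, e):
    """Pearson X^2 and Zelterman's D^2 = X^2 - sum(n_i/e_i), summing
    only over cells whose estimated mean e_i is positive."""
    counts = np.asarray(counts, dtype=float)
    e = np.asarray(e, dtype=float)
    pos = e > 0
    x2 = float(np.sum((counts[pos] - e[pos]) ** 2 / e[pos]))
    return x2, x2 - float(np.sum(counts[pos] / e[pos]))

# toy sparse table: k = 8 cells, n = 4 observations, uniform means n/k
x2, d2 = x2_d2([1, 0, 2, 0, 0, 1, 0, 0], [0.5] * 8)

# quick Monte Carlo check of the variance comparison claimed above:
# under an asymmetric null, Var(D^2) should not exceed Var(X^2)
p0 = np.array([0.5, 0.2, 0.1, 0.1, 0.05, 0.05])   # assumed null, k = 6
n = 8
m0 = n * p0
samples = rng.multinomial(n, p0, size=20000)
x2_all = ((samples - m0) ** 2 / m0).sum(axis=1)
d2_all = x2_all - (samples / m0).sum(axis=1)
vx, vd = x2_all.var(), d2_all.var()
```

On the toy table, n < k and half the cells are empty, yet both statistics are finite because every estimated mean is positive; in the Monte Carlo check the sample variance of D² comes out below that of X², consistent with the variance comparison of Section 3.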
This paper consists of six sections. Section 2 describes what happens to X² when the data size n is smaller than the number of cells k of a given contingency table, and summarizes earlier work on the asymptotic properties of X², with particular attention to the case n < k. Section 3 compares D² and X² in the context of variance; it is shown there that the variance of D² is not larger than that of X² under any null hypothesis of multinomial distribution. We carry out a Monte Carlo study in Section 4 to compare the goodness-of-fit tests based on D², X², and L_r in the context of test power. The Monte Carlo result strongly suggests that the D² test is at least as good as the X² test when n < k and is comparable with the L_r test. We then investigate the normal approximation to the standardized D² in Section 5. Section 6 concludes the paper with some further remarks on goodness-of-fit testing for sparse contingency tables.
2. A BRIEF REVIEW OF GOODNESS-OF-FIT TESTS USING X² AND SOME RELATED STATISTICS

Suppose that we have a contingency table with k cells and n observations, and consider a goodness-of-fit test of a symmetric null hypothesis, i.e., one under which all the cell probabilities are the same. We will, first of all, consider the two extreme cases n = 1 and n = 2. Under the symmetric null hypothesis, X² = k − 1 when n = 1; and when n = 2, X² equals either 2k − 2 or k − 2 according to whether some cell frequency equals 2 or not. It is interesting to note that when n < k, X² is strictly positive under any hypothesis for the contingency table. This simply shows that the asymptotic distribution of X² cannot be a chi-square distribution when n < k, and it is one of the reasons that a normal approximation was sought in the literature for the case n < k.

While the literature on the asymptotic properties of X² abounds for contingency tables with large enough data (e.g., Agresti (1990); Bishop et al. (1975)), a unifying result is yet to be found for goodness-of-fit testing with sparse contingency tables. Results from the literature concerning Pearson's statistic for sparse multinomial problems are summarized below.

(i) The X² test has some optimal local power properties in the symmetric case when the number of cells is moderately large. Hence the X² test based on the traditional chi-squared approximation is preferred for the test of symmetry. For the null hypothesis of symmetry, the chi-squared approximation for X² is quite adequate at the usual nominal levels for quite small expected frequencies, provided k ≥ 3, n ≥ 10, and n²/k ≥ 10 (Koehler and Larntz, 1980).
(ii) The chi-squared approximation for X² produces inflated rejection levels for asymmetric null hypotheses that contain many expected frequencies smaller than 1 (Koehler and Larntz, 1980).
(iii) The adverse effects of small expected frequencies on the chi-squared approximation are generally less severe for X² than for other commonly used goodness-of-fit statistics (Koehler, 1986).
(iv) The asymptotic power of the D² test is moderate and at least as great as that of the normalized X² (Zelterman, 1987).
(v) For the symmetric null hypothesis, choosing the value of the family parameter λ of Cressie and Read's (1984) power divergence family of statistics in [1/3, 3/2] results in a test statistic whose size can be approximated accurately by a chi-squared distribution, provided that k ≥ 3 and n ≥ 10. For λ outside [1/3, 3/2], the corrected chi-square approximation F_C is recommended, where F_C is the chi-square distribution corrected so that its mean and variance agree to second order with those of the power divergence statistic; it gives reasonably accurate approximate levels and is easy to compute. When both n and k are large (say, n > 100), the
normal approximation should be used (Read, 1984).
(vi) Berry and Mielke (1988) proposed non-asymptotic chi-square tests, based on X² and D², that use a Pearson Type III distribution approximation for testing large sparse contingency tables.
(vii) Maydeu-Olivares and Joe (2005) proposed a statistic L_r for goodness-of-fit testing with 2^J contingency table data, given in a quadratic form based on multivariate moments up to order r, and showed that X² belongs to the L_r family. They showed that when r is small (e.g., 2 or 3), L_r has better small-sample properties and is asymptotically more powerful than X² for some multivariate binary models.

3. COMPARISON OF VARIANCES

The favorable comparison (Zelterman, 1987) of D² over the normalized version of X² in the context of asymptotic power in goodness-of-fit tests may extend to situations of small or moderate data sizes. To see this, we take a closer look at the two statistics and compare their exact means and variances across multinomial distributions, which gives insight into the power comparison.

Suppose that the cell probability of the ith cell, p_i, equals p_0i under the null hypothesis (H_0), and let it equal p_ai under an alternative hypothesis (H_a). We denote the cell means in the same manner, i.e., by m_0i under H_0 and by m_ai under H_a. Throughout the rest of the paper we assume that min_i p_0i > 0. We can then re-express X² and D² as

X² = Σ_{i=1}^k (n_i − m_0i)² / m_0i,    D² = X² − Σ_{i=1}^k n_i / m_0i.

We list a couple of simple facts (with q_i = 1 − p_i):

E(X²) = Σ_{i=1}^k [n p_i q_i + (n p_i − m_0i)²] / m_0i,    (1)
E(D²) = E(X²) − Σ_{i=1}^k p_i / p_0i.    (2)

The difference between the variance of D² and that of X² is given in
Theorem 1. Under a multinomial assumption with sample size n and a contingency table of k cells, we have

V(D²) = V(X²) + V(Σ_{i=1}^k n_i/m_0i) − 2 Cov(Σ_{i=1}^k n_i/m_0i, X²),

with

V(Σ_{i=1}^k n_i/m_0i) = Σ_{i=1}^k n p_i q_i/m_0i² − Σ_{i≠j} n p_i p_j/(m_0i m_0j).

Proof: See Appendix A.

A couple of corollaries follow from this theorem.

Corollary 1. Under H_0, the variance of D² simplifies to

V(D²) = V(X²) + (k² − Σ_{i=1}^k 1/p_0i)/n ≤ V(X²),

where the equality holds if and only if H_0 is symmetric.

Corollary 2. Suppose that p_ah = 1 for some h, 1 ≤ h ≤ k, under an H_a. Then under that H_a we have

V(D²) = V(X²).    (3)

Let δ_1 = E(D²) − E(X²) and δ_2 = V(D²) − V(X²). Corollary 1 implies that δ_2 < 0 in a near neighborhood of (p_01, ..., p_0k) when H_0 is not symmetric. And judging from Corollary 2, δ_2 is bounded on the boundary of the (k − 1)-dimensional simplex of (p_1, ..., p_k), since δ_2 is continuous in (p_1, ..., p_k).

4. SIMULATION EXPERIMENTS

To compare the X² and D² tests, we conducted simulation experiments with small samples, using a model of 10 binary variables whose structure is represented by a directed acyclic graph (DAG). We will call this model Model 0 and denote it by M_0. Ten alternative models, determined by adding or deleting edges from Model 0, are considered for the goodness-of-fit test. They are displayed
in Figure 1 along with the true model M_0, where each vertex represents a binary variable. A brief explanation of DAG models is given in Appendix B. For each model, sample sizes n = 50, 200, 500 were selected, so that n/k is close to 1/20, 1/5, and 1/2. For each of the ten alternative models, say M, we obtain an approximate p-value of the goodness-of-fit test as follows:

(i) Generate a sample {(n_1^(0), ..., n_k^(0)) : Σ_{i=1}^k n_i^(0) = n} of size n from M_0.
(ii) Estimate the cell probabilities for model M based on the sample values n_1^(0), ..., n_k^(0).
(iii) Generate a sample of size n from model M based on the estimates of the cell probabilities obtained in (ii), and compute X² and D² using these sample values and the estimates from (ii).
(iv) Repeat (iii) until t values of X² and of D² are obtained, and arrange the t X² values in an increasing sequence x_(1), ..., x_(t) and similarly the D² values into the sequence d_(1), ..., d_(t).
(v) Compute X² and D² using the sample values n_1^(0), ..., n_k^(0) and the estimates obtained in (ii), and then obtain the values p_x(X²) and p_d(D²) defined below in expression (4).

Fig. 1. Graphs of the DAG models: Model 0 (M_0) and Models 1 (M_1) through 10 (M_10). Each vertex represents a binary variable; models M_i, i = 1, ..., 10, are constructed by removing (dashed line) or adding (dot-dashed line) arrows to M_0.
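The resampling scheme in steps (i) through (v) can be sketched as follows; this is an illustrative Python rendering in which fitting the DAG model M is replaced by a simple additive-smoothing estimate of the cell probabilities (a stand-in for the model fitting, not the estimator used in the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

def x2_d2(counts, e):
    """X^2 and D^2 = X^2 - sum(n_i/e_i) over cells with e_i > 0."""
    counts = np.asarray(counts, dtype=float)
    e = np.asarray(e, dtype=float)
    pos = e > 0
    x2 = float(np.sum((counts[pos] - e[pos]) ** 2 / e[pos]))
    return x2, x2 - float(np.sum(counts[pos] / e[pos]))

def reference_distribution(p_hat, n, t):
    """Steps (iii)-(iv): draw t samples from the fitted model and
    return sorted X^2 and D^2 reference values."""
    e = n * p_hat
    xs, ds = [], []
    for _ in range(t):
        counts = rng.multinomial(n, p_hat)
        x2, d2 = x2_d2(counts, e)
        xs.append(x2)
        ds.append(d2)
    return np.sort(xs), np.sort(ds)

# steps (i)-(ii), with model fitting replaced by a smoothed
# cell-probability estimate (a stand-in for fitting a DAG model M)
k, n, t = 16, 8, 200
p_true = np.full(k, 1 / k)
data = rng.multinomial(n, p_true)                 # step (i)
p_hat = (data + 0.5) / (n + 0.5 * k)              # step (ii), add-0.5 smoothing
x_ref, d_ref = reference_distribution(p_hat, n, t)
x2_obs, d2_obs = x2_d2(data, n * p_hat)           # step (v) statistics
```

The sorted reference sequences x_ref and d_ref play the roles of x_(1), ..., x_(t) and d_(1), ..., d_(t) in step (iv).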
We define a pseudo p-value of X², p_x(X²), as

p_x(X²) = 0.5/(t + 1)             if X² ≥ x_(t),
        = (t − i + 0.5)/(t + 1)   if x_(i) ≤ X² < x_(i+1), 1 ≤ i < t,    (4)
        = (t + 0.5)/(t + 1)       if X² < x_(1).

A pseudo p-value of D², p_d(D²), is defined similarly, with X² and x replaced by D² and d, respectively. As is apparent in (4), the pseudo p-value gets close to the actual p-value as t increases, where the actual p-value is that of the goodness-of-fit test of model M. If X² ≥ x_(t), we have 0 ≤ |p_x − (the actual p-value)| < 1/(t + 1), and |p_x − (the actual p-value)| < 0.5/(t + 1) otherwise. The same inequalities hold for p_d.

To compare the power of the goodness-of-fit tests based on D² and X², we repeated the above procedure, (i) through (v), 500 times, obtaining as many values of each of p_x and p_d, with t = 500. Table 1 lists the proportions of the p_x (p_d) values that are less than or equal to α ∈ {0.05, 0.1} for the three sample sizes n = 50, 200, 500. In the table we can see that the D² test is more powerful than the X² test. Model M_0 has 10 binary variables, which means a contingency table of 2^10 = 1,024 cells, so the sample-cell ratios n/k considered in the table are 50/1024 ≈ 0.05, 200/1024 ≈ 0.2, and 500/1024 ≈ 0.5. The power increases with the sample size for both the D² and the X² tests. However, it is interesting that the power hardly changes for some models under the X² test: for two of the models the proportions for p_x stay more or less constant as n grows, while those for p_d increase markedly, and the same phenomenon appears at both values of α. The figures in Appendix C show that p_x is less sensitive to the sample size than p_d.

We denote by π_x(α) the proportion of the 500 p_x values that are at most α, and similarly π_d(α). From the π values of the models with deleted edges, we can see that the π values do not change monotonically in the number of edges deleted from model M_0. The table indicates that the π value is sensitive to how the edge deletion affects the inter-relationships among the variables in the model.
For example, in one of those models four of the variables are independent of all the other variables in the model, while no variable is isolated from the rest of the variables in the other two models. In Table 1 we can also see that the X² test is less sensitive than the D² test to differences in model structure. For example, one of the models is smaller than another by three
edges, and this difference in model structure is reflected in the π_d(α) values, α = 0.05, 0.1, while the π_x(α) values remain more or less the same.

One may expect that any addition of edges between the set of isolated variables and the set of the remaining variables might decrease the π value. But this is not the case, as the π values of the two models concerned indicate: we see almost no difference in the π values between them. It is interesting to note that connecting the two sets of variables by adding inappropriate edges, ones not found in the true model M_0, does not affect the π value. The same phenomenon occurs for another pair of models.

The simulation result strongly recommends the D² test over the X² test, in the sense that the D² test is more sensitive to the sample size and to the difference in model structure between the true model and a selected model.

Table 1. The proportions of the p_x and p_d values that are less than or equal to α ∈ {0.05, 0.1}, out of 500 replications, rounded to two decimals. For each α, the columns give the proportions for D² and for X² at n = 50, 200, 500, for each model (with the numbers of added and deleted edges indicated).
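The pseudo p-value defined above can be computed directly from the sorted reference values of steps (iii) and (iv). The sketch below (our own function and variable names) assumes the three-case, midpoint-corrected form:

```python
import numpy as np

def pseudo_p(stat, ref):
    """Pseudo p-value of an observed statistic against t sorted
    reference values: 0.5/(t+1) above the largest reference value,
    (t+0.5)/(t+1) below the smallest, and (t-i+0.5)/(t+1) when
    exactly i reference values are <= stat."""
    ref = np.sort(np.asarray(ref, dtype=float))
    t = len(ref)
    if stat >= ref[-1]:
        return 0.5 / (t + 1)
    if stat < ref[0]:
        return (t + 0.5) / (t + 1)
    i = int(np.searchsorted(ref, stat, side="right"))  # ref[i-1] <= stat < ref[i]
    return (t - i + 0.5) / (t + 1)
```

With t reference values, the returned pseudo p-value always lies strictly between 0 and 1, and it differs from the actual p-value by less than 1/(t + 1).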
However, when the sample size is too small (e.g., n = 50) relative to the number of cells (e.g., k = 1,024), the D² test may not be very useful unless the models of interest differ from the true model M_0 as much as models like M_10 do. In such a small-sample situation it is desirable to use both the D² and the L_r tests. A comparison between these tests follows.

Maydeu-Olivares and Joe (2005) suggest using L_r with r = 2, 3 for large and sparse 2^J tables. From an additional simulation experiment, where we used D² and L_r for testing the goodness of fit of the set of models in Figure 1, we could see that the D² test is more powerful than the L_r test for some of the models in the figure and vice versa for the others. When the sample size was very small, the power of the L_r test became unstable, while this was not the case for the D² test. For instance, at the smallest sample size the proportions of p-values of the goodness-of-fit test using L_2 that fell below the nominal level did not order according to the structural discrepancies of three of the models from M_0 (see the first three columns in Table 1), whereas the proportions from the D² test reflected those structural discrepancies. The proportions from the L_2 test were zero or close to zero for almost all combinations of model and sample size, with only a couple of exceptions at the largest sample size.

5. THE DISTRIBUTION OF THE STANDARDIZED D²

Although the D² test is preferred to the X² test when the sample-cell ratio n/k is less than 1, due to its sensitivity to the sample size and to the discrepancy between model structures, it is important to investigate how its distribution depends on the sample-cell ratio and on the level of asymmetry (or non-uniformity) of the cell probabilities p_1, ..., p_k, which is defined by

λ = Σ_{i=1}^k (p_i − 1/k)².
The first three moments of X² and D² under H_0 are obtained in Horn (1977) and in Mielke and Berry (1988), respectively. The mean and the variance of D² under H_0 are given by

E_0(D²) = −1   and   V_0(D²) = 2(k − 1)(1 − n⁻¹),
which do not involve the cell probabilities of the distribution under H_0, while

V_0(X²) = 2(k − 1)(1 − n⁻¹) + (Σ_{i=1}^k 1/p_0i − k²)/n

does (see the equation in Corollary 1). But the skewnesses of X² and D² under H_0 do depend on the distribution under H_0 (e.g., Mielke and Berry, 1988). According to Mielke and Berry (1988), the skewness of D² under H_0 is of the form

γ = A + B,    (5)

where A depends only on n and k and tends to √(8/(k − 1)), the skewness of the chi-square distribution with k − 1 degrees of freedom, as n → ∞ with k fixed, while B carries the distribution-dependent factor Σ_{i=1}^k 1/p_0i with a coefficient that vanishes as n → ∞. As for B, we will consider asymmetric distributions of the p_i's given, for 0 < ɛ < 1 and g < k, by

p_1 = ... = p_g = ɛ/k   and   p_(g+1) = ... = p_k = (k − ɛg)/(k(k − g)).    (6)

Then we have

Σ_{i=1}^k 1/p_i = gk/ɛ + k(k − g)²/(k − ɛg).    (7)

For the cell probabilities in (6), the level of asymmetry is given by

λ = g(1 − ɛ)²/(k(k − g)).    (8)

From (5), (7), and (8) we can see that as g (g < k) increases and ɛ (0 < ɛ < 1) becomes smaller, both γ and λ increase; as a matter of fact, γ increases as λ increases. In particular, when s (s > 1) is chosen so that g = k/s is an integer and k and n satisfy k = cn for some real c > 0, (7) reduces to

Σ_{i=1}^k 1/p_i = k² (1 + ɛ(s − 2))/(ɛ(s − ɛ)),    (9)

so that B, and hence γ, increases with c = k/n and with the factor (1 + ɛ(s − 2))/(ɛ(s − ɛ)). Note that when ɛ = 1 we have p_1 = ... = p_k = 1/k. Under the distribution as
given by (6), it therefore follows from (5) and (9) that γ increases with c and with (1 + ɛ(s − 2))/(ɛ(s − ɛ)).    (10)

The last expression implies that as we have more small p_i's (that is, a smaller s (s > 1) together with a small ɛ value), the skewness γ of D² gets larger. In other words, as there are more small p_i's, the normal approximation to D² gets worse in the tail areas. This phenomenon is observed in Tables 2 and 3. It is also noteworthy in expression (10) that as the contingency table becomes sparser (i.e., as c = k/n gets larger), the distribution of D² gets more skewed.

Table 2 was obtained under symmetric multinomial distributions for the values of k and the sample sizes n given in the table. Under each symmetric distribution, 10,000 D² values were generated and percentiles were obtained from them; after repeating this a few hundred times, the means and the variances of the percentiles were computed. Table 3 comes from the same Monte Carlo experiment except that a variety of multinomial distributions were used, each defined by its vector of cell probabilities (p_1, ..., p_k). Here each p_i was obtained as the ratio u_i/Σ_{j=1}^k u_j, where the random quantities u_j were drawn uniformly between 1 and 10 to avoid too small cell probabilities.

Table 2. Three upper tail percentiles of the standardized D² under symmetric multinomials. d_α is the upper 100α percentile of the standardized D² distribution; for each pair of k and n, the d_α values were generated repeatedly and their mean, standard deviation (sd), and skewness measure were computed. The row n/k = ∞ refers to the standard normal distribution, and γ is the skewness value given by (5).
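The construction behind Tables 2 and 3 can be sketched by combining the cell-probability family (6), the asymmetry level λ, and a standardization of D² by the null moments quoted above (E_0(D²) = −1 and V_0(D²) = 2(k − 1)(1 − n⁻¹)); the parameter values below are illustrative, not those used in the tables:

```python
import numpy as np

rng = np.random.default_rng(1)

def asymmetric_cells(k, g, eps):
    """Family (6): g 'small' cells with p_i = eps/k; the remaining
    k-g cells share the rest of the probability equally."""
    p = np.empty(k)
    p[:g] = eps / k
    p[g:] = (k - eps * g) / (k * (k - g))
    return p

def asymmetry_level(p):
    """lambda = sum_i (p_i - 1/k)^2; zero iff the distribution is uniform."""
    k = len(p)
    return float(np.sum((p - 1.0 / k) ** 2))

def upper_percentile_std_d2(p, n, alpha=0.05, reps=10000):
    """Upper-100*alpha percentile of D^2 standardized with the null
    moments E0(D^2) = -1 and V0(D^2) = 2(k-1)(1-1/n)."""
    p = np.asarray(p, dtype=float)
    k = len(p)
    m = n * p
    counts = rng.multinomial(n, p, size=reps)
    x2 = ((counts - m) ** 2 / m).sum(axis=1)
    d2 = x2 - (counts / m).sum(axis=1)
    z = (d2 + 1.0) / np.sqrt(2.0 * (k - 1) * (1.0 - 1.0 / n))
    return float(np.quantile(z, 1.0 - alpha))

k, g, n = 16, 8, 8                      # sparse: n/k = 1/2
p_sym = asymmetric_cells(k, g, 1.0)     # eps = 1 gives the uniform case
p_asym = asymmetric_cells(k, g, 0.2)    # many small cells
lam = asymmetry_level(p_asym)           # closed form: g(1-eps)^2/(k(k-g))
d_sym = upper_percentile_std_d2(p_sym, n)
d_asym = upper_percentile_std_d2(p_asym, n)
```

The computed lam matches the closed form g(1 − ɛ)²/(k(k − g)) from (8), and the simulated upper percentiles illustrate how the extreme discreteness of D² in sparse tables distorts the normal approximation in the tails.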
Table 3. Three upper tail percentiles of the standardized D² under asymmetric multinomials. d_α is the upper 100α percentile of the standardized D² distribution; for each pair of k and n, the d_α values were generated repeatedly and their mean, standard deviation (sd), and skewness measure were computed. The row n/k = ∞ refers to the standard normal distribution.

We denote by d_α the upper 100α-percentile of the standardized D² distribution and by z_α the corresponding percentile of the standard normal curve. The d_α values are listed in Tables 2 and 3. We can see in Table 2 that the d_α values lie, as expected, farther from z_α as the skewness index γ increases, unless the sample size is too small relative to the table size k. For the smallest table in Table 2, the d_α values change abruptly as n comes down to its smallest value, and a similar phenomenon takes place for the larger tables at their smallest sample sizes. In particular, at the sparsest (k, n) combination all three d_α values are equal. This is due to an extreme sparseness of the contingency table: in the Monte Carlo experiment, all the distributions generated under this sparseness yielded the same set of upper percentile points, with one common value for the lower percentile points and another common value for the upper ones. The lower value corresponds to the case where every cell frequency is at most 1, and the upper value to the case where exactly one cell frequency equals 2 and the others are at most 1. We will call such a phenomenon, arising from extreme sparseness, a hyper-sparseness phenomenon.

The skewness index γ is not given in Table 3, since different sets of p_i's may produce different index values for fixed values of k and n. However, we can see in the table that the d_α values are larger on average for asymmetric multinomials than for symmetric multinomials. This phenomenon is a reflection of expression (10). We also see another hyper-sparseness phenomenon in Table 3, in its sparsest (k, n) row.

6. CONCLUDING REMARKS

Large scale modelling is not unusual nowadays in many research fields, such as the bio-sciences, cognitive science, and the management sciences. In the simulation experiments we considered Bayesian network models, which are a useful tool for representing causal relationships among variables. When the variables are not causally related, log-linear modelling is an appropriate method for contingency table data. Whether it is a Bayesian network model or a log-linear model, we can express the model in a factorized form. Suppose that a model for a set of categorical random variables X_v, v ∈ V, is factorized as

P(x) ∝ ∏_{A ∈ 𝒱} P_A(x_A),    (11)

where 𝒱 is a set of subsets of V. We will denote the marginal table on the subset of variables X_v, v ∈ A, for A ∈ 𝒱, by τ_A, and call it the configuration on A. Then we can say that the configurations τ_A, A ∈ 𝒱, are sufficient for the parameters of the model expressed in (11). This means that a large sparse contingency table need not cause trouble in parameter estimation as long as none of the configurations is sparse. In this respect the statistic L_r, which is a function of multivariate moments of the variables X_v, v ∈ V, is a reasonable one for goodness-of-fit testing when the configurations in (11) satisfy |A| ≤ r. This is one of the reasons why the performance of the L_r test depends on the models selected for goodness-of-fit testing with a given contingency table.

The key idea behind the D² method for goodness-of-fit testing is the same as that behind testing with the Pearson chi-square statistic X² and with the likelihood ratio statistic G².
The latter two statistics represent a measure of distance between the data and the model selected by a model builder, where the measure is standardized by a chi-squared distribution. In the proposed D² method, we construct a reference distribution, instead of using the chi-squared distribution, to create a distance measure similar to the P-value of a test, such as p_x and p_d in (4). The simulation experiment strongly recommends using the D² test (i.e., the p_d values) rather than the X² test for goodness-of-fit testing with a large sparse contingency table, owing to its high sensitivity to the sample size and to model discrepancy.
This observation leads us to recommend using both the D² and the L_r tests when the contingency table is large and sparse. Not knowing the true model for a given contingency table, it is desirable to use both of the tests, since (i) the L_r test seems to be more powerful than the D² test when |A| ≤ r and the contingency table is of 2^J type and not too sparse, (ii) the D² test seems to be more powerful when |A| > r, and (iii) as the contingency table becomes sparser, the L_r test becomes unstable while the D² test remains stable.

APPENDIX A: PROOF OF THEOREM 1

The proof is a straightforward application of algebra. Write T = Σ_{i=1}^k n_i/m_0i, so that D² = X² − T and

V(D²) = V(X² − T) = V(X²) + V(T) − 2 Cov(T, X²).    (A.1)

From the multinomial moments V(n_i) = n p_i q_i and Cov(n_i, n_j) = −n p_i p_j for i ≠ j,

V(T) = Σ_{i=1}^k n p_i q_i/m_0i² − Σ_{i≠j} n p_i p_j/(m_0i m_0j).    (A.2)

As for the last term in (A.1), we have

Cov(T, X²) = E(T X²) − E(T) E(X²),    (A.3)

where E(T) = Σ_{i=1}^k p_i/p_0i, E(X²) is given in (1), and E(T X²) is obtained from the third-order moments E(n_i³), E(n_i² n_j), and E(n_i n_j n_l) of the multinomial distribution. Substituting (A.2) and (A.3) into (A.1) yields the desired result of the theorem.
APPENDIX B: DAG MODELS OF CATEGORICAL VARIABLES

A DAG model of a set of variables X_v, v ∈ V, is a probability model whose structure can be represented by a directed acyclic graph, G say, which consists of vertices and directed edges (arrows) between vertices. For a subset A of V, write X_A = (X_v)_{v∈A}, and for two subsets A and B of V we denote the conditional probability P(X_B = x_B | X_A = x_A) by P_{B|A}(x_B | x_A). If there is an arrow from vertex u to vertex v, we call u a parent vertex of v. Since a DAG is acyclic, every vertex in a DAG has a unique set of parent vertices; we denote by pa(v) the set of the parent vertices of v. Then we can represent the joint probability of (X_v)_{v∈V} as the product of conditional probabilities, as in

P(X_V = x_V) = ∏_{v∈V} P_{v|pa(v)}(x_v | x_pa(v)),

where pa(v) = ∅ if vertex v has no parent vertex.
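The factorization at the end of Appendix B can be illustrated with a small worked example; the three-variable DAG and its conditional probability tables below are hypothetical, not the paper's ten-variable M_0:

```python
from itertools import product

# A toy DAG model over binary variables: each variable has a list of
# parents and a conditional probability table keyed by parent values.
parents = {"A": [], "B": ["A"], "C": ["A", "B"]}
cpt = {
    "A": {(): 0.6},                                   # P(A=1)
    "B": {(0,): 0.3, (1,): 0.8},                      # P(B=1 | A)
    "C": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}

def joint(assign):
    """P(X_V = x_V) = prod over v of P(x_v | x_pa(v)), for a dict
    mapping each variable name to a 0/1 value."""
    p = 1.0
    for v, pa in parents.items():
        key = tuple(assign[u] for u in pa)
        p1 = cpt[v][key]                 # probability that v equals 1
        p *= p1 if assign[v] == 1 else 1.0 - p1
    return p

# the joint probabilities over all 2^3 cells sum to one
total = sum(joint(dict(zip("ABC", bits))) for bits in product((0, 1), repeat=3))
```

Summing joint over all 2³ assignments returns 1, confirming that the product of the conditional tables defines a proper joint distribution over the cells of the contingency table.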
APPENDIX C: COMPARISON OF π_x AND π_d VALUES

Each panel plots π(α) against α for one model (Models 0 through 10) and one sample size (n = 50, 200, 500). Solid curves with circles (○) are for π_d and dashed curves with bullets (•) for π_x; the dotted vertical line marks the nominal level.
References

Agresti, A., 1990. Categorical Data Analysis. John Wiley & Sons, New York.
Berry, K.J., Mielke, P.W., Jr., 1988. Monte Carlo comparisons of the asymptotic chi-square and likelihood-ratio tests with the nonasymptotic chi-square test for sparse r × c tables. Psychological Bulletin 103.
Bishop, Y.M.M., Fienberg, S.E., Holland, P.W., 1975. Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge.
Cochran, W.G., 1952. The χ² test of goodness of fit. Ann. Math. Statist. 23.
Cressie, N., Read, T.R.C., 1984. Multinomial goodness-of-fit tests. J. Roy. Statist. Soc. Ser. B 46.
Fienberg, S.E., 1979. The use of chi-squared statistics for categorical data problems. J. Roy. Statist. Soc. Ser. B 41.
Horn, S.D., 1977. Goodness-of-fit tests for discrete data: a review and an application to a health impairment scale. Biometrics 33.
Johnson, N.L., Kotz, S., 1969. Distributions in Statistics: Discrete Distributions. Houghton Mifflin Company, New York.
Koehler, K.J., 1986. Goodness-of-fit tests for log-linear models in sparse contingency tables. J. Amer. Statist. Assoc. 81.
Koehler, K.J., Larntz, K., 1980. An empirical investigation of goodness-of-fit statistics for sparse multinomials. J. Amer. Statist. Assoc. 75.
Lancaster, H.O., 1969. The Chi-squared Distribution. Wiley, New York.
Maydeu-Olivares, A., Joe, H., 2005. Limited- and full-information estimation and goodness-of-fit testing in 2^n contingency tables: a unified framework. J. Amer. Statist. Assoc. 100.
Mielke, P.W., Jr., 1997. Some exact and nonasymptotic analyses of discrete goodness-of-fit and r-way contingency tables. In: Johnson, N.L., Balakrishnan, N. (Eds.), Advances in the Theory and Practice of Statistics: A Volume in Honor of Samuel Kotz. Wiley, New York.
Mielke, P.W., Jr., Berry, K.J., 1988. Non-asymptotic inferences based on the chi-square statistic for r by c contingency tables. J. Statist. Plann. Inference.
Mislevy, R.J., 1994. Evidence and inference in educational assessment. Psychometrika 59.
Moore, D.S., 1976. Recent developments in chi-square tests for goodness of fit. Mimeograph, Department of Statistics, Purdue University, Lafayette, IN.
Pearson, K., 1900. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, Series 5, 50.
Read, T.R.C., 1984. Small-sample comparisons for the power divergence goodness-of-fit statistics. J. Amer. Statist. Assoc. 79.
Watson, G.S., 1959. Some recent results in chi-square goodness-of-fit tests. Biometrics 15.
Wermuth, N., Lauritzen, S.L., 1983. Graphical and recursive models for contingency tables. Biometrika 70.
Zelterman, D., 1987. Goodness-of-fit tests for large sparse multinomial distributions. J. Amer. Statist. Assoc. 82.
More informationANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS
Libraries 1997-9th Annual Conference Proceedings ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Eleanor F. Allan Follow this and additional works at: http://newprairiepress.org/agstatconference
More informationChapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables.
Chapter 10 Multinomial Experiments and Contingency Tables 1 Chapter 10 Multinomial Experiments and Contingency Tables 10-1 1 Overview 10-2 2 Multinomial Experiments: of-fitfit 10-3 3 Contingency Tables:
More informationChapte The McGraw-Hill Companies, Inc. All rights reserved.
er15 Chapte Chi-Square Tests d Chi-Square Tests for -Fit Uniform Goodness- Poisson Goodness- Goodness- ECDF Tests (Optional) Contingency Tables A contingency table is a cross-tabulation of n paired observations
More informationMultiple Sample Categorical Data
Multiple Sample Categorical Data paired and unpaired data, goodness-of-fit testing, testing for independence University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html
More informationOn Selecting Tests for Equality of Two Normal Mean Vectors
MULTIVARIATE BEHAVIORAL RESEARCH, 41(4), 533 548 Copyright 006, Lawrence Erlbaum Associates, Inc. On Selecting Tests for Equality of Two Normal Mean Vectors K. Krishnamoorthy and Yanping Xia Department
More informationInferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data
Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone
More informationAn Approximate Test for Homogeneity of Correlated Correlation Coefficients
Quality & Quantity 37: 99 110, 2003. 2003 Kluwer Academic Publishers. Printed in the Netherlands. 99 Research Note An Approximate Test for Homogeneity of Correlated Correlation Coefficients TRIVELLORE
More informationLeast Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions
Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error
More informationDecomposition of Parsimonious Independence Model Using Pearson, Kendall and Spearman s Correlations for Two-Way Contingency Tables
International Journal of Statistics and Probability; Vol. 7 No. 3; May 208 ISSN 927-7032 E-ISSN 927-7040 Published by Canadian Center of Science and Education Decomposition of Parsimonious Independence
More informationAsymptotic Statistics-VI. Changliang Zou
Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous
More informationGeometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling
Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling Paul Marriott 1, Radka Sabolova 2, Germain Van Bever 2, and Frank Critchley 2 1 University of Waterloo, Waterloo, Ontario,
More informationTESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST
Econometrics Working Paper EWP0402 ISSN 1485-6441 Department of Economics TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Lauren Bin Dong & David E. A. Giles Department
More informationCOMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS
Communications in Statistics - Simulation and Computation 33 (2004) 431-446 COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS K. Krishnamoorthy and Yong Lu Department
More informationA Monte Carlo Simulation of the Robust Rank- Order Test Under Various Population Symmetry Conditions
Journal of Modern Applied Statistical Methods Volume 12 Issue 1 Article 7 5-1-2013 A Monte Carlo Simulation of the Robust Rank- Order Test Under Various Population Symmetry Conditions William T. Mickelson
More informationSalt Lake Community College MATH 1040 Final Exam Fall Semester 2011 Form E
Salt Lake Community College MATH 1040 Final Exam Fall Semester 011 Form E Name Instructor Time Limit: 10 minutes Any hand-held calculator may be used. Computers, cell phones, or other communication devices
More informationCausal Models with Hidden Variables
Causal Models with Hidden Variables Robin J. Evans www.stats.ox.ac.uk/ evans Department of Statistics, University of Oxford Quantum Networks, Oxford August 2017 1 / 44 Correlation does not imply causation
More informationApplication of Variance Homogeneity Tests Under Violation of Normality Assumption
Application of Variance Homogeneity Tests Under Violation of Normality Assumption Alisa A. Gorbunova, Boris Yu. Lemeshko Novosibirsk State Technical University Novosibirsk, Russia e-mail: gorbunova.alisa@gmail.com
More informationA homogeneity test for spatial point patterns
A homogeneity test for spatial point patterns M.V. Alba-Fernández University of Jaén Paraje las lagunillas, s/n B3-053, 23071, Jaén, Spain mvalba@ujaen.es F. J. Ariza-López University of Jaén Paraje las
More informationA bias-correction for Cramér s V and Tschuprow s T
A bias-correction for Cramér s V and Tschuprow s T Wicher Bergsma London School of Economics and Political Science Abstract Cramér s V and Tschuprow s T are closely related nominal variable association
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationStatistical Data Analysis Stat 3: p-values, parameter estimation
Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,
More informationConfidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection
Biometrical Journal 42 (2000) 1, 59±69 Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection Kung-Jong Lui
More informationBTRY 4090: Spring 2009 Theory of Statistics
BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)
More informationAPPLICATION AND POWER OF PARAMETRIC CRITERIA FOR TESTING THE HOMOGENEITY OF VARIANCES. PART IV
DOI 10.1007/s11018-017-1213-4 Measurement Techniques, Vol. 60, No. 5, August, 2017 APPLICATION AND POWER OF PARAMETRIC CRITERIA FOR TESTING THE HOMOGENEITY OF VARIANCES. PART IV B. Yu. Lemeshko and T.
More informationJournal of Forensic & Investigative Accounting Volume 10: Issue 2, Special Issue 2018
Detecting Fraud in Non-Benford Transaction Files William R. Fairweather* Introduction An auditor faced with the need to evaluate a number of files for potential fraud might wish to focus his or her efforts,
More informationMantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC
Mantel-Haenszel Test Statistics for Correlated Binary Data by Jie Zhang and Dennis D. Boos Department of Statistics, North Carolina State University Raleigh, NC 27695-8203 tel: (919) 515-1918 fax: (919)
More informationApproximate Test for Comparing Parameters of Several Inverse Hypergeometric Distributions
Approximate Test for Comparing Parameters of Several Inverse Hypergeometric Distributions Lei Zhang 1, Hongmei Han 2, Dachuan Zhang 3, and William D. Johnson 2 1. Mississippi State Department of Health,
More information11-2 Multinomial Experiment
Chapter 11 Multinomial Experiments and Contingency Tables 1 Chapter 11 Multinomial Experiments and Contingency Tables 11-11 Overview 11-2 Multinomial Experiments: Goodness-of-fitfit 11-3 Contingency Tables:
More informationGENERAL PROBLEMS OF METROLOGY AND MEASUREMENT TECHNIQUE
DOI 10.1007/s11018-017-1141-3 Measurement Techniques, Vol. 60, No. 1, April, 2017 GENERAL PROBLEMS OF METROLOGY AND MEASUREMENT TECHNIQUE APPLICATION AND POWER OF PARAMETRIC CRITERIA FOR TESTING THE HOMOGENEITY
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationSpecification Tests for Families of Discrete Distributions with Applications to Insurance Claims Data
Journal of Data Science 18(2018), 129-146 Specification Tests for Families of Discrete Distributions with Applications to Insurance Claims Data Yue Fang China Europe International Business School, Shanghai,
More informationTesting for a unit root in an ar(1) model using three and four moment approximations: symmetric distributions
Hong Kong Baptist University HKBU Institutional Repository Department of Economics Journal Articles Department of Economics 1998 Testing for a unit root in an ar(1) model using three and four moment approximations:
More informationNEW APPROXIMATE INFERENTIAL METHODS FOR THE RELIABILITY PARAMETER IN A STRESS-STRENGTH MODEL: THE NORMAL CASE
Communications in Statistics-Theory and Methods 33 (4) 1715-1731 NEW APPROXIMATE INFERENTIAL METODS FOR TE RELIABILITY PARAMETER IN A STRESS-STRENGT MODEL: TE NORMAL CASE uizhen Guo and K. Krishnamoorthy
More informationA Monte-Carlo study of asymptotically robust tests for correlation coefficients
Biometrika (1973), 6, 3, p. 661 551 Printed in Great Britain A Monte-Carlo study of asymptotically robust tests for correlation coefficients BY G. T. DUNCAN AND M. W. J. LAYAKD University of California,
More informationEstimation and sample size calculations for correlated binary error rates of biometric identification devices
Estimation and sample size calculations for correlated binary error rates of biometric identification devices Michael E. Schuckers,11 Valentine Hall, Department of Mathematics Saint Lawrence University,
More informationThe goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions.
The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions. A common problem of this type is concerned with determining
More informationFrequency Distribution Cross-Tabulation
Frequency Distribution Cross-Tabulation 1) Overview 2) Frequency Distribution 3) Statistics Associated with Frequency Distribution i. Measures of Location ii. Measures of Variability iii. Measures of Shape
More informationA simple graphical method to explore tail-dependence in stock-return pairs
A simple graphical method to explore tail-dependence in stock-return pairs Klaus Abberger, University of Konstanz, Germany Abstract: For a bivariate data set the dependence structure can not only be measured
More informationRecall the Basics of Hypothesis Testing
Recall the Basics of Hypothesis Testing The level of significance α, (size of test) is defined as the probability of X falling in w (rejecting H 0 ) when H 0 is true: P(X w H 0 ) = α. H 0 TRUE H 1 TRUE
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationMultiple Comparison Procedures, Trimmed Means and Transformed Statistics. Rhonda K. Kowalchuk Southern Illinois University Carbondale
Multiple Comparison Procedures 1 Multiple Comparison Procedures, Trimmed Means and Transformed Statistics Rhonda K. Kowalchuk Southern Illinois University Carbondale H. J. Keselman University of Manitoba
More informationHypothesis testing:power, test statistic CMS:
Hypothesis testing:power, test statistic The more sensitive the test, the better it can discriminate between the null and the alternative hypothesis, quantitatively, maximal power In order to achieve this
More informationA sequential hypothesis test based on a generalized Azuma inequality 1
A sequential hypothesis test based on a generalized Azuma inequality 1 Daniël Reijsbergen a,2, Werner Scheinhardt b, Pieter-Tjerk de Boer b a Laboratory for Foundations of Computer Science, University
More informationThis article was published in an Elsevier journal. The attached copy is furnished to the author for non-commercial research and education use, including for instruction at the author s institution, sharing
More informationGeneralized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence
Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationData Mining 2018 Bayesian Networks (1)
Data Mining 2018 Bayesian Networks (1) Ad Feelders Universiteit Utrecht Ad Feelders ( Universiteit Utrecht ) Data Mining 1 / 49 Do you like noodles? Do you like noodles? Race Gender Yes No Black Male 10
More informationAnalysis of data in square contingency tables
Analysis of data in square contingency tables Iva Pecáková Let s suppose two dependent samples: the response of the nth subject in the second sample relates to the response of the nth subject in the first
More informationReports of the Institute of Biostatistics
Reports of the Institute of Biostatistics No 02 / 2008 Leibniz University of Hannover Natural Sciences Faculty Title: Properties of confidence intervals for the comparison of small binomial proportions
More informationTHE VINE COPULA METHOD FOR REPRESENTING HIGH DIMENSIONAL DEPENDENT DISTRIBUTIONS: APPLICATION TO CONTINUOUS BELIEF NETS
Proceedings of the 00 Winter Simulation Conference E. Yücesan, C.-H. Chen, J. L. Snowdon, and J. M. Charnes, eds. THE VINE COPULA METHOD FOR REPRESENTING HIGH DIMENSIONAL DEPENDENT DISTRIBUTIONS: APPLICATION
More informationThomas J. Fisher. Research Statement. Preliminary Results
Thomas J. Fisher Research Statement Preliminary Results Many applications of modern statistics involve a large number of measurements and can be considered in a linear algebra framework. In many of these
More informationDiscrete Multivariate Statistics
Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are
More informationParametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami
Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric Assumptions The observations must be independent. Dependent variable should be continuous
More informationDiagnostic Test for GARCH Models Based on Absolute Residual Autocorrelations
Diagnostic Test for GARCH Models Based on Absolute Residual Autocorrelations Farhat Iqbal Department of Statistics, University of Balochistan Quetta-Pakistan farhatiqb@gmail.com Abstract In this paper
More informationTUTORIAL 8 SOLUTIONS #
TUTORIAL 8 SOLUTIONS #9.11.21 Suppose that a single observation X is taken from a uniform density on [0,θ], and consider testing H 0 : θ = 1 versus H 1 : θ =2. (a) Find a test that has significance level
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationThe In-and-Out-of-Sample (IOS) Likelihood Ratio Test for Model Misspecification p.1/27
The In-and-Out-of-Sample (IOS) Likelihood Ratio Test for Model Misspecification Brett Presnell Dennis Boos Department of Statistics University of Florida and Department of Statistics North Carolina State
More informationPAIRS OF SUCCESSES IN BERNOULLI TRIALS AND A NEW n-estimator FOR THE BINOMIAL DISTRIBUTION
APPLICATIONES MATHEMATICAE 22,3 (1994), pp. 331 337 W. KÜHNE (Dresden), P. NEUMANN (Dresden), D. STOYAN (Freiberg) and H. STOYAN (Freiberg) PAIRS OF SUCCESSES IN BERNOULLI TRIALS AND A NEW n-estimator
More informationMore Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction
Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order
More informationApplying the Benjamini Hochberg procedure to a set of generalized p-values
U.U.D.M. Report 20:22 Applying the Benjamini Hochberg procedure to a set of generalized p-values Fredrik Jonsson Department of Mathematics Uppsala University Applying the Benjamini Hochberg procedure
More informationConfidence Intervals, Testing and ANOVA Summary
Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0
More informationCPSC 531: Random Numbers. Jonathan Hudson Department of Computer Science University of Calgary
CPSC 531: Random Numbers Jonathan Hudson Department of Computer Science University of Calgary http://www.ucalgary.ca/~hudsonj/531f17 Introduction In simulations, we generate random values for variables
More informationsubroutine is demonstrated and its accuracy examined, and the nal Section is for concluding discussions The rst of two distribution functions given in
COMPUTATION OF THE TEST STATISTIC AND THE NULL DISTRIBUTION IN THE MULTIVARIATE ANALOGUE OF THE ONE-SIDED TEST Yoshiro Yamamoto, Akio Kud^o y and Katsumi Ujiie z ABSTRACT The multivariate analogue of the
More informationADJUSTED POWER ESTIMATES IN. Ji Zhang. Biostatistics and Research Data Systems. Merck Research Laboratories. Rahway, NJ
ADJUSTED POWER ESTIMATES IN MONTE CARLO EXPERIMENTS Ji Zhang Biostatistics and Research Data Systems Merck Research Laboratories Rahway, NJ 07065-0914 and Dennis D. Boos Department of Statistics, North
More informationUnit 14: Nonparametric Statistical Methods
Unit 14: Nonparametric Statistical Methods Statistics 571: Statistical Methods Ramón V. León 8/8/2003 Unit 14 - Stat 571 - Ramón V. León 1 Introductory Remarks Most methods studied so far have been based
More informationMinimum Phi-Divergence Estimators and Phi-Divergence Test Statistics in Contingency Tables with Symmetry Structure: An Overview
Symmetry 010,, 1108-110; doi:10.3390/sym01108 OPEN ACCESS symmetry ISSN 073-8994 www.mdpi.com/journal/symmetry Review Minimum Phi-Divergence Estimators and Phi-Divergence Test Statistics in Contingency
More informationChi-Square. Heibatollah Baghi, and Mastee Badii
1 Chi-Square Heibatollah Baghi, and Mastee Badii Different Scales, Different Measures of Association Scale of Both Variables Nominal Scale Measures of Association Pearson Chi-Square: χ 2 Ordinal Scale
More informationStatistical Inference: Estimation and Confidence Intervals Hypothesis Testing
Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire
More informationBootstrap Procedures for Testing Homogeneity Hypotheses
Journal of Statistical Theory and Applications Volume 11, Number 2, 2012, pp. 183-195 ISSN 1538-7887 Bootstrap Procedures for Testing Homogeneity Hypotheses Bimal Sinha 1, Arvind Shah 2, Dihua Xu 1, Jianxin
More informationTest Volume 11, Number 1. June 2002
Sociedad Española de Estadística e Investigación Operativa Test Volume 11, Number 1. June 2002 Optimal confidence sets for testing average bioequivalence Yu-Ling Tseng Department of Applied Math Dong Hwa
More informationLing 289 Contingency Table Statistics
Ling 289 Contingency Table Statistics Roger Levy and Christopher Manning This is a summary of the material that we ve covered on contingency tables. Contingency tables: introduction Odds ratios Counting,
More informationPsychology 282 Lecture #4 Outline Inferences in SLR
Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations
More informationStatistics 3858 : Contingency Tables
Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson
More informationEstimation of parametric functions in Downton s bivariate exponential distribution
Estimation of parametric functions in Downton s bivariate exponential distribution George Iliopoulos Department of Mathematics University of the Aegean 83200 Karlovasi, Samos, Greece e-mail: geh@aegean.gr
More informationStructure learning in human causal induction
Structure learning in human causal induction Joshua B. Tenenbaum & Thomas L. Griffiths Department of Psychology Stanford University, Stanford, CA 94305 jbt,gruffydd @psych.stanford.edu Abstract We use
More informationMS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II
MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II the Contents the the the Independence The independence between variables x and y can be tested using.
More informationIrr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland
Frederick James CERN, Switzerland Statistical Methods in Experimental Physics 2nd Edition r i Irr 1- r ri Ibn World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI CONTENTS
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationTesting Some Covariance Structures under a Growth Curve Model in High Dimension
Department of Mathematics Testing Some Covariance Structures under a Growth Curve Model in High Dimension Muni S. Srivastava and Martin Singull LiTH-MAT-R--2015/03--SE Department of Mathematics Linköping
More informationLikelihood Ratio Criterion for Testing Sphericity from a Multivariate Normal Sample with 2-step Monotone Missing Data Pattern
The Korean Communications in Statistics Vol. 12 No. 2, 2005 pp. 473-481 Likelihood Ratio Criterion for Testing Sphericity from a Multivariate Normal Sample with 2-step Monotone Missing Data Pattern Byungjin
More informationHow likely is Simpson s paradox in path models?
How likely is Simpson s paradox in path models? Ned Kock Full reference: Kock, N. (2015). How likely is Simpson s paradox in path models? International Journal of e- Collaboration, 11(1), 1-7. Abstract
More informationResearch Article A Nonparametric Two-Sample Wald Test of Equality of Variances
Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner
More informationSubject CS1 Actuarial Statistics 1 Core Principles
Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and
More informationResearch on Standard Errors of Equating Differences
Research Report Research on Standard Errors of Equating Differences Tim Moses Wenmin Zhang November 2010 ETS RR-10-25 Listening. Learning. Leading. Research on Standard Errors of Equating Differences Tim
More informationGeneralized Cramér von Mises goodness-of-fit tests for multivariate distributions
Hong Kong Baptist University HKBU Institutional Repository HKBU Staff Publication 009 Generalized Cramér von Mises goodness-of-fit tests for multivariate distributions Sung Nok Chiu Hong Kong Baptist University,
More informationQuadratic forms in skew normal variates
J. Math. Anal. Appl. 73 (00) 558 564 www.academicpress.com Quadratic forms in skew normal variates Arjun K. Gupta a,,1 and Wen-Jang Huang b a Department of Mathematics and Statistics, Bowling Green State
More information