Partition of the Chi-Squared Statistic in a Contingency Table


Partition of the Chi-Squared Statistic in a Contingency Table

by Jo Ann Colas

Thesis submitted to the Faculty of Graduate and Postdoctoral Studies in partial fulfillment of the requirements for the MSc degree in Mathematics and Statistics.

Department of Mathematics and Statistics, Faculty of Sciences, University of Ottawa

© Jo Ann Colas, Ottawa, Canada, 2014

Abstract

The Pearson statistic, a well-known goodness-of-fit test in the analysis of contingency tables, gives little guidance as to why a null hypothesis is rejected. One approach to determine the source(s) of deviation from the null is the decomposition of a chi-squared statistic. This allows writing the statistic as the sum of independent chi-squared statistics. First, three major types of contingency tables and the usual chi-squared tests are reviewed. Three types of decompositions are presented and applied: one based on the partition of the contingency table into independent subtables; one derived from smooth models; and one from the eigendecomposition of the central matrix defining the statistic. A comparison of some of the omnibus statistics decomposed above to a χ²(1)-distributed statistic shows that the omnibus statistics lack power compared to this statistic for testing the hypothesis of equal success probabilities against a monotonic trend in the success probabilities in a column-binomial contingency table.

Acknowledgements

Foremost, I would like to express my sincere gratitude to my supervisors, Dr. M. Alvo and Dr. P.-J. Bergeron, for their continuous support of my Master's study and research, and for their patience, motivation and knowledge. Their guidance helped me throughout the researching and writing of this thesis. I would also like to thank my committee members, Dr. M. Zarepour and Dr. S. Sinha, for making my defense an enjoyable moment, and for their brilliant comments and suggestions. I thank my fellow labmates at the Department of Mathematics and Statistics: Maryam, Blanche Nadege, Nada, Cheikh, Farid, Hicham, Rachid, Ewa, Farnoosh, Jack and Gael, for the stimulating discussions, for the sleepless nights we worked together before deadlines, and for all the fun we have had in the last two years. Last but not least, I would like to thank my family: my mother Rosie as my editor, my father as my chauffeur and my brother as my motivator.

Contents

1 Introduction
2 Description of the Chi-Squared Test for a Contingency Table
  2.1 The Chi-Squared Distribution
  2.2 Contingency Tables
  2.3 The Three Models Associated to Contingency Tables
    2.3.1 The unrestricted bivariate sampling model
    2.3.2 The product-multinomial model
    2.3.3 The permutation model
    2.3.4 The link between the three models
  2.4 Chi-Squared Hypothesis Tests
    2.4.1 The chi-squared goodness-of-fit test
    2.4.2 The chi-squared test for homogeneity
    2.4.3 The chi-squared test for independence
3 Tools for the Decomposition of the Pearson Statistic
  3.1 Ordinal Categories and Testing for Trends
  3.2 Ranked Data and Contingency Tables
    3.2.1 Ranked data and their properties
    3.2.2 Advantages and disadvantages of working with ranked data
  3.3 Smooth Goodness-of-Fit Models
    3.3.1 Neyman-type smooth models
    3.3.2 Barton-type smooth models
    3.3.3 Advantages and disadvantages of smooth models
  3.4 Orthonormal Polynomials Defined From Distributional Moments
4 Decompositions of Some Chi-Squared Statistics
  4.1 The Decomposition Into Subtables
  4.2 The Decomposition of the Pearson Statistic
    4.2.1 Decomposition of the Pearson statistic in two-way tables
    4.2.2 Under the unrestricted bivariate sampling model
    4.2.3 Under the multinomial model
    4.2.4 Under the binomial model
    4.2.5 Under the randomised design model
    4.2.6 Decomposition of the Pearson statistic in three-way tables
  4.3 The Decomposition of the Cumulative Chi-Squared Statistic
    4.3.1 General form of the decomposition of the cumulative chi-squared statistic
    4.3.2 The distribution of the cumulative chi-squared statistic
    4.3.3 Link to smooth models
    4.3.4 Explicit solutions when the columns are equiprobable
  4.4 The Decomposition of the Chi-Squared Likelihood-Ratio Test Statistic
    4.4.1 Log-linear models
    4.4.2 Decomposition of the chi-squared likelihood-ratio test statistic
    4.4.3 Extension of the decomposition
5 Comparisons of the CCS Statistics and a Correlation-Based Statistic
  5.1 The Alvo & Berthelot Spearman Test
  5.2 The Comparisons
    5.2.1 The distributions under the null hypothesis
    5.2.2 Comparison of the significance levels
    5.2.3 Comparison of the powers
6 Conclusion
A Obtaining the first two orthogonal polynomials from the Emerson recursion
B Proof of symmetry of (AA′)⁻¹ when the columns are equiprobable

List of Tables

2.1 An I × J contingency table
2.2 An I × J contingency table under the product-multinomial model
2.3 An I × J contingency table under the permutation model
4.1 The s-th subtable
4.2 A rank-by-rank table (without ties)
4.3 A treatment-by-rank table (without ties)
4.4 The I × 2 table around the overall sample median
4.5 A treatment-by-rank table (without ties)
4.6 The s-th collapsed I × 2 contingency table
4.7 The 2 × J collapsed table around the i-th row
4.8 Hierarchical log-linear models for an I × J contingency table
4.9 Hierarchical log-linear models for an I × J × K contingency table
5.1 Minimum number of simulations
5.2 Significance levels for the Taguchi and Nair CCS statistics
5.3 Significance levels for the Pearson and Spearman statistics
5.4 Comparison of the powers: strictly increasing success probabilities
5.5 Comparison of the powers: increasing success probabilities with repeated values
5.6 Comparison of the powers: non-increasing and non-decreasing success probabilities

Chapter 1

Introduction

Karl Pearson's chi-squared goodness-of-fit test was introduced at the end of the development of several important concepts in the preceding centuries. In 1733, de Moivre[19] established the asymptotic normality of the variable
$$X = \frac{Y - n\pi}{\sqrt{n\pi(1-\pi)}}, \qquad \text{where } Y \sim \text{Binomial}(n, \pi).$$
Consequently, $X^2$ is asymptotically distributed as the square of a $N(0,1)$ variable. Bienaymé[1] published the distribution of the sum of squares of $m$ independently distributed $N(0,1)$ variables in its gamma function form. Bravais[15], Schols[4] and Edgeworth[0] developed the joint multivariate normal distribution.

A contemporary of Pearson, Sheppard[43, 44] considered possible tests of goodness-of-fit by comparing observed frequencies to expected frequencies for each cell of a contingency table: in a good fit, the differences would be small. In particular, Sheppard looked at 2 × 2 tables as a dichotomy of a bivariate normal distribution. However, he could not find a generalization to I × J tables due to the awkward form of the covariance matrix. By considering a multinomial distribution instead, Pearson[3] found a more tractable form of the covariance matrix and provided the widely used chi-squared test of goodness-of-fit.

The advantages of the Pearson test include: it is easy to compute; it applies to categorical variables; it can be used to compare two or more populations/samples; and it makes no assumption on the distribution of the population(s). However, it has some shortcomings. Since it is based on categorical variables, some information is lost when it is used with samples from continuous distributions. Also, since the distribution of the test statistic is obtained asymptotically, the Pearson test is sensitive to sample size.

In this thesis, we focus on the fact that the Pearson test is an omnibus test. When it rejects the null hypothesis, it tells us there is enough evidence to suggest a relationship, but it says nothing about the strength or the type of this relationship. One approach to overcome this drawback is to use the additive property of the chi-squared distribution, which allows writing a chi-squared variable as the sum of asymptotically independent chi-squared variables. Such a breakdown of a chi-squared statistic is called a partition or a decomposition of the statistic. Many decompositions of the Pearson statistic have been proposed, allowing for the detection of different deviations from

the null distribution. Agresti[1] and Iversen[5] present a decomposition of the Pearson statistic based on the decomposition of the I × J contingency table into subtables. If the data are in the form of ranks, Rayner & Best provide a partition of the Pearson statistic based on a set of polynomials orthonormal with respect to a distribution. Depending on the model of the contingency table, this approach yields extensions of widely-used tests on two-way contingency tables, such as the Pearson product-moment correlation coefficient[36], the Kruskal-Wallis test[37] and the Friedman test[16]. If the alternative hypothesis is directional, other chi-squared test statistics have been proposed and decomposed, in particular the class of cumulative chi-squared (CCS) tests based on a weighted sum of increasing sums of squares, initially proposed by Taguchi[45, 46]. If the data are binary and the alternative is that of a monotone trend in the proportions, Alvo & Berthelot proposed a test statistic motivated by rankings.

This thesis reviews the above-mentioned decompositions. In Chapter 2, we first review the types of contingency tables and then go over the usual chi-squared tests based on the Pearson statistic. In Chapter 3, we review the necessary tools for the decompositions, including properties of ranked data, a brief review of smooth models and the derivation of polynomials orthonormal with respect to a distribution. Chapter 4 discusses the different decompositions. Finally, in Chapter 5, we study power simulations of the Pearson test, the cumulative chi-squared tests and the Spearman statistic of Alvo & Berthelot, to show that when we want to test homogeneity against monotonicity in one parameter, alternative and simpler tests are more powerful.

Chapter 2

Description of the Chi-Squared Test for a Contingency Table

In this chapter, we first define the chi-squared distribution and describe some of its properties. We then describe the three major types of contingency tables and give the distribution of the cell counts. We end the chapter with a review of the usual chi-squared hypothesis tests and the associated Pearson and likelihood ratio test statistics.

2.1 The Chi-Squared Distribution

Definition 2.1. A random variable W has a chi-squared distribution with ν degrees of freedom, denoted by W ∼ χ²(ν), if its density function is given by
$$f(w \mid \nu) = \frac{1}{\Gamma(\nu/2)\, 2^{\nu/2}}\, w^{\nu/2 - 1} e^{-w/2}, \qquad 0 \le w < \infty,\ \nu > 0.$$

From the definition of the density function for the chi-squared distribution, we can see that this distribution is in fact a special case of the gamma distribution. The mean, variance and characteristic function are respectively given by
$$E[W] = \nu, \qquad \operatorname{Var}[W] = 2\nu, \qquad \varphi_W(t) = \frac{1}{(1 - 2it)^{\nu/2}}.$$

Theorem 2.1. If W₁ and W₂ are two independent variables such that W₁ ∼ χ²(ν₁) and W₂ ∼ χ²(ν₂), then the random variable W₁ + W₂ ∼ χ²(ν₁ + ν₂). Conversely, if a random variable W ∼ χ²(ν), then W can always be expressed as the sum of two independent random variables W₁ and W₂ such that W₁ ∼ χ²(ν₁), W₂ ∼ χ²(ν₂) and ν = ν₁ + ν₂.

Proof. Let $\varphi_{W_k}(t) = (1 - 2it)^{-\nu_k/2}$ be the characteristic function of $W_k$, $k = 1, 2$. Since W₁ and W₂ are independent, the characteristic function

of W₁ + W₂ is
$$\varphi_{W_1 + W_2}(t) = \varphi_{W_1}(t)\,\varphi_{W_2}(t) = \frac{1}{(1 - 2it)^{\nu_1/2}} \cdot \frac{1}{(1 - 2it)^{\nu_2/2}} = \frac{1}{(1 - 2it)^{(\nu_1 + \nu_2)/2}},$$
which is the characteristic function of the χ²(ν₁ + ν₂) distribution. ∎

Remark 2.2. By recursion, Theorem 2.1 holds for any finite number of independent chi-squared variables such that the sum of their degrees of freedom is ν.

Lemma 2.3. Let Z be a standard normal variable. Then Z² has a χ²(1) distribution.

Proof. Let W = Z². For all w ∈ (0, ∞), the cdf of W is
$$F_W(w) = P(W \le w) = P(Z^2 \le w) = P(-w^{1/2} \le Z \le +w^{1/2}) = \Phi(+w^{1/2}) - \Phi(-w^{1/2}),$$
where Φ is the cdf of the N(0,1) distribution, whose pdf is $\phi(z) = e^{-z^2/2}/\sqrt{2\pi}$ for $-\infty < z < +\infty$. Then the pdf of W = Z² is
$$f_W(w) = \frac{d}{dw} F_W(w) = \left[\phi(+w^{1/2}) + \phi(-w^{1/2})\right] \frac{1}{2} w^{-1/2} = \frac{1}{\sqrt{2\pi w}}\, e^{-w/2} = \frac{1}{\Gamma(1/2)\, 2^{1/2}}\, w^{-1/2} e^{-w/2},$$
which is the density function of the χ²(1) distribution. ∎

Theorem 2.4. Let Z₁, …, Z_ν be ν independent and identically distributed (iid) variables with a N(0,1) distribution. Then W = Z₁² + ⋯ + Z_ν² ∼ χ²(ν).

Proof. By Lemma 2.3, the Z₁², …, Z_ν² are iid with a χ²(1) distribution, and so by Remark 2.2, W = Z₁² + ⋯ + Z_ν² ∼ χ²(ν). ∎

Alternatively, some authors define a chi-squared variable as the sum of ν independent squared standard normal variables and then show that it is in fact a special case of a gamma variable.
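Theorem 2.1 and Theorem 2.4 are easy to check numerically. The following is a minimal simulation sketch (not part of the thesis), assuming NumPy and SciPy are available; the sample size and degrees of freedom are arbitrary illustrative choices.

```python
# Minimal simulation sketch: Z^2 should behave like chi2(1) (Lemma 2.3),
# and a sum of independent chi-squared variables should behave like a
# chi-squared variable with the summed degrees of freedom (Theorem 2.1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m = 100_000

z = rng.standard_normal(m)
w1 = rng.chisquare(df=3, size=m)
w2 = rng.chisquare(df=5, size=m)

# Kolmogorov-Smirnov distances against the claimed limiting distributions;
# both should be small, with p-values far from 0.
print(stats.kstest(z**2, stats.chi2(df=1).cdf))
print(stats.kstest(w1 + w2, stats.chi2(df=8).cdf))
```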

2.2 Contingency Tables

We recall the definition of a categorical variable.

Definition 2.2. A categorical variable is a variable that can take on one of a limited, and usually fixed, number of possible values, which are defined in terms of categories.

A categorical variable can be represented on either a nominal or an ordinal scale. For example, the variable colour has a nominal scale: the categories red, orange, yellow, green, blue, indigo and violet have no particular order. On the other hand, the variable work qualification, with categories qualified, semi-qualified and not qualified, has an ordinal scale, where there is a decreasing level of qualification from the first to the last category.

Let X be a categorical variable with classification
$$\mathcal{A} = \{A_1, \ldots, A_I : A_i \cap A_j = \emptyset,\ i \ne j = 1, \ldots, I,\ \textstyle\bigcup_i A_i = \Xi\},$$
where Ξ is the set of possible values for X. Similarly, let Y be a categorical variable with classification
$$\mathcal{B} = \{B_1, \ldots, B_J : B_i \cap B_j = \emptyset,\ i \ne j = 1, \ldots, J,\ \textstyle\bigcup_j B_j = \Psi\},$$
where Ψ is the set of possible values for Y.

Suppose (X₁, Y₁), …, (X_n, Y_n) is a random sample of the random vector (X, Y). Let N_ij denote the random variable which counts the number of observations that fall into the cross-category A_i ∩ B_j; that is,
$$N_{ij} = \sum_{k=1}^n I[X_k \in A_i,\ Y_k \in B_j], \qquad i = 1, \ldots, I,\ j = 1, \ldots, J,$$
where I[·] is the indicator function, defined as I[A] = 1 if the event A occurs and 0 otherwise.

Table 2.1 represents an I × J contingency table for X and Y.

Table 2.1: An I × J contingency table

          B_1   ⋯   B_j   ⋯   B_J  | Total
  A_1     N_11  ⋯   N_1j  ⋯   N_1J | N_1·
  ⋮
  A_i     N_i1  ⋯   N_ij  ⋯   N_iJ | N_i·
  ⋮
  A_I     N_I1  ⋯   N_Ij  ⋯   N_IJ | N_I·
  Total   N_·1  ⋯   N_·j  ⋯   N_·J | n

Here:
- N_ij represents the count for the A_i ∩ B_j cross-category;
- N_i·, called the i-th row total, represents the count for the category A_i:
$$N_{i\cdot} = \sum_{k=1}^n I[X_k \in A_i] = N_{i1} + \cdots + N_{iJ}, \qquad i = 1, \ldots, I;$$
- N_·j, called the j-th column total, represents the count for the category B_j:
$$N_{\cdot j} = \sum_{k=1}^n I[Y_k \in B_j] = N_{1j} + \cdots + N_{Ij}, \qquad j = 1, \ldots, J;$$
- n represents the total number of observations, which is always fixed and known.
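As a concrete illustration of how Table 2.1 is formed, here is a minimal sketch (not from the thesis) that builds the I × J table of counts N_ij from hypothetical category-coded observations; the data and dimensions are made up for the example.

```python
# Build the table of counts N_ij from paired categorical observations.
import numpy as np

# Hypothetical (X, Y) pairs coded as category indices 0..I-1 and 0..J-1.
x = np.array([0, 1, 1, 2, 0, 2, 1, 0])
y = np.array([0, 1, 0, 1, 1, 1, 1, 0])

I, J = 3, 2
N = np.zeros((I, J), dtype=int)
np.add.at(N, (x, y), 1)        # N[i, j] = #{k : X_k in A_i and Y_k in B_j}

print(N)
print(N.sum(axis=1))           # row totals N_{i.}
print(N.sum(axis=0))           # column totals N_{.j}
print(N.sum())                 # n
```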

Parameters of interest in an I × J contingency table are the cell and marginal probabilities:
- π_ij denotes the probability that (X, Y) falls into the (i,j)-th cross-category:
$$\pi_{ij} = P(X \in A_i,\ Y \in B_j), \qquad i = 1, \ldots, I,\ j = 1, \ldots, J;$$
- π_i· denotes the probability that X is in category A_i: π_i· = P(X ∈ A_i), i = 1, …, I. Since
$$P(X \in A_i) = P(X \in A_i,\ Y \in \Psi) = \sum_{j=1}^J P(X \in A_i,\ Y \in B_j),$$
where the B_j are disjoint, we have π_i· = π_i1 + ⋯ + π_iJ;
- π_·j denotes the probability that Y is in category B_j: π_·j = P(Y ∈ B_j), j = 1, …, J. Since
$$P(Y \in B_j) = P(X \in \Xi,\ Y \in B_j) = \sum_{i=1}^I P(X \in A_i,\ Y \in B_j),$$
where the A_i are disjoint, we have π_·j = π_1j + ⋯ + π_Ij.

Here we have presented two-way contingency tables, but this concept can be extended to multi-way contingency tables, which cross-classify three or more categorical variables.

2.3 The Three Models Associated to Contingency Tables

As Lancaster[7] indicates, there are three probability models that allow for analysis of contingency tables. In all of the models, n is fixed, while the sets of marginal totals may or may not be fixed.

2.3.1 The unrestricted bivariate sampling model

Suppose X and Y are categorical variables and the random vector (X, Y) has joint distribution P(X ∈ A_i, Y ∈ B_j) = π_ij, i = 1, …, I, j = 1, …, J, such that π_11 + ⋯ + π_IJ = 1, where the A_i and B_j are the categories for X and Y, respectively. Let us assume n observations of (X, Y) are chosen by a random process with assigned weights π_ij. Let N_ij denote the number of observations in the (i,j)-th cross-category. The data can be arranged as in Table 2.1.

The probability of the cell counts N_ij of the I × J contingency table for X and Y is given by the multinomial distribution
$$P(\{N_{ij} = n_{ij}\} \mid \pi, n) = \binom{n}{n_{11}, \ldots, n_{IJ}} \prod_{i=1}^I \prod_{j=1}^J \pi_{ij}^{n_{ij}},$$
where π = (π_11, …, π_IJ)′. The log-likelihood function for π is
$$l(\pi \mid \{n_{ij}\}, n) \propto \sum_i \sum_j n_{ij} \ln \pi_{ij}.$$
Under the constraint π_11 + ⋯ + π_IJ = 1, the Lagrange function is
$$L(\{n_{ij}\}, \lambda \mid \pi, n) \propto \sum_i \sum_j n_{ij} \ln \pi_{ij} - \lambda\Big(\sum_i \sum_j \pi_{ij} - 1\Big).$$
To find the ML estimator for π_ij, we maximise the Lagrange function:
$$\frac{\partial}{\partial \pi_{ij}} L(\{n_{ij}\}, \lambda \mid \pi, n) = 0 \iff \frac{n_{ij}}{\pi_{ij}} - \lambda = 0 \iff \pi_{ij} = \frac{n_{ij}}{\lambda}. \tag{2.1}$$
Substituting into the constraint yields (n_11 + ⋯ + n_IJ)/λ = 1, so λ = n, and the unrestricted ML estimators of the probabilities π_ij are
$$\hat\pi_{ij} = \frac{N_{ij}}{n}, \qquad i = 1, \ldots, I,\ j = 1, \ldots, J.$$
Given that the cell counts have a multinomial distribution, the unrestricted ML estimators of the means are μ̂_ij = n π̂_ij, and the unrestricted ML estimators of the covariances are
$$\widehat{\operatorname{cov}}[N_{ij}, N_{ab}] = n \begin{cases} \hat\pi_{ij}(1 - \hat\pi_{ij}), & i = a,\ j = b,\\ -\hat\pi_{ij}\hat\pi_{ab}, & \text{otherwise}, \end{cases} \qquad \text{i.e. } \widehat{\operatorname{cov}}[N] = n\hat R,$$
where R is the IJ × IJ matrix given by
$$R = \operatorname{diag}(\pi) - \pi\pi'.$$
In this model, the only fixed quantity is the total number of observations.

Example 2.1. Suppose we observe n = 50 individuals and cross-classify each individual with respect to their sex (male or female) and their handedness (right-handed, left-handed or ambidextrous). Then the 2 × 3 contingency table for the variables sex and handedness follows an unrestricted bivariate sampling model.
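The ML estimates and the estimated covariance structure above are straightforward to compute. The following minimal sketch (not from the thesis) does so for a hypothetical 2 × 3 table; the counts are arbitrary.

```python
# Unrestricted ML estimates under the bivariate sampling (multinomial) model:
# pi_hat_ij = N_ij / n, mu_hat_ij = n * pi_hat_ij,
# cov_hat = n * (diag(pi_hat) - pi_hat pi_hat').
import numpy as np

N = np.array([[20, 30, 10],
              [40, 25, 25]])
n = N.sum()

pi_hat = (N / n).ravel()
mu_hat = n * pi_hat
R_hat = np.diag(pi_hat) - np.outer(pi_hat, pi_hat)
cov_hat = n * R_hat

print(mu_hat.reshape(N.shape))
print(np.diag(cov_hat).reshape(N.shape))   # variances n * pi_ij * (1 - pi_ij)
```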

2.3.2 The product-multinomial model

Lancaster[7] presents this as the comparative trial model; however, it is now more commonly known as the product-multinomial model. Let X₁, …, X_I be random variables defined on the same outcome space Ξ, which has the partition
$$\mathcal{B} = \{B_1, \ldots, B_J : B_i \cap B_j = \emptyset,\ i \ne j = 1, \ldots, J,\ \textstyle\bigcup_j B_j = \Xi\}.$$
Suppose X_{i1}, …, X_{in_{i·}} is a random sample for X_i, i = 1, …, I. Let N_ij denote the number of observations of the i-th sample which are in the j-th category; that is,
$$N_{ij} = \sum_{k=1}^{n_{i\cdot}} I[X_{ik} \in B_j], \qquad i = 1, \ldots, I,\ j = 1, \ldots, J.$$
Then the I × J contingency table for the variables X₁, …, X_I against the classification B is as in Table 2.2.

Table 2.2: An I × J contingency table under the product-multinomial model

          B_1   ⋯   B_j   ⋯   B_J  | Total
  X_1     N_11  ⋯   N_1j  ⋯   N_1J | n_1·
  ⋮
  X_i     N_i1  ⋯   N_ij  ⋯   N_iJ | n_i·
  ⋮
  X_I     N_I1  ⋯   N_Ij  ⋯   N_IJ | n_I·
  Total   N_·1  ⋯   N_·j  ⋯   N_·J | n

In this model, the rows are independent. We denote the row totals by n_i·, to emphasize that they are known and fixed:
$$n_{i\cdot} = N_{i1} + \cdots + N_{iJ}, \qquad i = 1, \ldots, I,$$
and the product-multinomial model is then called the row-multinomial model. Let π_ij denote the probability that an observation of the i-th variable falls into the j-th category; that is, P(X_i ∈ B_j) = π_ij. Since B partitions Ξ, the probabilities in each row sum to one:
$$\pi_{i1} + \cdots + \pi_{iJ} = 1, \qquad i = 1, \ldots, I.$$
Then, assuming that each observation is independent of every other observation, the i-th row N_i has a multinomial(n_i·, π_i) distribution, where π_i = (π_i1, …, π_iJ)′, i = 1, …, I, and the probability of the cell counts is given by the product of I multinomial distributions:
$$P(\{N_{ij} = n_{ij}\} \mid \pi, \{n_{i\cdot}\}, n) = \prod_{i=1}^I \binom{n_{i\cdot}}{n_{i1}, \ldots, n_{iJ}} \prod_{j=1}^J \pi_{ij}^{n_{ij}}, \tag{2.2}$$

where π = (π₁′, …, π_I′)′. The log-likelihood function for π is
$$l(\pi \mid \{n_{ij}\}, \{n_{i\cdot}\}, n) \propto \sum_i \sum_j n_{ij} \ln \pi_{ij}.$$
Under the constraints π_i1 + ⋯ + π_iJ = 1, i = 1, …, I, the Lagrange function is
$$L(\{n_{ij}\}, \lambda \mid \pi, n) \propto \sum_i \sum_j n_{ij} \ln \pi_{ij} - \sum_i \lambda_i \Big(\sum_j \pi_{ij} - 1\Big).$$
To find the ML estimator for π_ij, we maximise the Lagrange function:
$$\frac{\partial}{\partial \pi_{ij}} L(\{n_{ij}\}, \lambda \mid \pi, n) = 0 \iff \pi_{ij} = \frac{n_{ij}}{\lambda_i}.$$
Substituting into the i-th constraint yields (n_i1 + ⋯ + n_iJ)/λ_i = 1, so λ_i = n_i·, i = 1, …, I, and the unrestricted ML estimators of the probabilities π_ij are
$$\hat\pi_{ij} = \frac{N_{ij}}{n_{i\cdot}}, \qquad i = 1, \ldots, I,\ j = 1, \ldots, J.$$
Given that the rows have independent multinomial distributions, the unrestricted ML estimators of the means are μ̂_ij = n_i· π̂_ij, and the unrestricted ML estimators of the covariances are
$$\widehat{\operatorname{cov}}[N_{ij}, N_{ab}] = \begin{cases} n_{i\cdot}\,\hat\pi_{ij}(1 - \hat\pi_{ij}), & i = a,\ j = b,\\ -n_{i\cdot}\,\hat\pi_{ij}\hat\pi_{ib}, & i = a,\ j \ne b,\\ 0, & i \ne a, \end{cases} \qquad \text{i.e. } \begin{cases} n_{i\cdot}\hat R_i, & i = a,\\ 0, & i \ne a, \end{cases}$$
where R_i is the J × J matrix given by
$$R_i = \operatorname{diag}(\pi_i) - \pi_i\pi_i', \qquad i = 1, \ldots, I. \tag{2.3}$$
In the same way that the row totals were fixed, the column totals could be fixed instead; these would be denoted by n_·j, j = 1, …, J. Then the j-th column N_j has a multinomial(n_·j, π_j) distribution, the probability of the cell counts would be given by
$$P(\{N_{ij} = n_{ij}\} \mid \pi, \{n_{\cdot j}\}, n) = \prod_{j=1}^J \binom{n_{\cdot j}}{n_{1j}, \ldots, n_{Ij}} \prod_{i=1}^I \pi_{ij}^{n_{ij}},$$
and the product-multinomial model is now called a column-multinomial model.

Example 2.2. Suppose we sample n_i· = 50 individuals from each of I = 5 countries. For each country, we classify each individual with respect to their socio-economic class (upper, middle or lower class). Then the 5 × 3 contingency table for the variables country and socio-economic class follows a product-multinomial model.
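Under the row-multinomial model the estimation is row by row. A minimal sketch (not from the thesis), with hypothetical counts and fixed row totals:

```python
# Row-multinomial ML estimates: pi_hat_ij = N_ij / n_i. (each row sums to one).
import numpy as np

N = np.array([[12, 18, 20],    # n_1. = 50 (fixed by design)
              [25, 15, 10]])   # n_2. = 50
row_totals = N.sum(axis=1, keepdims=True)

pi_hat = N / row_totals
print(pi_hat)
print(pi_hat.sum(axis=1))      # each row: 1.0
print(row_totals * pi_hat)     # mu_hat_ij = n_i. * pi_hat_ij (equals N here)
```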

2.3.3 The permutation model

Suppose we have n independent observations which can be categorised according to one of two possible classifications,
$$\mathcal{A} = \{A_1, \ldots, A_I : A_i \cap A_j = \emptyset,\ i \ne j,\ \textstyle\bigcup_i A_i = \Xi\} \quad \text{and} \quad \mathcal{B} = \{B_1, \ldots, B_J : B_i \cap B_j = \emptyset,\ i \ne j,\ \textstyle\bigcup_j B_j = \Xi\},$$
where Ξ is the set of possible values for the observations. Let n_i·, i = 1, …, I, and n_·j, j = 1, …, J, be known and fixed positive integers such that
$$\sum_{i=1}^I n_{i\cdot} = \sum_{j=1}^J n_{\cdot j} = n.$$
From the n observations, select n_1· without replacement and assign them to A_1, then select n_2· without replacement and assign them to A_2, and so on. Now restart the process: from these same n observations, select n_·1 without replacement and assign them to B_1, then select n_·2 without replacement and assign them to B_2, and so on. In this way, each object now has two labels. The I × J contingency table for this cross-classification is as in Table 2.3.

Table 2.3: An I × J contingency table under the permutation model

          B_1   ⋯   B_j   ⋯   B_J  | Total
  A_1     N_11  ⋯   N_1j  ⋯   N_1J | n_1·
  ⋮
  A_i     N_i1  ⋯   N_ij  ⋯   N_iJ | n_i·
  ⋮
  A_I     N_I1  ⋯   N_Ij  ⋯   N_IJ | n_I·
  Total   n_·1  ⋯   n_·j  ⋯   n_·J | n

The row and column totals are denoted by n_i· and n_·j, respectively, to emphasize that they are fixed:
$$n_{i\cdot} = N_{i1} + \cdots + N_{iJ}, \qquad n_{\cdot j} = N_{1j} + \cdots + N_{Ij}, \qquad i = 1, \ldots, I,\ j = 1, \ldots, J.$$
In this model, the fixed quantities are the total number of observations and both sets of marginal totals. The number of ways of obtaining the cell counts in the i-th row is
$$\binom{n_{i\cdot}}{n_{i1}, \ldots, n_{iJ}} = \frac{n_{i\cdot}!}{\prod_{j=1}^J n_{ij}!},$$
and so the number of ways of obtaining an I × J contingency table with entries {n_ij} is
$$\prod_{i=1}^I \binom{n_{i\cdot}}{n_{i1}, \ldots, n_{iJ}} = \frac{\prod_{i=1}^I n_{i\cdot}!}{\prod_{i=1}^I \prod_{j=1}^J n_{ij}!}.$$
The number of ways of obtaining the column totals is
$$\binom{n}{n_{\cdot 1}, \ldots, n_{\cdot J}} = \frac{n!}{\prod_{j=1}^J n_{\cdot j}!}.$$

So the probability of the cell counts is given by
$$P(\{N_{ij} = n_{ij}\} \mid \{n_{i\cdot}\}, \{n_{\cdot j}\}, n) = \binom{n}{n_{\cdot 1}, \ldots, n_{\cdot J}}^{-1} \prod_{i=1}^I \binom{n_{i\cdot}}{n_{i1}, \ldots, n_{iJ}} = \frac{\left(\prod_{i} n_{i\cdot}!\right)\left(\prod_{j} n_{\cdot j}!\right)}{n!\, \prod_{i} \prod_{j} n_{ij}!}. \tag{2.4}$$
As per Lemma 5.1 of Lancaster's The Chi-Squared Distribution, the means of the cell counts are μ_ij = n π_i· π_·j, where, for the permutation model, the row and column probabilities are fixed at π_i· = n_i·/n and π_·j = n_·j/n, respectively, i = 1, …, I, j = 1, …, J. Continuing from Lemma 5.1, the covariances are given by
$$\operatorname{cov}[N_{ij}, N_{ab}] = \frac{n}{n-1} \begin{cases} \pi_{i\cdot}(1-\pi_{i\cdot})\,\pi_{\cdot j}(1-\pi_{\cdot j}), & i = a,\ j = b,\\ -\pi_{i\cdot}(1-\pi_{i\cdot})\,\pi_{\cdot j}\pi_{\cdot b}, & i = a,\ j \ne b,\\ -\pi_{i\cdot}\pi_{a\cdot}\,\pi_{\cdot j}(1-\pi_{\cdot j}), & i \ne a,\ j = b,\\ \pi_{i\cdot}\pi_{a\cdot}\,\pi_{\cdot j}\pi_{\cdot b}, & i \ne a,\ j \ne b, \end{cases}$$
so that the covariance matrix of the counts is $\frac{n}{n-1} S \otimes T$, where S and T are respectively the I × I and J × J matrices given by
$$S = \operatorname{diag}(\pi_{i\cdot}) - \pi_r\pi_r', \tag{2.5}$$
$$T = \operatorname{diag}(\pi_{\cdot j}) - \pi_c\pi_c', \tag{2.6}$$
with π_r = (π_1·, …, π_I·)′ and π_c = (π_·1, …, π_·J)′.

Example 2.3. Suppose we have two variables, sex and socio-economic class. We choose 150 males and 100 females such that we have 50 individuals classified as upper class, 75 as middle class and 125 as lower class. Then the 2 × 3 contingency table for the variables sex and socio-economic class follows a permutation model.

2.3.4 The link between the three models

While the cell counts have different distributions under the three models, the following theorem links them.

Theorem 2.5. Under the hypothesis of independence of the marginal variables, the distribution of the entries in a two-way contingency table does not involve the marginal parameters {π_i·} and {π_·j} in the unrestricted bivariate sampling model. Under the hypothesis of homogeneity, π_ij = π_·j, the distribution of the entries in a two-way contingency table does not involve the marginal parameters {π_·j} in the product-multinomial model.

Proof. For the unrestricted bivariate sampling model, let us condition on both sets of marginal totals:

$$P(\{N_{ij}=n_{ij}\} \mid \pi, \{N_{i\cdot}=n_{i\cdot}\}, \{N_{\cdot j}=n_{\cdot j}\}, n) = \frac{P(\{N_{ij}=n_{ij}\} \mid \pi, n)}{P(\{N_{i\cdot}=n_{i\cdot}\} \mid \pi, n)\,P(\{N_{\cdot j}=n_{\cdot j}\} \mid \pi, n)}$$
$$= \frac{\displaystyle\binom{n}{n_{11},\ldots,n_{IJ}} \prod_i\prod_j \pi_{ij}^{n_{ij}}}{\displaystyle\left[\binom{n}{n_{1\cdot},\ldots,n_{I\cdot}}\prod_i \pi_{i\cdot}^{n_{i\cdot}}\right]\left[\binom{n}{n_{\cdot 1},\ldots,n_{\cdot J}}\prod_j \pi_{\cdot j}^{n_{\cdot j}}\right]} = \frac{\left(\prod_i n_{i\cdot}!\right)\left(\prod_j n_{\cdot j}!\right)}{n!\,\prod_i\prod_j n_{ij}!}\; \prod_i\prod_j \left[\frac{\pi_{ij}}{\pi_{i\cdot}\pi_{\cdot j}}\right]^{n_{ij}}.$$
Under the hypothesis of independence, π_ij = π_i· π_·j, and so the probability is now equal to (2.4), which does not involve the marginal parameters {π_i·} and {π_·j}.

For the row-multinomial model, let us condition on the column totals:
$$P(\{N_{ij}=n_{ij}\} \mid \pi, \{n_{i\cdot}\}, \{N_{\cdot j}=n_{\cdot j}\}, n) = \frac{P(\{N_{ij}=n_{ij}\} \mid \pi, \{n_{i\cdot}\}, n)}{P(\{N_{\cdot j}=n_{\cdot j}\} \mid \pi, n)} = \frac{\displaystyle\prod_i \binom{n_{i\cdot}}{n_{i1},\ldots,n_{iJ}} \prod_j \pi_{ij}^{n_{ij}}}{\displaystyle\binom{n}{n_{\cdot 1},\ldots,n_{\cdot J}} \prod_j \pi_{\cdot j}^{n_{\cdot j}}} = \frac{\left(\prod_i n_{i\cdot}!\right)\left(\prod_j n_{\cdot j}!\right)}{n!\,\prod_i\prod_j n_{ij}!} \prod_i\prod_j \left[\frac{\pi_{ij}}{\pi_{\cdot j}}\right]^{n_{ij}}.$$
Under the hypothesis of homogeneity, π_ij = π_·j, and so the probability is again equal to (2.4), which does not involve the column parameters {π_·j}. ∎

This theorem, presented by Lancaster[7], tells us that, under the appropriate hypothesis and given both sets of marginal totals, the cell counts under the three models have the same conditional probability,
$$P(\{N_{ij} = n_{ij}\} \mid \{N_{i\cdot} = n_{i\cdot}\}, \{N_{\cdot j} = n_{\cdot j}\}, n) = \frac{\left(\prod_i n_{i\cdot}!\right)\left(\prod_j n_{\cdot j}!\right)}{n!\,\prod_i\prod_j n_{ij}!}. \tag{2.7}$$
In fact, if we condition on both sets of marginal probabilities, we have the same result.
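Equation (2.7) can be evaluated directly. The sketch below (not from the thesis) computes it for a hypothetical 2 × 2 table and cross-checks against the hypergeometric pmf, which is the conditional distribution of N₁₁ given both margins in the 2 × 2 case.

```python
# P({N_ij = n_ij} | margins) from equation (2.7), with a hypergeometric check.
from math import factorial, prod
import numpy as np
from scipy import stats

def table_prob(N):
    N = np.asarray(N)
    n = N.sum()
    num = prod(factorial(int(r)) for r in N.sum(axis=1)) * \
          prod(factorial(int(c)) for c in N.sum(axis=0))
    den = factorial(int(n)) * prod(factorial(int(x)) for x in N.ravel())
    return num / den

N = [[3, 2],
     [1, 4]]                      # hypothetical table; margins (5,5) and (4,6)
print(table_prob(N))              # 0.2381...
# For a 2 x 2 table, N_11 | margins ~ Hypergeometric(M=n, n=N_.1, N=n_1.).
print(stats.hypergeom(M=10, n=4, N=5).pmf(3))
```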

2.4 Chi-Squared Hypothesis Tests

We consider test statistics which asymptotically have a χ² distribution under the null hypothesis.

2.4.1 The chi-squared goodness-of-fit test

A goodness-of-fit test is used to decide if a sample comes from a specified distribution. This is done by verifying whether the empirical distribution and the specified distribution give the same probabilities for a fixed number of sets. The most widely used chi-squared test for goodness-of-fit was introduced by Karl Pearson[3] in 1900.

Let X be a random variable, defined on the outcome space Ξ, with unknown cumulative distribution function F. Let X₁, …, X_n be a random sample of X. The null hypothesis for a chi-squared test of goodness-of-fit is
$$H_0: F(x) = F_0(x) \quad \text{for all } x \in \mathbb{R}, \tag{2.8}$$
where F₀ is a completely specified distribution. The data are grouped in the following way: let
$$\mathcal{B} = \{B_1, \ldots, B_J : B_i \cap B_j = \emptyset,\ i \ne j,\ \textstyle\bigcup_j B_j = \Xi\}$$
be a pre-specified partition of Ξ. Let N_j denote the random variable that counts the number of observations falling in the j-th category B_j:
$$N_j = \sum_{k=1}^n I[X_k \in B_j], \qquad j = 1, \ldots, J.$$
Then we have the following 1 × J contingency table:

          B_1  ⋯  B_j  ⋯  B_J | Total
  X       N_1  ⋯  N_j  ⋯  N_J | n

This contingency table falls under the row-multinomial model with I = 1, cell counts N_1j = N_j, cell probabilities π_1j = π_j = P(X ∈ B_j), which satisfy π_1 + ⋯ + π_J = 1, and a fixed (row) total of n. Now the null hypothesis for a chi-squared test of goodness-of-fit is equivalent to
$$H_0': \pi_j = \pi_{j0}, \qquad j = 1, \ldots, J, \tag{2.9}$$
where π_j0 is the probability of the j-th cell under the distribution F₀. Since the hypothesis H₀ is more restrictive than the hypothesis H₀′, if the hypothesis H₀′ is rejected then it is natural to reject the hypothesis H₀. If the hypothesis H₀′ is not rejected, then we can say that the grouped data do not contradict the hypothesis H₀.

The Pearson chi-squared statistic

Given that the above 1 × J contingency table falls under the row-multinomial model, the random vector of cell counts N = (N₁, …, N_J)′ has a multinomial(n, π) distribution, where π = (π₁, …, π_J)′. Pearson suggested the following statistic, the Pearson chi-squared statistic, to test against the alternative hypothesis H₁: π_j ≠ π_j0 for some j:
$$X_P^2 = \sum_{j=1}^J \frac{(N_j - \hat\mu_{j0})^2}{\hat\mu_{j0}},$$
where μ̂_j0 is the ML estimator of μ_j = nπ_j under H₀′: π_j = π_j0. This statistic looks at the difference between the observed counts and their expected values under H₀′. The expected value of N_j under H₀′ is μ̂_j0 = nπ_j0. Thus the Pearson statistic for goodness-of-fit is
$$X_P^2 = \sum_{j=1}^J \frac{(N_j - n\pi_{j0})^2}{n\pi_{j0}} = \sum_{j=1}^J \frac{N_j^2 - 2n\pi_{j0}N_j + (n\pi_{j0})^2}{n\pi_{j0}} = \sum_{j=1}^J \frac{N_j^2}{n\pi_{j0}} - 2\sum_{j=1}^J N_j + n\sum_{j=1}^J \pi_{j0} = \sum_{j=1}^J \frac{N_j^2}{n\pi_{j0}} - n.$$

Theorem 2.6. Under the null hypothesis H₀′: π_j = π_j0, j = 1, …, J, the Pearson statistic X_P² asymptotically has a χ²(J − 1) distribution as n → ∞.

Proof. The proof is presented in Kendall & Stuart[6]. ∎
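As a minimal numerical sketch (not from the thesis), the Pearson goodness-of-fit statistic and its χ²(J − 1) p-value can be computed as follows; the counts and null probabilities are hypothetical, and scipy.stats.chisquare reproduces the same statistic.

```python
# X_P^2 = sum_j (N_j - n pi_j0)^2 / (n pi_j0), referred to chi2(J - 1).
import numpy as np
from scipy import stats

N = np.array([18, 29, 32, 21])            # observed counts, n = 100
pi0 = np.array([0.25, 0.25, 0.25, 0.25])  # completely specified null F_0
n = N.sum()

X_P2 = np.sum((N - n * pi0) ** 2 / (n * pi0))
print(X_P2, stats.chi2(df=len(N) - 1).sf(X_P2))
print(stats.chisquare(N, f_exp=n * pi0))  # same statistic and p-value
```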

The likelihood ratio test statistic

An alternative statistic for testing the above hypotheses is derived by treating this problem as a two-sided likelihood ratio test (LRT). From subsection 2.3.2, the likelihood function for π is
$$L(\pi \mid \{n_j\}, n) = \binom{n}{n_1, \ldots, n_J} \prod_{j=1}^J \pi_j^{n_j},$$
and the unrestricted ML estimators of the probabilities π_j are π̂_j = N_j/n, j = 1, …, J. The ML estimators of the probabilities π_j under H₀′ are π̂_j0 = π_j0, j = 1, …, J. Therefore, the two-sided LRT statistic is
$$\Lambda(N, n) = \frac{L(\hat\pi_{j0} \mid N, n)}{L(\hat\pi_j \mid N, n)} = \frac{\binom{n}{N_1, \ldots, N_J} \prod_j \pi_{j0}^{N_j}}{\binom{n}{N_1, \ldots, N_J} \prod_j (N_j/n)^{N_j}} = \prod_{j=1}^J \left(\frac{n\pi_{j0}}{N_j}\right)^{N_j}.$$
Wilks[49] showed that, if the null hypothesis is true, −2 ln Λ asymptotically has a chi-squared distribution as n → ∞. The degrees of freedom are ν = dim Θ − dim Θ₀, where Θ is the parameter space and Θ₀ is the parameter space under the null hypothesis. The alternative statistic, called the chi-squared likelihood ratio test statistic, is
$$G^2 = -2 \ln \Lambda.$$
So the chi-squared LRT statistic for goodness-of-fit is
$$G^2 = -2 \ln \Lambda = 2 \sum_{j=1}^J N_j \ln\left(\frac{N_j}{n\pi_{j0}}\right).$$
The parameter space is
$$\Theta = \{(\pi_1, \ldots, \pi_J) \in [0,1]^J : \pi_1 + \cdots + \pi_J = 1\},$$
so dim Θ = J − 1 because of the relation Σ_j π_j = 1. Under H₀′: π_j = π_j0, the probabilities are fixed, so dim Θ₀ = 0, and the chi-squared LRT statistic G² asymptotically has a χ²(J − 1) distribution as n → ∞.
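A companion sketch (not from the thesis) for the LRT statistic, on the same hypothetical counts; note how close G² is to X_P² here, anticipating the equivalence result of the next subsection.

```python
# G^2 = 2 sum_j N_j ln(N_j / (n pi_j0)), referred to chi2(J - 1).
import numpy as np
from scipy import stats

N = np.array([18, 29, 32, 21])
pi0 = np.full(4, 0.25)
n = N.sum()

G2 = 2 * np.sum(N * np.log(N / (n * pi0)))   # assumes all N_j > 0
X_P2 = np.sum((N - n * pi0) ** 2 / (n * pi0))
df = len(N) - 1
print(G2, X_P2)                              # numerically close
print(stats.chi2(df).sf(G2), stats.chi2(df).sf(X_P2))
```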

Comparison of the likelihood ratio test statistic and the Pearson chi-squared statistic

When the null hypothesis of goodness-of-fit is true, both the Pearson statistic and the chi-squared LRT statistic asymptotically have a χ²(J − 1) distribution.

Theorem 2.7. Under the same null hypothesis, the statistics X_P² and G² are asymptotically equivalent.

Proof. Let μ̂_j0 be the ML estimator of μ_j = E(N_j) under the null hypothesis. Then the chi-squared LRT statistic is
$$G^2 = 2\sum_{j=1}^J N_j \ln\left(\frac{N_j}{\hat\mu_{j0}}\right) = 2\sum_{j=1}^J (\hat\mu_{j0} + N_j - \hat\mu_{j0}) \ln\left(1 + \frac{N_j - \hat\mu_{j0}}{\hat\mu_{j0}}\right).$$
Let g(x) = ln(1 + x). Then the k-th derivative of g is
$$g^{(k)}(x) = \frac{(-1)^{k-1}(k-1)!}{(1+x)^k}, \qquad k = 1, 2, \ldots$$
The Taylor expansion of g around a constant c is
$$T_g(x, c) = \sum_{i \ge 1} \frac{g^{(i)}(c)}{i!}(x - c)^i = \sum_{i \ge 1} \frac{(-1)^{i-1}}{i}\left(\frac{x - c}{1 + c}\right)^i.$$
For c = 0,
$$T_g(x, 0) = \sum_{i \ge 1} \frac{(-1)^{i-1}}{i} x^i = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots$$

Then, writing d_j = N_j − μ̂_j0, the Taylor expansion around c = 0 of the j-th term of G² is
$$2(\hat\mu_{j0} + d_j)\, T_g\!\left(\frac{d_j}{\hat\mu_{j0}}, 0\right) = 2\hat\mu_{j0}\left[\frac{d_j}{\hat\mu_{j0}} - \frac{1}{2}\frac{d_j^2}{\hat\mu_{j0}^2} + \frac{1}{3}\frac{d_j^3}{\hat\mu_{j0}^3} - \cdots\right] + 2 d_j\left[\frac{d_j}{\hat\mu_{j0}} - \frac{1}{2}\frac{d_j^2}{\hat\mu_{j0}^2} + \cdots\right]$$
$$= 2 d_j + \frac{d_j^2}{\hat\mu_{j0}} - \frac{1}{3}\frac{d_j^3}{\hat\mu_{j0}^2} + \cdots = 2(N_j - \hat\mu_{j0}) + \frac{(N_j - \hat\mu_{j0})^2}{\hat\mu_{j0}} + O\!\left[(N_j - \hat\mu_{j0})^3\right].$$
Since under the null hypothesis $\sum_{j=1}^J (N_j - \hat\mu_{j0}) = 0$, as n → ∞,
$$G^2 \approx \sum_{j=1}^J \frac{(N_j - \hat\mu_{j0})^2}{\hat\mu_{j0}} = X_P^2. \qquad \blacksquare$$
Since the proof does not use the explicit form of the ML estimator μ̂_j0, X_P² and G² are always asymptotically equivalent under the same null hypothesis.

2.4.2 The chi-squared test for homogeneity

The objective of the chi-squared test for homogeneity is to decide if two or more samples come from the same population, by determining if they give the same probabilities for a fixed number of sets. Let X₁, …, X_I be random variables, defined on the same outcome space Ξ, which has the partition
$$\mathcal{B} = \{B_1, \ldots, B_J : B_i \cap B_j = \emptyset,\ i \ne j,\ \textstyle\bigcup_j B_j = \Xi\},$$
with unknown cdfs F₁, …, F_I, respectively. Let X_{i1}, …, X_{in_{i·}} be a random sample from F_i, i = 1, …, I. We set
$$\pi_{ij} = P_i(X_i \in B_j), \qquad i = 1, \ldots, I,\ j = 1, \ldots, J,$$
where P_i is the probability function associated with F_i, i = 1, …, I. The data are grouped in the following way: let N_ij denote the number of observations in the i-th sample that fall in the j-th category,
$$N_{ij} = \sum_{k=1}^{n_{i\cdot}} I[X_{ik} \in B_j], \qquad i = 1, \ldots, I,\ j = 1, \ldots, J.$$

Then we have an I × J contingency table as per Table 2.2. The null hypothesis for a chi-squared test of homogeneity of the random variables is
$$H_0: F_1(x) = \cdots = F_I(x) \quad \text{for all } x \in \mathbb{R}, \tag{2.10}$$
which is equivalent to
$$H_0': \pi_{1j} = \cdots = \pi_{Ij} = \pi_j, \qquad j = 1, \ldots, J. \tag{2.11}$$
Since the hypothesis H₀ is more restrictive than the hypothesis H₀′, if the hypothesis H₀′ is rejected then it is natural to reject the hypothesis H₀. If the hypothesis H₀′ is not rejected, then we can say that the grouped data do not contradict the hypothesis H₀.

The Pearson chi-squared statistic

Given that the row totals are fixed, the contingency table falls under the row-multinomial model and the i-th row N_i has a multinomial(n_i·, π_i) distribution, i = 1, …, I. The Pearson statistic to test H₀′ against the alternative H₁: π_ij ≠ π_j for some (i, j) is
$$X_P^2 = \sum_{i=1}^I \sum_{j=1}^J \frac{\big(N_{ij} - \hat\mu_{ij}^{(0)}\big)^2}{\hat\mu_{ij}^{(0)}},$$
where μ̂_ij^{(0)} is the ML estimator of μ_ij = n_i· π_ij under H₀′, i = 1, …, I, j = 1, …, J. We need to find the ML estimators π̂_ij^{(0)} under H₀′. Under the row-multinomial model, the log-likelihood function for π is
$$l(\pi \mid \{n_{ij}\}, \{n_{i\cdot}\}, n) \propto \sum_i \sum_j n_{ij} \ln \pi_{ij}.$$
Under H₀′, the log-likelihood is
$$l_0(\pi \mid \{n_{ij}\}, \{n_{i\cdot}\}, n) \propto \sum_j n_{\cdot j} \ln \pi_j.$$
Under the constraint π₁ + ⋯ + π_J = 1, the Lagrange function under H₀′ is
$$L_0(\{n_{ij}\}, \lambda \mid \{\pi_j\}, n) \propto \sum_j n_{\cdot j} \ln \pi_j - \lambda\Big(\sum_j \pi_j - 1\Big).$$
To find the ML estimator for π_j, we maximise the Lagrange function:
$$\frac{\partial}{\partial \pi_j} L_0(\{n_{ij}\}, \lambda \mid \{\pi_j\}, n) = 0 \iff \pi_j = \frac{n_{\cdot j}}{\lambda}.$$
Substituting into the constraint yields (n_·1 + ⋯ + n_·J)/λ = 1, so λ = n, and the ML estimators of the marginal column probabilities π_j under H₀′ are
$$\hat\pi_j^{(0)} = \frac{N_{\cdot j}}{n}, \qquad j = 1, \ldots, J.$$

Then the ML estimators of the cell probabilities π_ij under H₀′: π_1j = ⋯ = π_Ij = π_j are
$$\hat\pi_{ij}^{(0)} = \hat\pi_j^{(0)} = \frac{N_{\cdot j}}{n}, \qquad i = 1, \ldots, I,\ j = 1, \ldots, J,$$
and the ML estimators of the cell means μ_ij under H₀′ are
$$\hat\mu_{ij}^{(0)} = \frac{n_{i\cdot} N_{\cdot j}}{n}, \qquad i = 1, \ldots, I,\ j = 1, \ldots, J.$$
Thus the Pearson statistic for homogeneity is
$$X_P^2 = \sum_i \sum_j \frac{(n N_{ij} - n_{i\cdot} N_{\cdot j})^2}{n\, n_{i\cdot} N_{\cdot j}} = \frac{1}{n} \sum_i \sum_j \frac{n^2 N_{ij}^2 - 2 n N_{ij}\, n_{i\cdot} N_{\cdot j} + (n_{i\cdot} N_{\cdot j})^2}{n_{i\cdot} N_{\cdot j}} = n\left(\sum_i \sum_j \frac{N_{ij}^2}{n_{i\cdot} N_{\cdot j}} - 1\right).$$

Theorem 2.8. Under the hypothesis H₀′: π_1j = ⋯ = π_Ij = π_j, j = 1, …, J, the Pearson statistic X_P² has an approximately chi-squared distribution with (I − 1)(J − 1) degrees of freedom as n_i· → ∞, i = 1, …, I, and n → ∞.

Proof. Under H₀′, by the same approach as for Theorem 2.6, we obtain that for each row,
$$X_i^2 = \sum_{j=1}^J \frac{(N_{ij} - n_{i\cdot} N_{\cdot j}/n)^2}{n_{i\cdot} N_{\cdot j}/n}, \qquad i = 1, \ldots, I,$$
approximately has a χ²(J − 1) distribution as n_i· → ∞. Since only I − 1 of the rows are independent under H₀′, X_P² = Σᵢ X_i² approximately has a χ²((I − 1)(J − 1)) distribution as n_i· → ∞, i = 1, …, I, and n → ∞. ∎
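The closed form above is easy to check numerically. A minimal sketch (not from the thesis) on a hypothetical two-sample table; scipy.stats.chi2_contingency computes the same statistic (it does not distinguish homogeneity from independence, since the formula is identical).

```python
# X_P^2 = n * (sum_ij N_ij^2 / (n_i. N_.j) - 1) for a row-multinomial table.
import numpy as np
from scipy import stats

N = np.array([[20, 30, 50],    # sample 1, n_1. = 100
              [35, 25, 40]])   # sample 2, n_2. = 100
n = N.sum()
row = N.sum(axis=1, keepdims=True)
col = N.sum(axis=0, keepdims=True)

X_P2 = n * (np.sum(N ** 2 / (row * col)) - 1)
df = (N.shape[0] - 1) * (N.shape[1] - 1)
print(X_P2, stats.chi2(df).sf(X_P2))

chi2, p, dof, expected = stats.chi2_contingency(N, correction=False)
print(chi2, p)                 # matches the closed form
```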

The likelihood ratio test statistic

Under the row-multinomial model, the likelihood function for π = (π_11, …, π_IJ)′ is
$$L(\pi \mid \{n_{ij}\}, \{n_{i\cdot}\}, n) = \prod_{i=1}^I \binom{n_{i\cdot}}{n_{i1}, \ldots, n_{iJ}} \prod_{j=1}^J \pi_{ij}^{n_{ij}},$$
and the unrestricted ML estimators of the probabilities π_ij are π̂_ij = N_ij/n_i·, i = 1, …, I, j = 1, …, J. As we saw, π̂_ij^{(0)} = π̂_j^{(0)} = N_·j/n. Therefore, the two-sided LRT statistic is
$$\Lambda(N, n) = \frac{L\big(\hat\pi_{ij}^{(0)} \mid N, n\big)}{L(\hat\pi_{ij} \mid N, n)} = \frac{\prod_i \binom{n_{i\cdot}}{N_{i1}, \ldots, N_{iJ}} \prod_j (N_{\cdot j}/n)^{N_{ij}}}{\prod_i \binom{n_{i\cdot}}{N_{i1}, \ldots, N_{iJ}} \prod_j (N_{ij}/n_{i\cdot})^{N_{ij}}} = \prod_{i=1}^I \prod_{j=1}^J \left(\frac{n_{i\cdot} N_{\cdot j}}{n N_{ij}}\right)^{N_{ij}},$$
and the chi-squared LRT statistic for homogeneity is
$$G^2 = -2 \ln \Lambda = 2 \sum_{i=1}^I \sum_{j=1}^J N_{ij} \ln\left(\frac{n N_{ij}}{n_{i\cdot} N_{\cdot j}}\right).$$
The parameter space is
$$\Theta = \Big\{(\pi_{11}, \ldots, \pi_{IJ}) \in [0,1]^{IJ} : \sum_{j=1}^J \pi_{ij} = 1,\ i = 1, \ldots, I\Big\},$$
and dim Θ = I(J − 1) because of the relation π_iJ = 1 − Σ_{j<J} π_ij for each of the I rows. The parameter space under H₀′ is
$$\Theta_0 = \Big\{(\pi_1, \ldots, \pi_J) \in [0,1]^J : \sum_{j=1}^J \pi_j = 1\Big\},$$
and dim Θ₀ = J − 1 because of the relation π_J = 1 − Σ_{j<J} π_j. So under H₀′, the chi-squared LRT statistic G² has an approximate chi-squared distribution with I(J − 1) − (J − 1) = (I − 1)(J − 1) degrees of freedom as n_i· → ∞, i = 1, …, I, and n → ∞.
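And similarly for G² (a minimal sketch, not from the thesis); scipy's power-divergence option lambda_="log-likelihood" reproduces it.

```python
# G^2 = 2 sum_ij N_ij ln(n N_ij / (n_i. N_.j)) for the homogeneity test.
import numpy as np
from scipy import stats

N = np.array([[20, 30, 50],
              [35, 25, 40]])
E = N.sum(axis=1, keepdims=True) * N.sum(axis=0, keepdims=True) / N.sum()

G2 = 2 * np.sum(N * np.log(N / E))          # assumes all N_ij > 0
df = (N.shape[0] - 1) * (N.shape[1] - 1)
print(G2, stats.chi2(df).sf(G2))
print(stats.chi2_contingency(N, correction=False,
                             lambda_="log-likelihood")[:2])
```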

2.4.3 The chi-squared test for independence

The objective of the chi-squared test of independence is to decide if two variables are independent by comparing the observed counts to their expected values under the hypothesis of independence. Let X be a random variable, defined on the outcome space Ξ, which has partition A = {A₁, …, A_I : A_i ∩ A_j = ∅, i ≠ j, ∪A_i = Ξ}. Similarly, let Y be a random variable, defined on the outcome space Ψ, which has partition B = {B₁, …, B_J : B_i ∩ B_j = ∅, i ≠ j, ∪B_j = Ψ}. The null hypothesis of independence between X and Y is
$$H_0: F(x, y) = F_X(x) F_Y(y) \quad \text{for all } (x, y) \in \mathbb{R}^2, \tag{2.12}$$
where F is the joint cdf, and F_X and F_Y are the cdfs of X and Y, respectively. Suppose (X₁, Y₁), …, (X_n, Y_n) is a random sample of the random vector (X, Y), and let N_ij denote the random variable that counts the number of observations falling in the (i,j)-th cross-category:
$$N_{ij} = \sum_{k=1}^n I[X_k \in A_i,\ Y_k \in B_j], \qquad i = 1, \ldots, I,\ j = 1, \ldots, J.$$
Then we have an I × J contingency table as per Table 2.1. We set
$$\pi_{ij} = P(X \in A_i,\ Y \in B_j), \qquad i = 1, \ldots, I,\ j = 1, \ldots, J,$$
where P is the probability function associated with the joint cdf F. If the hypothesis H₀ is true, then π_ij = π_i· π_·j for all i and j. Thus a corresponding null hypothesis is
$$H_0': \pi_{ij} = \pi_{i\cdot}\pi_{\cdot j}, \qquad i = 1, \ldots, I,\ j = 1, \ldots, J. \tag{2.13}$$
Since the hypothesis H₀ is narrower than the hypothesis H₀′, if the hypothesis H₀′ is rejected then it is natural to reject the hypothesis H₀. If the hypothesis H₀′ is not rejected, then we can say that the grouped data do not contradict the hypothesis H₀.

The Pearson chi-squared statistic

Given that only the total number of observations is fixed, the contingency table falls under the unrestricted bivariate sampling model, and so the random vector of cell counts N = (N_11, …, N_IJ)′ has a multinomial(n, π) distribution with π_11 + ⋯ + π_IJ = 1. The Pearson statistic to test H₀′ against H₁: π_ij ≠ π_i·π_·j for some (i, j) is
$$X_P^2 = \sum_{i=1}^I \sum_{j=1}^J \frac{\big(N_{ij} - \hat\mu_{ij}^{(0)}\big)^2}{\hat\mu_{ij}^{(0)}},$$
where μ̂_ij^{(0)} is the ML estimator of μ_ij = nπ_ij under H₀′. We need to find the ML estimators π̂_ij^{(0)} under H₀′: π_ij = π_i·π_·j. Under the unrestricted bivariate sampling model, the log-likelihood under H₀′ is
$$l_0(\pi \mid \{n_{ij}\}, n) \propto \sum_i n_{i\cdot} \ln \pi_{i\cdot} + \sum_j n_{\cdot j} \ln \pi_{\cdot j}.$$
Under the constraints π_1· + ⋯ + π_I· = 1 and π_·1 + ⋯ + π_·J = 1, the Lagrange function under H₀′ is
$$L_0(\{n_{ij}\}, \lambda \mid \{\pi_{i\cdot}\}, \{\pi_{\cdot j}\}, n) \propto \sum_i n_{i\cdot} \ln \pi_{i\cdot} + \sum_j n_{\cdot j} \ln \pi_{\cdot j} - \lambda_1\Big(\sum_i \pi_{i\cdot} - 1\Big) - \lambda_2\Big(\sum_j \pi_{\cdot j} - 1\Big).$$
To find the ML estimator for π_i·, we maximise the Lagrange function:
$$\frac{\partial}{\partial \pi_{i\cdot}} L_0 = 0 \iff \pi_{i\cdot} = \frac{n_{i\cdot}}{\lambda_1}.$$
Substituting into the first constraint yields (n_1· + ⋯ + n_I·)/λ₁ = 1, so λ₁ = n, and the ML estimators of the marginal row probabilities π_i· under H₀′ are
$$\hat\pi_{i\cdot}^{(0)} = \frac{N_{i\cdot}}{n}, \qquad i = 1, \ldots, I.$$

Similarly, λ₂ = n and the ML estimators of the marginal column probabilities π_·j under H₀′ are
$$\hat\pi_{\cdot j}^{(0)} = \frac{N_{\cdot j}}{n}, \qquad j = 1, \ldots, J.$$
Then the ML estimators of the cell probabilities π_ij under H₀′ are
$$\hat\pi_{ij}^{(0)} = \hat\pi_{i\cdot}^{(0)}\,\hat\pi_{\cdot j}^{(0)} = \frac{N_{i\cdot}}{n}\frac{N_{\cdot j}}{n}, \qquad i = 1, \ldots, I,\ j = 1, \ldots, J,$$
and the ML estimators of the cell means μ_ij under H₀′ are
$$\hat\mu_{ij}^{(0)} = n\,\hat\pi_{ij}^{(0)} = \frac{N_{i\cdot} N_{\cdot j}}{n}, \qquad i = 1, \ldots, I,\ j = 1, \ldots, J.$$
Thus the Pearson statistic for independence is
$$X_P^2 = \sum_i \sum_j \frac{(n N_{ij} - N_{i\cdot} N_{\cdot j})^2}{n N_{i\cdot} N_{\cdot j}} = \frac{1}{n}\sum_i \sum_j \frac{(n N_{ij})^2 - 2 n N_{ij} N_{i\cdot} N_{\cdot j} + (N_{i\cdot} N_{\cdot j})^2}{N_{i\cdot} N_{\cdot j}} = n\left(\sum_i \sum_j \frac{N_{ij}^2}{N_{i\cdot} N_{\cdot j}} - 1\right).$$

Theorem 2.9. Under the hypothesis H₀′: π_ij = π_i·π_·j, i = 1, …, I, j = 1, …, J, the Pearson statistic X_P² asymptotically has a χ²((I − 1)(J − 1)) distribution as n → ∞.

Proof. Similar to that for the Pearson statistic for the test of homogeneity, but with a vector of length IJ. ∎

The likelihood ratio test statistic

Under the unrestricted bivariate sampling model, the likelihood function for π = (π_11, …, π_IJ)′ is
$$L(\pi \mid \{n_{ij}\}, n) = \binom{n}{n_{11}, \ldots, n_{IJ}} \prod_{i=1}^I \prod_{j=1}^J \pi_{ij}^{n_{ij}},$$
and the unrestricted ML estimators of the cell probabilities π_ij are π̂_ij = N_ij/n, i = 1, …, I, j = 1, …, J. As we saw, π̂_ij^{(0)} = π̂_i·^{(0)} π̂_·j^{(0)} = N_i·N_·j/n² for all i and j. Therefore, the two-sided LRT statistic is
$$\Lambda(N, n) = \frac{L\big(\hat\pi_{ij}^{(0)} \mid N, n\big)}{L(\hat\pi_{ij} \mid N, n)} = \frac{\binom{n}{N_{11}, \ldots, N_{IJ}} \prod_i \prod_j \big(N_{i\cdot}N_{\cdot j}/n^2\big)^{N_{ij}}}{\binom{n}{N_{11}, \ldots, N_{IJ}} \prod_i \prod_j (N_{ij}/n)^{N_{ij}}} = \prod_{i=1}^I \prod_{j=1}^J \left(\frac{N_{i\cdot} N_{\cdot j}}{n N_{ij}}\right)^{N_{ij}},$$

and the chi-squared LRT statistic for independence is
$$G^2 = -2 \ln \Lambda = 2 \sum_{i=1}^I \sum_{j=1}^J N_{ij} \ln\left(\frac{n N_{ij}}{N_{i\cdot} N_{\cdot j}}\right).$$
The parameter space is
$$\Theta = \Big\{(\pi_{11}, \ldots, \pi_{IJ}) \in [0,1]^{IJ} : \sum_i \sum_j \pi_{ij} = 1\Big\},$$
and dim Θ = IJ − 1 because of the relation π_IJ = 1 − Σ_{(i,j) ≠ (I,J)} π_ij. The parameter space under H₀′ is
$$\Theta_0 = \Big\{(\pi_{1\cdot}\pi_{\cdot 1}, \ldots, \pi_{I\cdot}\pi_{\cdot J}) \in [0,1]^{IJ} : \sum_i \pi_{i\cdot} = 1,\ \sum_j \pi_{\cdot j} = 1\Big\},$$
and dim Θ₀ = (I − 1) + (J − 1) because of the relations π_I· = 1 − Σ_{i<I} π_i· and π_·J = 1 − Σ_{j<J} π_·j. So under H₀′, the chi-squared LRT statistic G² has an approximate chi-squared distribution with IJ − 1 − (I − 1) − (J − 1) = (I − 1)(J − 1) degrees of freedom as n → ∞.
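The degrees-of-freedom claim can be checked by simulation. Below is a minimal sketch (not from the thesis): tables are drawn from an unrestricted bivariate sampling model satisfying independence, and the simulated X_P² values are compared to χ²((I − 1)(J − 1)); all settings are arbitrary illustrative choices.

```python
# Under H_0': pi_ij = pi_i. * pi_.j, X_P^2 should be ~ chi2((I-1)(J-1)).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
I, J, n, reps = 2, 3, 500, 2000
pi = np.outer([0.4, 0.6], [0.2, 0.3, 0.5]).ravel()   # independent cell probs

sims = []
for _ in range(reps):
    N = rng.multinomial(n, pi).reshape(I, J)
    E = N.sum(axis=1, keepdims=True) * N.sum(axis=0, keepdims=True) / n
    sims.append(np.sum((N - E) ** 2 / E))

print(stats.kstest(sims, stats.chi2(df=(I - 1) * (J - 1)).cdf))
```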

Chapter 3

Tools for the Decomposition of the Pearson Statistic

In this chapter, we first discuss why the Pearson and likelihood ratio tests are sometimes not appropriate. We then present the three major concepts that we will rely on for the decomposition of the Pearson statistic: ranked data, smooth models and polynomials orthonormal with respect to a distribution.

3.1 Ordinal Categories and Testing for Trends

Chi-squared tests of independence merely indicate the degree of evidence of association between the row and column variables/classifications. One major fault of the Pearson and LRT chi-squared tests of independence is that the expected cell counts depend only on the marginal totals: the order of the categories of the rows and the columns is not taken into account. Thus the Pearson and LRT statistics treat both the row and column variables/classifications as nominal. If the chi-squared tests are applied when at least one variable/classification is ordinal, this information is ignored and may lead to a false decision.

Example 3.1 (From Agresti[1]). In the following table the variables are income and job satisfaction, measured for the black males in a national (USA) sample. Both classifications are ordinal, with the categories very dissatisfied (VD), little dissatisfied (LD), moderately satisfied (MS) and very satisfied (VS).

                          Job Satisfaction
  Income (USD)         VD   LD   MS   VS | Total
  < 15,000              1    3   10    6 |  20
  15,000–25,000         2    3   10    7 |  22
  25,000–40,000         1    6   14   12 |  33
  > 40,000              0    1    9   11 |  21
  Total                 4   13   43   36 |  96

The Pearson and LRT statistics for testing independence are X_P² = 6.0 and G² = 6.8 with 9 degrees of freedom; the p-values are 0.74 and 0.66, respectively. The statistics show little evidence of association. However, we can permute the columns and rows of this table. Since the columns and rows of the permuted table are simply a rearrangement of those of the first table, the cell counts are still compared to the same estimated cell means, and the Pearson and LRT statistics keep the same values (as the sketch below demonstrates), again not rejecting the hypothesis of independence between income and job satisfaction; yet we can see there is an increasing trend in the original table.
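The invariance claim in Example 3.1 is easy to demonstrate. A minimal sketch (not from the thesis), reversing the row and column order of the table above; any permutation gives the same result.

```python
# Permuting rows/columns leaves the omnibus statistics unchanged.
import numpy as np
from scipy import stats

N = np.array([[1, 3, 10,  6],    # < 15,000:       VD, LD, MS, VS
              [2, 3, 10,  7],    # 15,000-25,000
              [1, 6, 14, 12],    # 25,000-40,000
              [0, 1,  9, 11]])   # > 40,000

perm = N[::-1, ::-1]             # reverse both the row and the column order
print(stats.chi2_contingency(N, correction=False)[0])
print(stats.chi2_contingency(perm, correction=False)[0])   # identical X_P^2
```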

In this thesis, we also look at chi-squared tests of homogeneity against monotonic trend. Let X₁, …, X_I be random variables, defined on the same outcome space Ξ, which has the partition
$$\mathcal{B} = \{B_1, \ldots, B_J : B_i \cap B_j = \emptyset,\ i \ne j,\ \textstyle\bigcup_j B_j = \Xi\},$$
with unknown cdfs F₁, …, F_I, respectively. We set π_ij = P_i(X_i ∈ B_j), i = 1, …, I, j = 1, …, J, where P_i is the probability function associated with F_i. The null hypothesis for a chi-squared test of homogeneity of the random variables is
$$H_0: F_1(x) = \cdots = F_I(x) \quad \text{for all } x \in \mathbb{R},$$
which is equivalent to
$$H_0': \tau_{1j} = \cdots = \tau_{Ij}, \qquad j = 1, \ldots, J,$$
where
$$\tau_{ij} = \begin{cases} \pi_{i1} + \cdots + \pi_{ij}, & j = 1, \ldots, J-1,\\ 1, & j = J, \end{cases}$$
are the cumulative cell probabilities. The chi-squared test of homogeneity against monotonic trend tests H₀ against
$$H_1: F_1(x) \ge \cdots \ge F_I(x) \quad \text{for all } x \in \mathbb{R},$$
$$H_2: F_1(x) \le \cdots \le F_I(x) \quad \text{for all } x \in \mathbb{R},$$
or
$$H_3: F_1(x) \ge \cdots \ge F_I(x) \quad \text{or} \quad F_1(x) \le \cdots \le F_I(x) \quad \text{for all } x \in \mathbb{R},$$
with at least one strict inequality in the alternatives. This is equivalent to testing H₀′ against
$$H_1': \tau_{1j} \ge \cdots \ge \tau_{Ij}, \qquad j = 1, \ldots, J, \tag{3.1}$$
$$H_2': \tau_{1j} \le \cdots \le \tau_{Ij}, \qquad j = 1, \ldots, J, \tag{3.2}$$
or
$$H_3': \tau_{1j} \ge \cdots \ge \tau_{Ij} \quad \text{or} \quad \tau_{1j} \le \cdots \le \tau_{Ij}, \qquad j = 1, \ldots, J, \tag{3.3}$$
with at least one strict inequality in the alternative hypotheses. The alternative (3.1) is equivalent to a monotone increasing trend in the row distributions of a row-multinomial model, the alternative (3.2) is equivalent to a monotone decreasing trend, and the alternative (3.3) is equivalent to a monotonic trend.
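A minimal sketch (not from the thesis) of the quantities involved in (3.1)–(3.3): the estimated cumulative cell probabilities τ̂_ij for a hypothetical row-multinomial table whose rows are stochastically increasing, so that τ̂_1j ≥ τ̂_2j ≥ τ̂_3j for every column j.

```python
# Estimated cumulative cell probabilities tau_hat_ij = (N_i1 + ... + N_ij) / n_i.
import numpy as np

N = np.array([[30, 40, 20, 10],
              [20, 35, 25, 20],
              [10, 25, 30, 35]])
tau_hat = np.cumsum(N, axis=1) / N.sum(axis=1, keepdims=True)
print(tau_hat)
# Rows decrease in every column here -- the pattern alternative (3.1) describes.
print(np.all(np.diff(tau_hat, axis=0) <= 0))
```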

Once again, the Pearson and LRT chi-squared tests are invariant to the possible order of the columns when testing against the above directional alternatives, and may lead to a false decision.

Example 3.2 (From Bagdonavičius et al.[4]). Investigating the granular composition of quartz in Lithuanian and Saharan sand samples, data on the lengths (in cm) of the maximal axis of the quartz grains were used. Using the granular composition of the samples, conclusions on the geological sand formation conditions were drawn. The data are grouped in intervals.

[Table: observed counts of quartz grains, by interval midpoint of maximal-axis length, for the Lithuanian and Saharan sand samples]

The Pearson and LRT statistics for verifying the hypothesis that the length of the maximal axis has the same distribution in Lithuanian and Saharan sand are X_P² = 72.5 and G² = 74.5 with 7 degrees of freedom, and p-values < 10⁻¹². The statistics show there is strong evidence of association. Looking at the table of cumulative cell probabilities,

[Table: cumulative cell proportions by interval midpoint for the Lithuanian and Saharan sand samples]

an appropriate follow-up test would compare the hypothesis of homogeneity against a decreasing trend in the row distributions.

For both of these situations, ordinal variables or trend alternatives, alterations to the usual chi-squared tests have been made to increase the power of the test. Without loss of generality, we are interested in detecting monotone increasing trends.

3.2 Ranked Data and Contingency Tables

We consider nonparametric tests based on statistics which depend only on the locations of the observations in the ordered sample and not directly on their values.

3.2.1 Ranked data and their properties

Let X₁, …, X_n be a random sample of a random variable X and let X₍₁₎ ≤ ⋯ ≤ X₍ₙ₎ be the order statistics. If the distribution of X is absolutely continuous, i.e. the probability of having tied ranks is zero, and X₍₁₎ < ⋯ < X₍ₙ₎, the rank R_i of X_i is the order number of X_i in the ordered sample X₍₁₎, …, X₍ₙ₎:
$$R_i = \sum_{j=1}^n j\, I\big[X_i = X_{(j)}\big].$$

Since ranks take the values 1, …, n, the sum and the sum of squares of the ranks are constant:
$$\sum_{i=1}^n R_i = \sum_{i=1}^n i = \frac{n(n+1)}{2}, \qquad \sum_{i=1}^n R_i^2 = \sum_{i=1}^n i^2 = \frac{n(n+1)(2n+1)}{6}.$$
Then, if the distribution of the random variable X is absolutely continuous, the ranks satisfy P(R_i = r) = 1/n, r = 1, …, n, and
$$E[R_i] = \frac{n+1}{2}, \qquad \operatorname{cov}[R_i, R_j] = \frac{n+1}{12}\big[n I_n - 1_n 1_n'\big]_{ij}, \qquad i, j = 1, \ldots, n,$$
where I_n is the n × n identity matrix and 1_n is the vector of n ones.

The notion of rank can be generalized to the case where the distribution of the random variable X is not necessarily absolutely continuous, thus allowing for tied ranks. If there is a group of s observations coinciding at the j-th order statistic in the ordered sample X₍₁₎ ≤ ⋯ ≤ X₍ₙ₎ and X_i = X₍ⱼ₎, then R_i is defined as the arithmetic mean of the positions of the coinciding observations:
$$R_i = \frac{j + (j+1) + \cdots + (j+s-1)}{s} = \frac{1}{s}\sum_{a=j}^{j+s-1} a = j + \frac{1}{s}\sum_{a=1}^{s-1} a = j + \frac{s-1}{2}.$$
These adjusted ranks, called mid-ranks, have no effect on the mean of the ranks. Suppose X_{i₁} = ⋯ = X_{iₛ} = X₍ⱼ₎; then the sum of their ranks is
$$\sum_{b=1}^s R_{i_b} = \sum_{b=1}^s \left(j + \frac{s-1}{2}\right) = \sum_{a=j}^{j+s-1} a,$$
and the sum of all ranks remains the same. Set
$$S = \sum_{l=1}^k s_l\big(s_l^2 - 1\big),$$
where k denotes the number of groups with coinciding members in the ordered sample and s_l denotes the number of members in the l-th group.

Theorem 3.1. Conditional on the observed ties, the means and the covariances of the ranks are
$$E[R_i] = \frac{n+1}{2}, \qquad \operatorname{cov}[R_i, R_j] = \left(\frac{n+1}{12} - \frac{S}{12\, n(n-1)}\right)\big[n I_n - 1_n 1_n'\big]_{ij}, \qquad i, j = 1, \ldots, n.$$

Proof. The proof is provided by Bagdonavičius et al.[4]. ∎
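Mid-ranks and the moment identities above can be illustrated with scipy.stats.rankdata (a minimal sketch, not from the thesis; the data are made up).

```python
# Mid-ranks: tied observations receive the average of their positions,
# and the total rank sum stays n(n+1)/2.
import numpy as np
from scipy import stats

x = np.array([3.1, 1.2, 3.1, 2.5, 3.1, 0.7])   # one tie group of size s = 3
r = stats.rankdata(x)                           # 'average' method = mid-ranks
n = len(x)

print(r)                        # [5. 2. 5. 3. 5. 1.]; positions 4,5,6 -> 5
print(r.sum(), n * (n + 1) / 2) # 21.0  21.0: the rank sum is unaffected by ties
print(r.mean(), (n + 1) / 2)    # 3.5   3.5
```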

3.2.2 Advantages and disadvantages of working with ranked data

Conover[17] gives two reasons why one would prefer to work with ranks over the actual data:

- If the numbers assigned to the observations have no meaning by themselves but rather attain meaning only in an ordinal comparison with other observations, then the numbers contain no more information than the ranks contain.
- Even if the numbers have meaning but the distribution function is not normal, the probability theory is beyond our reach when the statistic is based on the actual data. The probability theory based on ranks is relatively simple and, in many cases, does not depend on the distribution.

Conover[17] defines a nonparametric method as follows:

Definition 3.1. A statistical method is nonparametric if it satisfies at least one of the following criteria:
- The method may be used on data with a nominal scale of measurement.
- The method may be used on data with an ordinal scale of measurement.
- The method may be used on data with an interval or ratio scale of measurement, where the distribution function of the random variable producing the data is either unspecified or specified except for an infinite number of unknown parameters.

Thus, satisfying the second criterion, a statistical method using ranked data is considered nonparametric, and it has the following advantages:

- As given by the definition, nonparametric methods based on ranked data can be used on data with an ordinal, interval or ratio scale of measurement.
- As they were originally developed before the wide use of computers, nonparametric methods are intuitive and simple to carry out by hand, for small samples at least.
- There are no, or very limited, assumptions on the format of the data. A nonparametric method may be preferable when the assumptions required for a parametric method are not valid, for example:
  - the scale of measurement: the Pearson correlation assumes the data are at least interval, while its nonparametric equivalent, the Spearman correlation, assumes the data are at least ordinal;
  - the underlying distribution: the one-sample t-test requires observations drawn from a normally distributed population, while its nonparametric equivalent, the Wilcoxon signed rank test, simply assumes the observations are drawn from the same population.

However, we need to acknowledge that there are also some disadvantages to the use of nonparametric methods:

- If the sample size is large, nonparametric methods may be difficult to compute by hand; and, unlike for parametric methods, appropriate computer software for nonparametric methods can be computationally intensive and limited, although the situation is improving.
- If the assumptions of the corresponding parametric method hold, or the sample size is not large enough, nonparametric methods may lack power compared to their parametric equivalents.
- Tied values can be problematic when they are common, and adjustments to the test statistic may be necessary.
- When the data follow a normal distribution, for example, the mean and standard deviation are all that is required to understand the distribution and make inference. With nonparametric methods, because only probabilities are involved, reducing the data to a few numbers (e.g., the median) does not give an accurate picture.

In this thesis, though there may be a large number of categories, with an ordinal variable and/or a trend we reduce the amount of information necessary to better interpret the data.


More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

INTERVAL ESTIMATION AND HYPOTHESES TESTING

INTERVAL ESTIMATION AND HYPOTHESES TESTING INTERVAL ESTIMATION AND HYPOTHESES TESTING 1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions

More information

5 Introduction to the Theory of Order Statistics and Rank Statistics

5 Introduction to the Theory of Order Statistics and Rank Statistics 5 Introduction to the Theory of Order Statistics and Rank Statistics This section will contain a summary of important definitions and theorems that will be useful for understanding the theory of order

More information

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti Good Confidence Intervals for Categorical Data Analyses Alan Agresti Department of Statistics, University of Florida visiting Statistics Department, Harvard University LSHTM, July 22, 2011 p. 1/36 Outline

More information

PRINCIPLES OF STATISTICAL INFERENCE

PRINCIPLES OF STATISTICAL INFERENCE Advanced Series on Statistical Science & Applied Probability PRINCIPLES OF STATISTICAL INFERENCE from a Neo-Fisherian Perspective Luigi Pace Department of Statistics University ofudine, Italy Alessandra

More information

One-Way Tables and Goodness of Fit

One-Way Tables and Goodness of Fit Stat 504, Lecture 5 1 One-Way Tables and Goodness of Fit Key concepts: One-way Frequency Table Pearson goodness-of-fit statistic Deviance statistic Pearson residuals Objectives: Learn how to compute the

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Components of the Pearson-Fisher Chi-squared Statistic

Components of the Pearson-Fisher Chi-squared Statistic JOURNAL OF APPLIED MATHEMATICS AND DECISION SCIENCES, 6(4), 241 254 Copyright c 2002, Lawrence Erlbaum Associates, Inc. Components of the Pearson-Fisher Chi-squared Statistic G.D. RAYNER National Australia

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling

Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling Paul Marriott 1, Radka Sabolova 2, Germain Van Bever 2, and Frank Critchley 2 1 University of Waterloo, Waterloo, Ontario,

More information

3 Joint Distributions 71

3 Joint Distributions 71 2.2.3 The Normal Distribution 54 2.2.4 The Beta Density 58 2.3 Functions of a Random Variable 58 2.4 Concluding Remarks 64 2.5 Problems 64 3 Joint Distributions 71 3.1 Introduction 71 3.2 Discrete Random

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Institute of Actuaries of India

Institute of Actuaries of India Institute of Actuaries of India Subject CT3 Probability & Mathematical Statistics May 2011 Examinations INDICATIVE SOLUTION Introduction The indicative solution has been written by the Examiners with the

More information

The Multinomial Model

The Multinomial Model The Multinomial Model STA 312: Fall 2012 Contents 1 Multinomial Coefficients 1 2 Multinomial Distribution 2 3 Estimation 4 4 Hypothesis tests 8 5 Power 17 1 Multinomial Coefficients Multinomial coefficient

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

More information

Review of One-way Tables and SAS

Review of One-way Tables and SAS Stat 504, Lecture 7 1 Review of One-way Tables and SAS In-class exercises: Ex1, Ex2, and Ex3 from http://v8doc.sas.com/sashtml/proc/z0146708.htm To calculate p-value for a X 2 or G 2 in SAS: http://v8doc.sas.com/sashtml/lgref/z0245929.htmz0845409

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)

More information

Glossary for the Triola Statistics Series

Glossary for the Triola Statistics Series Glossary for the Triola Statistics Series Absolute deviation The measure of variation equal to the sum of the deviations of each value from the mean, divided by the number of values Acceptance sampling

More information

Empirical Power of Four Statistical Tests in One Way Layout

Empirical Power of Four Statistical Tests in One Way Layout International Mathematical Forum, Vol. 9, 2014, no. 28, 1347-1356 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/imf.2014.47128 Empirical Power of Four Statistical Tests in One Way Layout Lorenzo

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

Testing Independence

Testing Independence Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1

More information

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 004 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER II STATISTICAL METHODS The Society provides these solutions to assist candidates preparing for the examinations in future

More information

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)

More information

NAG Library Chapter Introduction. G08 Nonparametric Statistics

NAG Library Chapter Introduction. G08 Nonparametric Statistics NAG Library Chapter Introduction G08 Nonparametric Statistics Contents 1 Scope of the Chapter.... 2 2 Background to the Problems... 2 2.1 Parametric and Nonparametric Hypothesis Testing... 2 2.2 Types

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Categorical Variables and Contingency Tables: Description and Inference

Categorical Variables and Contingency Tables: Description and Inference Categorical Variables and Contingency Tables: Description and Inference STAT 526 Professor Olga Vitek March 3, 2011 Reading: Agresti Ch. 1, 2 and 3 Faraway Ch. 4 3 Univariate Binomial and Multinomial Measurements

More information

ML Testing (Likelihood Ratio Testing) for non-gaussian models

ML Testing (Likelihood Ratio Testing) for non-gaussian models ML Testing (Likelihood Ratio Testing) for non-gaussian models Surya Tokdar ML test in a slightly different form Model X f (x θ), θ Θ. Hypothesist H 0 : θ Θ 0 Good set: B c (x) = {θ : l x (θ) max θ Θ l

More information

Recall the Basics of Hypothesis Testing

Recall the Basics of Hypothesis Testing Recall the Basics of Hypothesis Testing The level of significance α, (size of test) is defined as the probability of X falling in w (rejecting H 0 ) when H 0 is true: P(X w H 0 ) = α. H 0 TRUE H 1 TRUE

More information

Final Examination Statistics 200C. T. Ferguson June 11, 2009

Final Examination Statistics 200C. T. Ferguson June 11, 2009 Final Examination Statistics 00C T. Ferguson June, 009. (a) Define: X n converges in probability to X. (b) Define: X m converges in quadratic mean to X. (c) Show that if X n converges in quadratic mean

More information

Frequency Distribution Cross-Tabulation

Frequency Distribution Cross-Tabulation Frequency Distribution Cross-Tabulation 1) Overview 2) Frequency Distribution 3) Statistics Associated with Frequency Distribution i. Measures of Location ii. Measures of Variability iii. Measures of Shape

More information

Lecture 7: Hypothesis Testing and ANOVA

Lecture 7: Hypothesis Testing and ANOVA Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis

More information

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Probability Sampling Procedures Collection of Data Measures

More information

STATISTICS ANCILLARY SYLLABUS. (W.E.F. the session ) Semester Paper Code Marks Credits Topic

STATISTICS ANCILLARY SYLLABUS. (W.E.F. the session ) Semester Paper Code Marks Credits Topic STATISTICS ANCILLARY SYLLABUS (W.E.F. the session 2014-15) Semester Paper Code Marks Credits Topic 1 ST21012T 70 4 Descriptive Statistics 1 & Probability Theory 1 ST21012P 30 1 Practical- Using Minitab

More information

6 Single Sample Methods for a Location Parameter

6 Single Sample Methods for a Location Parameter 6 Single Sample Methods for a Location Parameter If there are serious departures from parametric test assumptions (e.g., normality or symmetry), nonparametric tests on a measure of central tendency (usually

More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano, 02LEu1 ttd ~Lt~S Testing Statistical Hypotheses Third Edition With 6 Illustrations ~Springer 2 The Probability Background 28 2.1 Probability and Measure 28 2.2 Integration.........

More information

Contents. Acknowledgments. xix

Contents. Acknowledgments. xix Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables

More information

Bivariate Relationships Between Variables

Bivariate Relationships Between Variables Bivariate Relationships Between Variables BUS 735: Business Decision Making and Research 1 Goals Specific goals: Detect relationships between variables. Be able to prescribe appropriate statistical methods

More information

the long tau-path for detecting monotone association in an unspecified subpopulation

the long tau-path for detecting monotone association in an unspecified subpopulation the long tau-path for detecting monotone association in an unspecified subpopulation Joe Verducci Current Challenges in Statistical Learning Workshop Banff International Research Station Tuesday, December

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

CDA Chapter 3 part II

CDA Chapter 3 part II CDA Chapter 3 part II Two-way tables with ordered classfications Let u 1 u 2... u I denote scores for the row variable X, and let ν 1 ν 2... ν J denote column Y scores. Consider the hypothesis H 0 : X

More information

ADJUSTED POWER ESTIMATES IN. Ji Zhang. Biostatistics and Research Data Systems. Merck Research Laboratories. Rahway, NJ

ADJUSTED POWER ESTIMATES IN. Ji Zhang. Biostatistics and Research Data Systems. Merck Research Laboratories. Rahway, NJ ADJUSTED POWER ESTIMATES IN MONTE CARLO EXPERIMENTS Ji Zhang Biostatistics and Research Data Systems Merck Research Laboratories Rahway, NJ 07065-0914 and Dennis D. Boos Department of Statistics, North

More information

An Overview of Methods in the Analysis of Dependent Ordered Categorical Data: Assumptions and Implications

An Overview of Methods in the Analysis of Dependent Ordered Categorical Data: Assumptions and Implications WORKING PAPER SERIES WORKING PAPER NO 7, 2008 Swedish Business School at Örebro An Overview of Methods in the Analysis of Dependent Ordered Categorical Data: Assumptions and Implications By Hans Högberg

More information

Chapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing

Chapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing Chapter Fifteen Frequency Distribution, Cross-Tabulation, and Hypothesis Testing Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 15-1 Internet Usage Data Table 15.1 Respondent Sex Familiarity

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random

More information

Chapte The McGraw-Hill Companies, Inc. All rights reserved.

Chapte The McGraw-Hill Companies, Inc. All rights reserved. er15 Chapte Chi-Square Tests d Chi-Square Tests for -Fit Uniform Goodness- Poisson Goodness- Goodness- ECDF Tests (Optional) Contingency Tables A contingency table is a cross-tabulation of n paired observations

More information

BIOS 625 Fall 2015 Homework Set 3 Solutions

BIOS 625 Fall 2015 Homework Set 3 Solutions BIOS 65 Fall 015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson s Paradox occurs. Death Penalty Victim's

More information

Master s Written Examination - Solution

Master s Written Examination - Solution Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2

More information

ECON 4160, Autumn term Lecture 1

ECON 4160, Autumn term Lecture 1 ECON 4160, Autumn term 2017. Lecture 1 a) Maximum Likelihood based inference. b) The bivariate normal model Ragnar Nymoen University of Oslo 24 August 2017 1 / 54 Principles of inference I Ordinary least

More information

n y π y (1 π) n y +ylogπ +(n y)log(1 π).

n y π y (1 π) n y +ylogπ +(n y)log(1 π). Tests for a binomial probability π Let Y bin(n,π). The likelihood is L(π) = n y π y (1 π) n y and the log-likelihood is L(π) = log n y +ylogπ +(n y)log(1 π). So L (π) = y π n y 1 π. 1 Solving for π gives

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Comparison of Two Samples

Comparison of Two Samples 2 Comparison of Two Samples 2.1 Introduction Problems of comparing two samples arise frequently in medicine, sociology, agriculture, engineering, and marketing. The data may have been generated by observation

More information

2.6.3 Generalized likelihood ratio tests

2.6.3 Generalized likelihood ratio tests 26 HYPOTHESIS TESTING 113 263 Generalized likelihood ratio tests When a UMP test does not exist, we usually use a generalized likelihood ratio test to verify H 0 : θ Θ against H 1 : θ Θ\Θ It can be used

More information

Lecture Slides. Elementary Statistics. by Mario F. Triola. and the Triola Statistics Series

Lecture Slides. Elementary Statistics. by Mario F. Triola. and the Triola Statistics Series Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 13 Nonparametric Statistics 13-1 Overview 13-2 Sign Test 13-3 Wilcoxon Signed-Ranks

More information

PSY 307 Statistics for the Behavioral Sciences. Chapter 20 Tests for Ranked Data, Choosing Statistical Tests

PSY 307 Statistics for the Behavioral Sciences. Chapter 20 Tests for Ranked Data, Choosing Statistical Tests PSY 307 Statistics for the Behavioral Sciences Chapter 20 Tests for Ranked Data, Choosing Statistical Tests What To Do with Non-normal Distributions Tranformations (pg 382): The shape of the distribution

More information

Sections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21

Sections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21 Sections 2.3, 2.4 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 21 2.3 Partial association in stratified 2 2 tables In describing a relationship

More information

Lecture Slides. Section 13-1 Overview. Elementary Statistics Tenth Edition. Chapter 13 Nonparametric Statistics. by Mario F.

Lecture Slides. Section 13-1 Overview. Elementary Statistics Tenth Edition. Chapter 13 Nonparametric Statistics. by Mario F. Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 13 Nonparametric Statistics 13-1 Overview 13-2 Sign Test 13-3 Wilcoxon Signed-Ranks

More information

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă HYPOTHESIS TESTING II TESTS ON MEANS Sorana D. Bolboacă OBJECTIVES Significance value vs p value Parametric vs non parametric tests Tests on means: 1 Dec 14 2 SIGNIFICANCE LEVEL VS. p VALUE Materials and

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES

ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES Hisashi Tanizaki Graduate School of Economics, Kobe University, Kobe 657-8501, Japan e-mail: tanizaki@kobe-u.ac.jp Abstract:

More information

TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1

TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1 TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1 1.1 The Probability Model...1 1.2 Finite Discrete Models with Equally Likely Outcomes...5 1.2.1 Tree Diagrams...6 1.2.2 The Multiplication Principle...8

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

Basic Business Statistics, 10/e

Basic Business Statistics, 10/e Chapter 1 1-1 Basic Business Statistics 11 th Edition Chapter 1 Chi-Square Tests and Nonparametric Tests Basic Business Statistics, 11e 009 Prentice-Hall, Inc. Chap 1-1 Learning Objectives In this chapter,

More information

CHI SQUARE ANALYSIS 8/18/2011 HYPOTHESIS TESTS SO FAR PARAMETRIC VS. NON-PARAMETRIC

CHI SQUARE ANALYSIS 8/18/2011 HYPOTHESIS TESTS SO FAR PARAMETRIC VS. NON-PARAMETRIC CHI SQUARE ANALYSIS I N T R O D U C T I O N T O N O N - P A R A M E T R I C A N A L Y S E S HYPOTHESIS TESTS SO FAR We ve discussed One-sample t-test Dependent Sample t-tests Independent Samples t-tests

More information

Simulating Realistic Ecological Count Data

Simulating Realistic Ecological Count Data 1 / 76 Simulating Realistic Ecological Count Data Lisa Madsen Dave Birkes Oregon State University Statistics Department Seminar May 2, 2011 2 / 76 Outline 1 Motivation Example: Weed Counts 2 Pearson Correlation

More information

STAT 461/561- Assignments, Year 2015

STAT 461/561- Assignments, Year 2015 STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and

More information