FROM: PAGANO, R. R. (2007)

I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NONPARAMETRIC TESTS

Statistical inference tests are often classified as to whether they are parametric or nonparametric.
A parameter is a characteristic of a population.
A parametric inference test (parametric statistic) is one that depends considerably on population characteristics, or parameters, for its use.
o For example, the z test, t test, and ANOVA F test

The z Test (single sample)
Requires that we specify the mean (µ) and standard deviation (σ) of the null-hypothesis population, and requires that the population scores be normally distributed for small Ns.

The t Test (single sample, dependent, and independent)
(For single sample) requires that we know the population mean (µ) and that the population scores be normally distributed (we estimate the population standard deviation, σ, with the sample standard deviation, s).
(For two-sample or two-condition, correlated t or independent t) both require that the population scores be normally distributed with small Ns; the independent t further requires that the population variances be equal.

The F Test (one-way ANOVA)
Requirements are similar to the independent t.
o Depends on an established distribution

The requirements of nonparametric tests are minimal. Nonparametric tests depend little on knowing population distributions.
o Often referred to as distribution-free tests
o For example, the sign test, the chi-square goodness-of-fit test, and the chi-square test of independence

Since nonparametric inference tests have fewer requirements or assumptions about population characteristics, why don't we use them all of the time and forget about parametric tests?
o Many of the parametric inference tests are robust with regard to violations of underlying assumptions.
The main reasons for preferring parametric to nonparametric tests are that, in general,
1. They are more powerful than nonparametric tests, and
2. They are more versatile than nonparametric tests.
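As a concrete illustration, the same two samples can be analyzed with a parametric test and its nonparametric counterpart. This is a minimal sketch assuming NumPy and SciPy are available; the data are simulated, not from the text.

```python
# Sketch: parametric vs. nonparametric analysis of the same two groups.
# Data are simulated (not from the text) purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=10.0, scale=2.0, size=30)   # group 1 scores
b = rng.normal(loc=11.5, scale=2.0, size=30)   # group 2 scores

# Parametric: independent-groups t test (assumes normality, equal variances).
t_stat, t_p = stats.ttest_ind(a, b, equal_var=True)

# Nonparametric counterpart: Mann-Whitney U (needs only ordinal scaling).
u_stat, u_p = stats.mannwhitneyu(a, b, alternative="two-sided")

print(f"t = {t_stat:.3f}, p = {t_p:.4f}")
print(f"U = {u_stat:.1f}, p = {u_p:.4f}")
```

When the normality assumption holds, the t test is typically the more powerful of the two, which is the textbook's reason for preferring parametric tests when their assumptions are met.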
o For example, no comparable nonparametric test exists for a factorial (multiple variables and interactions) ANOVA.
As a general rule, investigators will use parametric tests whenever possible.
o However, when there is an extreme violation of an assumption of the parametric test, or if the investigator believes the scaling of the data makes the parametric test inappropriate, a nonparametric inference test will be employed.

II. CHI-SQUARE (χ²)

A. SINGLE-VARIABLE EXPERIMENTS
Also known as the chi-square goodness-of-fit test
The inference test most often used with nominal data is a nonparametric test called chi-square (χ²).
o Recall that with this type of data (nominal data), observations are grouped into several discrete, mutually exclusive categories, and one counts the frequency of occurrence in each category.

1. Computation of χ²_obt
To calculate χ²_obt, we must first determine the frequency we would expect to get in each cell if sampling is random from the null-hypothesis population.
o These frequencies are called expected frequencies and are symbolized by f_e.
Expected frequencies can be equally distributed, that is, based on the number of observations divided equally by the number of categories in the variable.
Expected frequencies can be proportionally distributed, that is, based on a theory or prior literature.
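The two ways of forming expected frequencies can be sketched in a few lines. The counts and proportions below are hypothetical, not from the text.

```python
# Sketch: equally vs. proportionally distributed expected frequencies
# for a one-variable (goodness-of-fit) chi-square. Numbers are made up.
n = 120                      # total observations
k = 4                        # number of categories

# Equally distributed: n divided evenly across the k categories.
f_e_equal = [n / k] * k

# Proportionally distributed: proportions taken from theory or prior work.
theory_props = [0.40, 0.30, 0.20, 0.10]        # hypothetical proportions
f_e_prop = [n * p for p in theory_props]

print(f_e_equal)    # each cell expects n/k observations
print(f_e_prop)     # cells expect n * p_i observations
```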
o The frequencies actually obtained in the experiment are called observed frequencies and are symbolized by f_o.
f_e = expected frequency under the assumption sampling is random from the null-hypothesis population
f_o = observed frequency in the sample
It should be clear that the closer the observed frequency of each cell is to the expected frequency for that cell, the more reasonable H_0 is (if we retain the null hypothesis, we say that there is a "good fit").
On the other hand, the greater the difference between f_o and f_e is, the more reasonable H_a becomes (the fit is not good).
After determining f_e for each cell, we obtain the difference between f_o and f_e, square the difference, and divide by f_e. In symbolic form, (f_o - f_e)² / f_e is computed for each cell. Finally, we sum the resultant values from each of the cells.

CHAPTER 17 PAGE
o In equation form, χ²_obt = Σ [(f_o - f_e)² / f_e]
where f_o = observed frequency in the cell,
f_e = expected frequency in the cell, and
Σ is over all the cells
From this equation, you can see that χ² is basically a measure of how different the observed frequencies are from the expected frequencies.

2. Evaluation of χ²_obt
The theoretical sampling distribution of χ² is shown in Figure 17.1 (p. 431).
The χ² distribution consists of a family of curves that, like the t distribution, varies with degrees of freedom (sample size).
For the lower degrees of freedom, the curves are positively skewed.
o As the number of degrees of freedom associated with χ² increases, the respective sampling distribution approaches symmetry (i.e., a normal distribution).
The degrees of freedom are determined by the number of f_o scores (cells) that are free to vary.
o In general, with experiments involving just one variable, there are k - 1 degrees of freedom, where k equals the number of groups or categories.
Table H in Appendix D (p. 548) gives the critical values of χ² for different alpha levels.
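A worked goodness-of-fit sketch using the equation above, with made-up observed frequencies and equal expected frequencies. SciPy's `chi2.ppf` stands in for Table H; SciPy's availability is assumed.

```python
# Worked sketch: chi2_obt = sum of (f_o - f_e)^2 / f_e over all cells,
# evaluated against a critical value with df = k - 1. Counts are made up.
from scipy.stats import chi2

f_o = [18, 30, 14, 18]               # observed frequencies per cell
n = sum(f_o)                          # total observations (80)
k = len(f_o)                          # number of categories
f_e = [n / k] * k                     # equal expected frequencies (20 each)

chi2_obt = sum((o - e) ** 2 / e for o, e in zip(f_o, f_e))

df = k - 1                            # degrees of freedom
alpha = 0.05
chi2_crit = chi2.ppf(1 - alpha, df)   # plays the role of Table H

print(f"chi2_obt = {chi2_obt:.2f}, df = {df}, chi2_crit = {chi2_crit:.2f}")
print("reject H0" if chi2_obt > chi2_crit else "retain H0")
```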
Since χ² is basically a measure of the overall discrepancy between f_o and f_e, it follows that the larger the discrepancy between the observed and expected frequencies is, the larger the value of χ²_obt will be. The larger the value of χ²_obt is, the more unreasonable the null hypothesis is.
As with the t and F tests, if χ²_obt falls within the critical region for rejection, then we reject the null hypothesis.
o The decision rule states the following: If χ²_obt > χ²_crit, reject H_0.
It should be noted that in calculating χ²_obt, it doesn't matter whether f_o is greater or less than f_e. The difference is squared, divided by f_e, and added to the other cells to obtain χ²_obt. Since the direction of the difference is immaterial, the χ² test is a nondirectional test. Since each difference adds to the value of χ², the critical region for rejection always lies under the right-hand tail of the χ² distribution.

B. TEST OF INDEPENDENCE BETWEEN TWO VARIABLES
One of the main uses of χ² is in determining whether two categorical variables are independent or are related (dependent).
A contingency table is a two-way table showing the contingency (i.e., dependence) between two variables, where the variables have been classified into mutually exclusive categories and the cell entries are frequencies.
o In constructing a contingency table, it is essential that the categories be mutually exclusive: if an entry is appropriate for one of the cells, the categories must be such that it cannot appropriately be entered in any other cell.
The null hypothesis states that there is no contingency between the variables in the population. That is, the two categories are independent of each other.
o The null hypothesis states that the frequencies are due to random sampling from a population in which the proportions are equal.
o There is no difference between the two categories.
o If we reject H_0, we are concluding, with a known probability of error (equal to the alpha level), that the variables are dependent on each other in the population.
The alternative hypothesis is that the two categories are dependent upon each other.
o There is a difference (not the same) between the two categories.
o The categories differ across the levels of the other category.
1. Computation of χ²_obt
To test the null hypothesis, we must calculate χ²_obt and compare it with χ²_crit.
With experiments involving two variables, the most difficult part of the process is determining f_e for each cell. If we do not know the population proportions, we estimate them from the sample.
o Expected frequencies can be equally distributed, that is, based on the number of observations divided equally by the number of categories in the variable.
o Expected frequencies can be proportionally distributed, that is, based on a theory or prior literature.
For the test of independence, the expected frequencies are computed based on the percentages in the marginal totals.
o The most convenient way to calculate the expected frequency for each cell is to multiply the total row frequency (f_r) by the total column frequency (f_c) corresponding to the respective cell and then to divide this product by the total frequency (n):
Expected frequency: f_e = (f_r × f_c) / n
o Notice that the sum of the expected frequencies for any row or column equals the respective row or column total. This can be a useful check of the calculations.

2. Evaluation of χ²_obt
To evaluate χ²_obt, we must compare it with χ²_crit for the appropriate df.
The degrees of freedom (df) are equal to the number of f_o scores that are free to vary while keeping the totals constant.
o In the two-variable experiment, we must keep both the column and row marginals at the same values.
The degrees of freedom for experiments involving a contingency between two variables are equal to the number of f_o scores that are free to vary while at the same time keeping the column and row marginals the same.
o In the case of a 2 × 2 contingency table, df = 1.
o In the case of a 2 × 3 contingency table, df = 2.
The equation to calculate the df for contingency tables is as follows:
df = (Number of Rows - 1)(Number of Columns - 1), that is, df = (R - 1)(C - 1)
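The marginal-totals formula f_e = (f_r × f_c) / n and the df rule can be checked with a short sketch; the 2 × 3 table of observed frequencies below is hypothetical.

```python
# Sketch: expected frequencies and df for a hypothetical 2 x 3
# contingency table, using f_e = (f_r * f_c) / n and df = (R - 1)(C - 1).
table = [
    [10, 20, 30],    # row 1 observed frequencies
    [20, 25, 15],    # row 2 observed frequencies
]
row_totals = [sum(row) for row in table]          # f_r for each row
col_totals = [sum(col) for col in zip(*table)]    # f_c for each column
n = sum(row_totals)                               # total frequency

f_e = [[fr * fc / n for fc in col_totals] for fr in row_totals]

# Check from the text: expected frequencies reproduce the marginal totals.
for i, fr in enumerate(row_totals):
    assert abs(sum(f_e[i]) - fr) < 1e-9

R, C = len(table), len(table[0])
df = (R - 1) * (C - 1)    # (2 - 1)(3 - 1) = 2 for this table
print(f_e, df)
```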
The χ² test is not limited to 2 × 2 or 2 × 3 tables. It can be used with contingency tables containing any number of rows and columns.
o The df formula is perfectly general and applies to all contingency tables.
For example, if we did an experiment involving two variables and had four rows and six columns in the table,
df = (R - 1)(C - 1) = (4 - 1)(6 - 1) = (3)(5) = 15
The decision rule states the following:
o If χ²_obt > χ²_crit, reject H_0
o If χ²_obt < χ²_crit, retain H_0

C. ASSUMPTIONS UNDERLYING χ²
The basic assumption in using χ² is that there is independence between each observation recorded in the contingency table.
o This means that each subject can have only one entry in the table. It is not permissible to take several measurements on the same subject and enter them as separate frequencies in the same or different cells.
o This error would produce a larger N than there are independent observations.
A second assumption is that the sample size must be large enough that the expected frequency in each cell is at least 5 for tables where R or C is greater than 2. If the table is a 1 × 2 or 2 × 2 table, then each expected frequency should be at least 10.
o When the expected frequencies in any of the cells of a contingency table are small (less than 5), the sampling distribution of χ² for these data may depart substantially from continuity (the state of being continuous). Thus, the theoretical sampling distribution of χ² for 1 degree of freedom may poorly fit the data.
o For this situation, an adjustment, called the Yates correction for continuity, has been suggested for application to these data. The correction merely involves reducing the absolute value of each numerator by 0.5 units before squaring.
The argument for using or not using the Yates correction is largely split; it becomes a judgment call on the part of the researcher.
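The effect of the Yates correction can be sketched on a hypothetical 2 × 2 table, assuming SciPy is available; `chi2_contingency` applies the same 0.5 reduction to each |f_o - f_e| when `correction=True` and df = 1.

```python
# Sketch: chi-square on a made-up 2 x 2 table, with and without the
# Yates continuity correction (each |f_o - f_e| reduced by 0.5 before
# squaring when the correction is applied).
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[12, 8],
                  [7, 13]])

chi2_corr, p_corr, df, f_e = chi2_contingency(table, correction=True)
chi2_raw, p_raw, _, _ = chi2_contingency(table, correction=False)

# The corrected statistic is smaller, making the test more conservative.
print(f"uncorrected chi2 = {chi2_raw:.3f}, corrected chi2 = {chi2_corr:.3f}")
```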
Simply know that it is an option.
If the sample size is small enough to result in expected frequencies that violate these requirements, then the actual sampling distribution of χ² deviates considerably from the theoretical one, and the probability values given in Table H do not apply.
If the experiment involves a 2 × 2 contingency table and the data violate this assumption, Fisher's exact probability test should be used.
Although χ² is used frequently when the data are only of nominal scaling, it is not limited to nominal data.
o Chi-square can be used with ordinal, interval, and ratio data.
o Regardless of the actual scaling, the data must be reduced to mutually exclusive categories and appropriate frequencies before χ² can be employed.

III. THE WILCOXON MATCHED-PAIRS SIGNED RANKS TEST
The Wilcoxon signed ranks test is used in conjunction with the correlated groups design with data that are at least ordinal in scaling. It is a relatively powerful test sometimes used in place of the t test for correlated groups (dependent-samples t test) when there is an extreme violation of the normality assumption or when the data are not of appropriate scaling.
The Wilcoxon signed ranks test considers both the magnitude of the difference scores and their direction, which makes it more powerful than the sign test. It is, however, less powerful than the t test for correlated groups.
o While it takes into account the magnitude of the difference scores, it considers only the rank order of the difference scores, not their actual magnitude, as does the t test.

A. ASSUMPTIONS OF THE WILCOXON SIGNED RANKS TEST
There are two assumptions underlying the Wilcoxon signed ranks test.
o The scores within each pair must be at least of ordinal measurement.
o The difference scores must also have at least ordinal scaling.

IV. THE MANN-WHITNEY U TEST
The Mann-Whitney U test is used in conjunction with the independent groups design with data that are at least ordinal in scaling. It is a powerful nonparametric test used in place of the t test for independent groups when there is an extreme violation of the normality assumption or when the data are not of appropriate scaling for the t test.

A. TIED RANKS
Tied scores are rank-ordered in a manner similar to that for the Spearman rho correlation coefficient: tied scores each receive the mean of the ranks they would otherwise occupy.
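The two rank tests above, and the average-rank handling of ties they share, can be sketched with SciPy (assumed available); all scores below are made up.

```python
# Sketch: tied ranks, the Wilcoxon signed ranks test, and the
# Mann-Whitney U test on hypothetical data.
from scipy.stats import rankdata, wilcoxon, mannwhitneyu

# Tied scores receive the mean of the ranks they occupy, as with
# the Spearman rho correlation coefficient.
print(rankdata([5, 7, 7, 9]))   # ties at 7 share rank (2 + 3) / 2 = 2.5

# Wilcoxon matched-pairs signed ranks test: correlated (paired) scores.
before = [10, 12, 9, 14, 11, 13, 8, 15]
after = [12, 15, 9.5, 17, 12, 16, 10, 18]
w_stat, w_p = wilcoxon(before, after)

# Mann-Whitney U test: two independent groups.
group1 = [3, 5, 6, 8, 9]
group2 = [7, 10, 11, 12, 14]
u_stat, u_p = mannwhitneyu(group1, group2, alternative="two-sided")
print(w_stat, w_p, u_stat, u_p)
```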
B. ASSUMPTION UNDERLYING THE MANN-WHITNEY U TEST
The Mann-Whitney U test requires that the data be at least ordinal in scaling. It does not depend on the population scores having any particular shape (e.g., normal distributions), as does the t test for independent groups.
The Mann-Whitney U test can be used instead of the t test for independent groups when there is a severe violation of the normality assumption or when the data are not of interval or ratio scaling.
The Mann-Whitney U test is a powerful test.
o However, since it uses only the ordinal property of the scores, it is not as powerful as the t test for independent groups, which uses the interval property of the scores.

V. THE KRUSKAL-WALLIS TEST
The Kruskal-Wallis test is a nonparametric test that is used with an independent groups design employing k samples. It is used as a substitute for the parametric one-way ANOVA when the assumptions of that test are seriously violated.
The Kruskal-Wallis test does not assume population normality or homogeneity of variance, as does the parametric ANOVA, and requires only ordinal scaling of the dependent variable. It is used when violations of population normality and/or homogeneity of variance are extreme or when interval or ratio scaling is required and not met by the data.

A. ASSUMPTIONS UNDERLYING THE KRUSKAL-WALLIS TEST
To use the Kruskal-Wallis test, the data must be of at least ordinal scaling.
There must be at least five scores in each sample to use the probabilities given in the table of chi-square.

VI. STEPS IN TESTING THE CHI-SQUARE TEST OF INDEPENDENCE
1. Making assumptions and meeting test requirements.
o Model: independent random samples; level of measurement is nominal.
2. Stating the hypotheses.
o H_0: The two variables are independent.
o H_a: The two variables are dependent.
3. Selecting the sampling distribution and establishing the critical region.
o Chi-square distribution, alpha level, degrees of freedom, and critical value.
4. Computing the test statistic.
5. Making a decision and interpreting the results of the test.
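The five steps can be sketched end-to-end for a hypothetical 2 × 2 table, assuming SciPy is available.

```python
# Sketch of the five steps of a chi-square test of independence,
# applied to a made-up 2 x 2 contingency table.
from scipy.stats import chi2, chi2_contingency

# Step 1: assumptions - independent random samples, nominal measurement
# (taken as given for this made-up table).
table = [[30, 20],
         [15, 35]]

# Step 2: H0: the two variables are independent; Ha: they are dependent.

# Step 3: chi-square sampling distribution, alpha = .05,
# df = (R - 1)(C - 1), critical value from the distribution.
alpha = 0.05
df = (len(table) - 1) * (len(table[0]) - 1)
chi2_crit = chi2.ppf(1 - alpha, df)

# Step 4: compute the test statistic (no Yates correction here,
# to match the hand formula in the text).
chi2_obt, p_value, df_out, f_e = chi2_contingency(table, correction=False)

# Step 5: decision and interpretation.
decision = "reject H0" if chi2_obt > chi2_crit else "retain H0"
print(f"chi2_obt = {chi2_obt:.2f} vs chi2_crit = {chi2_crit:.2f}: {decision}")
```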