Violating the normal distribution assumption

So what do you do if the data are not normal and you still need to perform a test?

Remember, if your n is reasonably large, don't bother doing anything. Your ȳ will have a distribution that's approximately normal. What is "large"? It depends on how non-normal your original data are, but somewhere between 20 and 30 is often adequate.

But suppose you only have small sample sizes, and your data are not normal. Then what?

The Mann-Whitney U-test. (Note: many statisticians, and several computer packages, call this the Wilcoxon rank sum test. It's mathematically equivalent.)

The only assumption for this test is that the data are random. You still need to make sure you collected the data randomly!

What are you testing? That the two distributions are the same! In other words, you test the following:

H₀: The distribution of sample 1 is the same as the distribution of sample 2. In symbols: H₀: D₁ = D₂
H₁: The distribution of sample 1 is not the same as the distribution of sample 2. In symbols: H₁: D₁ ≠ D₂

This is the original form of the Mann-Whitney U-test, but often we want to say something about the means (or medians), as this is a little easier to explain. If we assume that the distributions differ only in location, but not in shape (e.g., two binomials, two uniforms, etc.), then we can use this test to test for equal means!

We will make this assumption in this class, and then we can test the usual hypotheses:

H₀: μ₁ = μ₂
H₁: μ₁ ≠ μ₂

(Incidentally, if we make the assumption of equal distributions except for location, we could test for equal medians as well. The test is identical; we just change H₀ and H₁ to use medians instead of means.)

So now we proceed as usual:

Figure out our α.

Calculate our test statistic (see example on board and then below). As you might guess, this is called U, not t:

i) Rank your data, from smallest value to largest.
ii) For each data point, write down the number of values that are smaller IN THE OTHER SAMPLE next to it (use ½ if a value is perfectly tied).
iii) Add up these numbers for both samples.
iv) Check your work: the two sums should add up to the product of the sample sizes (see the example below).
v) Choose the larger of these two sums; that's our U*.
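The steps above can be sketched in code. This is a minimal sketch, assuming Python (these notes work everything by hand); `half_counts` and `u_statistic` are hypothetical helper names, not from any package.

```python
def half_counts(sample, other):
    """Steps i-ii: for each value in `sample`, count the values in
    `other` that are smaller, counting perfect ties as 1/2 each."""
    return [sum(x < y for x in other) + 0.5 * sum(x == y for x in other)
            for y in sample]

def u_statistic(sample1, sample2):
    """Steps iii-v: sum the counts for each sample, verify that the
    two sums add to n1 * n2 (step iv), and return the larger as U*."""
    k1 = sum(half_counts(sample1, sample2))
    k2 = sum(half_counts(sample2, sample1))
    assert k1 + k2 == len(sample1) * len(sample2)  # step iv: the check
    return max(k1, k2)

# Toy usage: sample 2 lies entirely above sample 1, so U* = 3 * 2 = 6
print(u_statistic([1, 2, 3], [10, 20]))  # 6.0
```

With no overlap at all, U* hits its maximum n₁ × n₂, which is the situation least compatible with H₀.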

Compare your U* with the tabulated value of U:
- Use the correct table for your value of α.
- Then find the correct values of n₁ and n₂.
- Finally, if U* ≥ U_table, reject H₀. Otherwise, fail to reject H₀.

Let's do an example. We want to find out if caffeine affects heart rate. We take seven volunteers and measure their heart rate after drinking decaffeinated coffee. We take another six volunteers and measure their heart rate after drinking regular coffee. We get the following (exaggerated) results (already sorted):

    Decaffeinated   Regular
       Coffee       Coffee
      -------       -------
         42            74
         67            78
         68            79
         69            81
         70            96
         73           124
         93
      -------       -------
    ȳ    68.9          88.7

If we look at these data, we'd be tempted to conclude that a t-test ought to be able to find a difference here (there's a pretty big difference in the means). But let's look at the q-q plots first:

The q-q plots show that the data are seriously not normal (you should always do q-q plots before doing a t-test!). So a t-test is not appropriate here, and we'll do the MWU test instead.

Let's set up the hypotheses (we'll assume equal distributions except for location, which might not be entirely true looking at the q-q plots):

H₀: the true mean heart rate for people drinking decaffeinated coffee is the same as the true mean heart rate for people drinking regular coffee.
H₁: the true mean heart rate for people drinking decaffeinated coffee is NOT the same as the true mean heart rate for people drinking regular coffee.

Let's pick α = 0.05.

Now we count the smaller data points in the other sample (doing both columns at once):

    count    Y₁        Y₂    count
      0      42
      0      67
      0      68
      0      69
      0      70
      0      73
                        74     6
                        78     6
                        79     6
                        81     6
      4      93
                        96     7
                       124     7
          K₁ = 4            K₂ = 38

You don't have to arrange things this way, but it's probably a little easier if you sort both columns together like this. The sums of our counts are referred to as K₁ and K₂.

We check our work: n₁ × n₂ = K₁ + K₂, and we have: 7 × 6 = 38 + 4 = 42 (correct!)
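As a cross-check of the hand counts above, here is a short Python sketch (Python is an assumption; the notes do this by hand) that recomputes K₁, K₂, and U* for the coffee data:

```python
decaf   = [42, 67, 68, 69, 70, 73, 93]
regular = [74, 78, 79, 81, 96, 124]

# For each value, count how many values in the other sample are smaller
# (ties would count 1/2 each; there happen to be none in these data).
k1 = sum(sum(r < d for r in regular) + 0.5 * sum(r == d for r in regular)
         for d in decaf)
k2 = sum(sum(d < r for d in decaf) + 0.5 * sum(d == r for d in decaf)
         for r in regular)

print(k1, k2)              # 4.0 38.0
print(k1 + k2 == 7 * 6)    # True: the check n1 x n2 = K1 + K2
print(max(k1, k2))         # 38.0: this is U*
```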

We pick the larger of K₁ and K₂ and set this equal to U*:

U* = max(K₁, K₂) = max(38, 4) = 38

Now look this up in the tables: n = 7, m = 6, α = 0.05 (remember, we're using n as the larger sample size and m as the smaller sample size). So we get U_table = 36, and since

U* = 38 ≥ U_table = 36

we reject H₀ and conclude that the means are different.

Some theory on this test (or "why does it work?")

Here's an outline of what's happening. Suppose n₁ = 5 and n₂ = 4; then K₁ + K₂ must be 20 (our check gives us 5 × 4 = 20). So suppose there is no overlap at all (the data are least compatible with H₀). This implies that either K₁ or K₂ = 20, and therefore U* = 20. But if the data overlap perfectly, then K₁ = K₂ = 10, and U* = 10. This implies that large values of U* are incompatible with H₀, and this is the basis of the test.

So where does U come from? Better, how is U distributed? (Remember, t has a t-distribution, z a normal, etc.) Note that U is discrete. We can calculate the probabilities associated with any given arrangement of the two samples. For our example here (n₁ = 5 and n₂ = 4):

Pr{K₁ = 0 and K₂ = 20} = 0.008

This is one possible arrangement out of 126 different possibilities, or 1/126 ≈ 0.008.

How do we know there are 126 different possible arrangements? We can use the binomial coefficient. We have a total of 4 + 5 = 9 items, and we want to choose 5 (= n₁) of them. In other words, how many ways can we pick 5 items from 9?
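This enumeration can be carried out directly. A sketch in Python (an assumed choice of language): list all equally likely ways to assign 5 of the 9 ranks to sample 1, and count how many give the extreme value U* = 20.

```python
from itertools import combinations
from math import comb

n1, n2 = 5, 4
ranks = range(1, n1 + n2 + 1)   # 9 distinct ranks (continuous data: no ties)

# Under H0, every assignment of 5 of the 9 ranks to sample 1 is equally likely.
arrangements = list(combinations(ranks, n1))
assert len(arrangements) == comb(9, 5) == 126

extreme = 0
for s1 in arrangements:
    s2 = [r for r in ranks if r not in s1]
    k1 = sum(sum(b < a for b in s2) for a in s1)   # count smaller in other sample
    k2 = n1 * n2 - k1                              # the check, used in reverse
    if max(k1, k2) == 20:                          # most extreme possible U*
        extreme += 1

print(extreme, extreme / 126)   # 2 arrangements, so Pr{U* = 20} = 2/126 ≈ 0.016
```

Only the two "no overlap" arrangements (sample 1 holds all the lowest or all the highest ranks) give U* = 20, which matches the two-sided probability 0.008 + 0.008 = 0.016 computed below.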

(9 choose 5) = 126

So we just go through and figure out the probabilities for all possible combinations of K₁ and K₂, for as many different values of n₁ and n₂ as we want.

Incidentally, since we're doing a two-sided test, we also need to calculate Pr{K₁ = 20 and K₂ = 0}, which is also (of course) 0.008. So for a two-sided test we need:

Pr{U* = 20} = Pr{K₁ = 0 and K₂ = 20} + Pr{K₁ = 20 and K₂ = 0} = 0.008 + 0.008 = 0.016

Finally, note again that for this example, Pr{U* = 20} = 0.016. So what happens if we choose α = 0.01? We lose! U is discrete and can't take on values greater than 20 (if n₁ = 5 and n₂ = 4). U* = 20 is the most extreme outcome, and this probability is not small enough to reject at α = 0.01. It is therefore impossible to reject H₀ if n₁ = 5 and n₂ = 4, we pick α = 0.01, and we're doing a two-sided test. If you have really small sample sizes, you should probably check in the table first to make sure you can actually use the value of α that you want.

An assumption of the Mann-Whitney U-test that's not obvious

Just like the t-test, the Mann-Whitney U-test is designed for continuous data. It may not do well with discrete data. In particular, it doesn't like ties. If data are truly continuous, the probability of any one value in sample 1 being the same as any one value in sample 2 is exactly 0. But before we look at this further, how do you deal with ties? When you have data that are tied, you use ½ for EACH of the data points in the other sample that are tied with the data point you're looking at. Probably the easiest way to describe this is with an example or two:

Example 1:

    count    Y₁        Y₂    count
      0       1
                         3     1
      1       4
      3       5
                         5     2.5
                         5     2.5
                         5     2.5
                         5     2.5
                         6     3
      6       7
      6       7
      6       8
    Sum:     22               14

Note that the 5 in sample 1 (Y₁) gets a count of 3. This is because it is tied with four 5's in sample 2, and 4 × ½ = 2. Any number less than our number is counted as usual, so overall we have: 1 + (4 × ½) = 3.

The 5's in sample 2 (Y₂) each get 2.5 (there are two numbers less than 5 in sample 1, and the 5 in sample 1 contributes ½): 2 + ½ = 2.5.

Incidentally, the check still works (n₁ × n₂ = 6 × 6 = 36, and K₁ + K₂ = 22 + 14 = 36).

Example 2 (just the results this time, skipping the details):

    count    Y₁        Y₂    count
      0       1
                         2     1
     1.5      5
                         5     2
     1.5      5
     3.5      6
                         6     3.5
                         6     3.5
                         6     3.5
      5       7
    Sum:   11.5             13.5

And yes, the check still works (5 × 5 = 25, and 11.5 + 13.5 = 25).
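Reading the data off the two tables above, the tie rule is easy to verify in code. A sketch, assuming Python; `counts_with_ties` is a hypothetical helper name:

```python
def counts_with_ties(sample, other):
    # 1 for each strictly smaller value in the other sample, plus 1/2 per tie
    return [sum(x < y for x in other) + 0.5 * sum(x == y for x in other)
            for y in sample]

# Example 1: Y1 = {1, 4, 5, 7, 7, 8}, Y2 = {3, 5, 5, 5, 5, 6}
k1a = sum(counts_with_ties([1, 4, 5, 7, 7, 8], [3, 5, 5, 5, 5, 6]))
k2a = sum(counts_with_ties([3, 5, 5, 5, 5, 6], [1, 4, 5, 7, 7, 8]))
print(k1a, k2a, k1a + k2a == 6 * 6)   # 22.0 14.0 True

# Example 2: Y1 = {1, 5, 5, 6, 7}, Y2 = {2, 5, 6, 6, 6}
k1b = sum(counts_with_ties([1, 5, 5, 6, 7], [2, 5, 6, 6, 6]))
k2b = sum(counts_with_ties([2, 5, 6, 6, 6], [1, 5, 5, 6, 7]))
print(k1b, k2b, k1b + k2b == 5 * 5)   # 11.5 13.5 True
```

The check K₁ + K₂ = n₁ × n₂ still holds exactly because each tied pair contributes ½ to each sample's sum.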

The Mann-Whitney U-test versus the t-test

You now have two tests that can each be used to test for equal means. Which is better? The answer depends on the power of the test. If the data are normal or the n's are large, the t-test is better. If the data are not normal and the n's are small, the Mann-Whitney U-test is better. Sometimes, particularly if you have very long tails, it is much better. Incidentally, if the data are normal, the MWU test is still valid, just not as good as the t-test.

So how do you decide which test to use? Follow the guidelines given above. If n₁ and n₂ are both less than about 20 to 30, and the data are badly not normal, use the MWU test. Otherwise, you can probably use the t-test. Of course, remember: the less normal your data are, the bigger the n's should be.

If you can't check your assumptions, or forget in a couple of years what to do to make sure the t-test works, use the MWU test. Even if the data are normal, it actually has reasonable power (though obviously not as good as the t-test). It's almost always valid, unless the data are not random.