Hypothesis testing I

I. What is hypothesis testing? [Note: we're temporarily bouncing around in the book a lot! Things will settle down again in a week or so.]

- Exactly what it says. We develop a hypothesis, then test it, and either reject it or not.

- In particular, we are talking about statistical hypotheses.

- For instance, we have some idea about index finger length in humans. We can test to see if our idea is correct. Suppose I think that the true average finger length is 8.4 cm. I want to test to see if this is true.

- Let's take a sample: [get everyone's finger length!]

    n =
    mean =
    std. dev. =
    std. error =

- Now let's come up with a hypothesis:

    H0: the average population finger length is 8.4 cm (μ = 8.4 cm)

- The answer to this will be either "yes" (sort of) or "no". If the answer is no, what are the alternatives?

- We need to put the "no" answers into an alternative hypothesis. We have three possibilities:

    H1: μ ≠ 8.4 cm
    H1: μ > 8.4 cm
    H1: μ < 8.4 cm

- Important: we only pick one of these. For now, let's pick the first one and ignore the others.

- Now what? Well, does it make sense that if our 95% CI includes 8.4 then we could say:

    - We have no evidence to doubt H0?

- In other words, our H0 seems reasonable, and the only reason ȳ might not be exactly 8.4 might be due to random chance. (Another example: you might get 7 heads purely by chance!)

- That's sort of what we do, except that we don't directly calculate a CI. Instead we proceed as follows (this is not given in your textbook):

- We calculate the t-value:

    t = (ȳ − μ0) / (s / √n)

- Then we look up the t-value for whatever confidence we want, and for n − 1 d.f.

- Finally, if our |t| > the t from the tables, we REJECT our H0 and accept our H1.

- On the other hand, if our |t| < the t from the tables, we FAIL TO REJECT our H0.

- (Notice the absolute value symbols around t.)

- Pay close attention to the way things are termed above; we'll get back to that soon.

- The t-value we look up depends on our alternative hypothesis (do we want the upper tail, the lower tail, or both tails?). A couple more comments on this:

    - Our t-tables give the probability in the upper tail (we already discussed this a little bit).

    - Remember that the t-distribution is symmetrical, so if we want the probability in the lower tail, we just need to change the sign of the t-value we dig out of the table. [Or, stick with using the absolute value for our calculated t, remembering that by doing that we might be changing the sign of our calculated t. This will be important when we look at one-sided t-tests in a week or so.]

    - If we want 5% total in the upper and lower tails, then we look up the t-value for 0.025, since using that value and the negative of that value puts 0.025 in each tail, or 0.05 combined into both tails (a two-tailed test, which is what we're doing in our example).

- So what about our little finger experiment?

    - Go through example.

- We just conducted our first hypothesis test. We skipped a LOT of the details. This was an example of a one-sample t-test (which is not terribly useful in biology, and isn't covered in your book).

- Why is it called a one-sample t-test?

    - Because we have one sample (e.g., our class's finger lengths).
    - Because it uses the t-distribution.

- Note how this is sort of backwards from a CI. For a CI we would calculate:

    ȳ ± t_{ν, α} · s / √n

- For the t-test, we calculate our t from the data, and then compare this to a critical value from the t-distribution.

II. Some details.

1) More on notation & error types (see also p. 257-258 [252-254] {238-240} in your text).

Notice in the equation just above that the subscript for t includes both ν and α. We already know ν is our degrees of freedom (n − 1, in this case). We already discussed the other value, but now we're ready to give it a more precise name: α.

- In a simple way, α corresponds to the critical value that we look up in our t-tables. But more specifically:

    α = Pr{reject H0} if H0 is true

- What does this mean? Suppose you decide to reject H0:

    - Why do you reject H0? Because you don't think it's true.
    - But it might be!!!
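This risk can be illustrated with a quick simulation (a sketch, not from the lecture; the numbers are hypothetical). If H0 really is true and we test at α = 0.05, we will still reject it about 5% of the time. The critical value 2.110 is the standard two-tailed t cutoff for 17 d.f. at α = 0.05.

```python
import math
import random
import statistics

# Simulate many experiments in which H0 is TRUE (mu really is 8.4 cm),
# and count how often a two-tailed t-test at alpha = 0.05 rejects anyway.
# SIGMA = 1.0 is a hypothetical population standard deviation.
random.seed(1)

MU0 = 8.4       # H0 is true: the population mean really is 8.4
SIGMA = 1.0     # hypothetical population std. dev.
N = 18          # sample size
T_CRIT = 2.110  # tabulated two-tailed t for 17 d.f., alpha = 0.05

def t_statistic(sample, mu0):
    """t = (ybar - mu0) / (s / sqrt(n))"""
    ybar = statistics.mean(sample)
    s = statistics.stdev(sample)
    return (ybar - mu0) / (s / math.sqrt(len(sample)))

reps = 10_000
rejections = 0
for _ in range(reps):
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    if abs(t_statistic(sample, MU0)) > T_CRIT:
        rejections += 1  # a Type I error: H0 was true, but we rejected it

print(rejections / reps)  # close to alpha = 0.05
```

Each rejection here is a Type I error, and the long-run rejection rate is exactly what α measures.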

- You can never be certain; all you can do is assign probabilities. You think the probability is less than 5% that H0 is true, so you're willing to believe it's not true. But 5% is not 0%.

- In other words, α is the probability of rejecting H0 when it is actually true.

- Here's another way of looking at it:

                            H0 is true                    H1 is true
    we decide H0 is true    we're right                   we're wrong (Type II error)
    we decide H1 is true    we're wrong (Type I error)    we're right

- So α is the probability of making a Type I error.

- What about Type II? As it turns out, the probability of making a Type II error is called β, but often it's difficult to know what β might be. This will become a bit more obvious when we do a two-sample t-test.

- But notice: if we try to minimize Type I error by choosing a really small α (say, 0.00000001), what happens to Type II? It goes way up. Here's an example:

Suppose we had information on litter sizes in lions (from a real study):

    0 0 0 2 3 3 3 4 4 5 5 5 5 6 8 8 9 9

In this case, ȳ = 4.4, s = 2.9, and n = 18. Now suppose our hypothesis for litter sizes in lions is that μ = 9 (ignoring that these data really aren't normally distributed).

First we figure out our H0: μ = 9

Then our H1: μ ≠ 9 (≠ means "not equal")

We calculate our t-value:

    t = (ȳ − μ0) / (s / √n) = (4.4 − 9) / (2.9 / √18) = −6.73

Now we look up t with 17 degrees of freedom and α = .05 (a very standard thing to do):

    t_{17, .05} = 1.74

Because the absolute value of −6.73 > the tabulated value of t (= 1.74), we reject our H0. This seems reasonable; after all, a litter size of 9 seems kind of high, and we certainly don't think it's the average litter size.

But now suppose we're really worried about Type I error, so we choose α = .00000005 (this is so silly it's not in your tables; I got this one another way):

    t_{17, .00000005} = 8.36

So now what? We don't reject H0, and conclude that we have no evidence to show H0 is wrong. What did we probably do? We probably committed a Type II error. Seriously, don't you think that the average lion litter size is probably not 9?

So as we decrease the probability of a Type I error, we increase the probability of a Type II error, and vice versa.

- This is a fundamental problem in statistics. How do we balance these two errors?

- If we can, we look at the cost of making a mistake. For example, suppose we try out a new medicine for AIDS:

    - If we set α too high (e.g., at 0.3), we run the risk of deciding the medicine works when it really doesn't.
    - If we set α too low (e.g., at 0.000001), we run the risk of deciding the medicine does not work when it really does.

- Which is worse? Probably the first option. We want to be pretty sure our medicine works before we start giving it to people, because otherwise a lot of folks might start dying {particularly if we already have a medicine for AIDS that is working}.
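The lion-litter arithmetic above can be checked in a few lines (a sketch using only the Python standard library; the critical values 1.74 and 8.36 are the ones quoted in the lecture). Note that the lecture's −6.73 comes from the rounded ȳ = 4.4 and s = 2.9; the unrounded data give about −6.76, which doesn't change either conclusion.

```python
import math
import statistics

litters = [0, 0, 0, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 6, 8, 8, 9, 9]
MU0 = 9  # H0: mu = 9

ybar = statistics.mean(litters)        # about 4.39 (the lecture rounds to 4.4)
s = statistics.stdev(litters)          # about 2.89 (the lecture rounds to 2.9)
n = len(litters)                       # 18
t = (ybar - MU0) / (s / math.sqrt(n))  # about -6.76 with unrounded ybar and s

# Compare |t| to the two critical values from the lecture:
print(abs(t) > 1.74)  # True  -> at alpha = .05 we reject H0
print(abs(t) > 8.36)  # False -> at alpha = .00000005 we fail to reject
                      #          (and probably commit a Type II error)
```

The same |t| leads to opposite decisions at the two α levels, which is exactly the Type I / Type II trade-off described above.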

- Kind of standard levels for α are .1, .05, and .01.

    - We usually pick amongst these depending on how worried we are about making a Type I error. BUT there's nothing wrong with picking other values (e.g., .025, .001, etc.).

    - Incidentally, if you pick something weird (say, .0368), that's also alright, but you're going to have to explain why you picked it (see particularly the next section)!

- A little bit more about α (yes, you're probably sick of α by now, but we're almost done):

    - In general, you should decide on your level of α before doing the test.

    - Why? Suppose you get a t-value that's between the cutoffs for .05 and .01. What are you going to do? Unless you decided on α ahead of time, you might decide to do WHAT YOU WANT TO DO, which is cheap, sleazy, crummy, and just bad statistics. This is an example of why some statisticians have a bad reputation.

    - As an extreme example, you might calculate your t-value, and then pick a tabulated t from the table in such a way that you always get the result you want, and never mind what the probabilities are!

- So what about p-values?

    - A p-value is the probability that you would have gotten the result you did, or worse, assuming H0 is true.

    - We've been doing this a lot: Pr{Y < y} = p-value, etc.

    - It basically tells us how confident we are about our answer.

        - If our p-value is .00001, then we're very happy.
        - If our p-value is .03, then we might not be that happy, even if we decided on α = .05 (but we're still happy enough).
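Since a p-value is just "the probability of a result this extreme or worse," deciding with it is mechanical: reject H0 whenever p ≤ α. A minimal sketch, using hypothetical p-values of the kind a statistics package might report:

```python
ALPHA = 0.05  # decided on BEFORE looking at the data

def decision(p_value, alpha=ALPHA):
    """Reject H0 when the p-value is less than or equal to alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

# Hypothetical p-values:
print(decision(0.00001))  # reject H0 (and we're very happy)
print(decision(0.03))     # reject H0 (happy enough)
print(decision(0.12))     # fail to reject H0
```

The point of fixing α first is that `decision` has no room for after-the-fact fiddling: the cutoff was chosen before the p-value was seen.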

- In summary:

    - Before we carry out a hypothesis test, we decide on α. This will determine whether or not we reject our H0.
    - We can also report our p-value; this tells us how comfortable we are with our decision.

- IMPORTANT: many computer packages only report p-values, and not the tabulated t-values (or whatever other distribution we might wind up using later in the semester). In this case, we simply compare our p-value to α:

    - If our p-value is less than or equal to α, we reject H0; if not, we do not reject H0. This is (providing you know what you're doing) much easier than looking up all sorts of values in tables.

2) Power of a test (see also p. 259 [254] {240}).

Remember:

    β = Pr{do not reject H0} if H0 is false

which is the probability of making a Type II error. Now, what is 1 − β?

    1 − β = Pr{reject H0} if H0 is false

This is good! Obviously, we want to reject a false H0 as much as possible. This quantity, 1 − β, is called the power of a test. A test with more power will be better able to detect a false H0. Your text has a good example: more power is analogous to better resolution in a microscope. It's better able to detect true differences.

- Different tests might have different powers. We'll discuss this a little more when we do the Wilcoxon-Mann-Whitney test, and actually be able to see how power might work.

- If you're really interested, check out Appendix 7.1, p. 600 [623] {574}.
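Power can be estimated by simulation (a sketch, not from the text): pick a specific scenario in which H0 is false, generate many samples from the true distribution, and count how often the test rejects. All the numbers below are hypothetical (true μ = 10.5 vs. H0: μ = 9, with σ = 2.9); the critical values 2.110 (17 d.f.) and 2.010 (49 d.f.) are the standard two-tailed cutoffs for α = .05.

```python
import math
import random
import statistics

random.seed(2)

def t_statistic(sample, mu0):
    """t = (ybar - mu0) / (s / sqrt(n))"""
    ybar = statistics.mean(sample)
    s = statistics.stdev(sample)
    return (ybar - mu0) / (s / math.sqrt(len(sample)))

def estimated_power(true_mu, mu0, sigma, n, t_crit, reps=4000):
    """Fraction of simulated samples (drawn with mean true_mu, so H0 is
    false) in which the two-tailed t-test rejects H0: mu = mu0."""
    rejections = 0
    for _ in range(reps):
        sample = [random.gauss(true_mu, sigma) for _ in range(n)]
        if abs(t_statistic(sample, mu0)) > t_crit:
            rejections += 1
    return rejections / reps

# Hypothetical scenario: H0 says mu = 9, but the truth is mu = 10.5, sigma = 2.9.
power_18 = estimated_power(10.5, 9, 2.9, n=18, t_crit=2.110)
power_50 = estimated_power(10.5, 9, 2.9, n=50, t_crit=2.010)

print(power_18)  # moderate power with n = 18
print(power_50)  # much higher power with n = 50
```

The comparison shows the usual way to buy power without raising α: a larger sample size makes the same false H0 much easier to detect.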

3) Concluding remarks.

- A summary of doing a hypothesis test:

    - Decide on H0 and H1.
    - Decide on α.
    - Calculate a test statistic from the data (t in our example).
    - Compare this test statistic to tabulated values (t-tables) and reject or fail to reject H0.
    - Or compare the p-value with α.
    - (We will repeat this list many, many times.)

- Why don't we accept H0?

    - Several reasons. Some are mathematical in nature, but basically it comes down to: we don't know if H0 is true. H1 might be true, even if we decide not to go with H1. (If, for example, μ is actually close to, but not exactly, our hypothesized value, then it might be very difficult to figure out that H0 is actually not true.)

    - [A more concrete example: suppose we test H0: μ = 10 vs. H1: μ ≠ 10, and μ is actually 10.1. Would we be able to detect this? We'd need a small variance and/or a very large sample size.]

    - So all we say is that we "fail to reject" H0, and have no reason to doubt that H0 is true.

- We'll get back to this when we do two-sample tests.