Hypothesis Testing with Z and T

Chapter Eight: Hypothesis Testing with Z and T

Topics: Introduction to Hypothesis Testing, P Values, Critical Values, Within-Participants Designs, Between-Participants Designs

Hypothesis Testing

An alternate hypothesis (H1) is a researcher's prediction. It states that the population parameter is different from a given value or from that of another population, and usually states the direction of the difference. A null hypothesis (H0) is the complement of the alternate hypothesis. The tail of a test is the direction in which the alternate hypothesis predicts the results will turn out. Hypothesis testing is using statistical methods and sample data to attempt to reject the null hypothesis. The null hypothesis is either rejected or not rejected. If it is not rejected, this does not mean that it is accepted. As an analogy, consider the difference in meaning between the statements "We know there is not life on other planets" and "We do not have evidence that there is life on other planets."

The examples below refer to the question of how meditation affects concentration.

Right-tailed — Alternate: Meditation increases concentration. Null: Meditation does not increase concentration.
Left-tailed — Alternate: Meditation decreases concentration. Null: Meditation does not decrease concentration.
Two-tailed — Alternate: Meditation affects concentration. Null: Meditation does not affect concentration.

Statistical Significance

If the data in a study strongly support the hypothesis, they are said to be statistically significant, and the null hypothesis is rejected. However, since sample data only provide estimates of population parameters, hypothesis testing cannot prove anything. There is always the possibility that a finding, or a lack of a finding, is a coincidence in the sample and not representative of the population overall.

A type I error is making a false claim; that is, rejecting the null hypothesis when it is actually true. A type II error is failing to make a claim when one should be made; that is, not rejecting the null when it is actually false.

If H0 is actually true: rejecting H0 is a type I error; not rejecting H0 is correct.
If H0 is actually false: rejecting H0 is correct; not rejecting H0 is a type II error.

Roughly speaking, a type I error is finding something that isn't really there, and a type II error is missing something that is there, as in the example below of a clinical trial for a new drug therapy.

If the drug is not effective: rejecting H0 means a worthless drug is produced; not rejecting H0 means a worthless drug is not produced.
If the drug is effective: rejecting H0 means a valuable drug is produced; not rejecting H0 means a valuable drug is not produced.

Factors Leading to Statistical Significance

Three factors lead to large z or t scores, and thus an increased likelihood of statistical significance.

Big difference — Significant example: boys read an average of 386 words per minute, and girls read an average of 463 words per minute. Nonsignificant example: boys read an average of 386 words per minute, and girls read an average of 398 words per minute.
Large sample — Significant example: 80 boys and 74 girls. Nonsignificant example: 10 boys and 13 girls.
Low variance — Significant example: s1 = 54, s2 = 48. Nonsignificant example: s1 = 144, s2 = 148.

P Values

The p value of a sample statistic is the probability that another sample of the same size would support the prediction at least as well as this one did, given that the null hypothesis is true.

Example 1 — Prediction: a particular coin lands on heads more than tails. Outcome: 6 out of 9 flips are heads. P value: (9 choose 6)(1/2)^9 + (9 choose 7)(1/2)^9 + (9 choose 8)(1/2)^9 + (9 choose 9)(1/2)^9 ≈ 25.4%. Conclusion: the prediction was correct, but not by enough to be convincing that the coin is weighted; 6 heads out of 9 could easily be a coincidence.
Example 2 — Prediction: a particular coin lands on heads more than tails. Outcome: 9 out of 9 flips are heads. P value: (1/2)^9 ≈ 0.2%. Conclusion: this coin seems to be weighted toward heads; 9 heads out of 9 could be a coincidence, but this is not likely.
Example 3 — Prediction: SAT math scores are higher than 500 on average (σ = 100). Outcome: the average of 250 random scores is 512. P value: z = (512 − 500)/(100/√250) = 1.90, and P(z > 1.90) ≈ 2.9%. Conclusion: the average SAT math score seems to be above 500. Although the sample mean was only slightly higher than 500, the sample was very large, making the difference unlikely to be coincidental.

Using the z table and methods from previous chapters, it is easy to calculate the p value for a sample based on its z score. For means, z = (x̄ − µ)/(σ/√n). For proportions, z = (p̂ − p)/√(pq/n).
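The two p-value calculations above (the coin-flip example and the SAT example) can be reproduced with a short script; this is a sketch using only the Python standard library, with the numbers taken directly from the examples.

```python
from math import comb, sqrt
from statistics import NormalDist

# Coin example: P(6 or more heads in 9 flips of a fair coin)
p_coin = sum(comb(9, k) for k in range(6, 10)) / 2**9
print(round(p_coin, 3))  # 0.254, i.e. about 25.4%

# SAT example: z test for a mean; H0: mu = 500, sigma = 100, n = 250
z = (512 - 500) / (100 / sqrt(250))
p_sat = 1 - NormalDist().cdf(z)      # right-tailed p value
print(round(z, 2), round(p_sat, 3))  # z = 1.9, p = 0.029
```

`NormalDist().cdf` plays the role of the z table: it returns the area under the standard normal curve to the left of z, so one minus it gives the right-tail area.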

Common Misunderstandings about P Values

False understanding: a p value is the probability that a type I error was made. Correct understanding: a p value is the probability that the null will be rejected, given that the null is true. This is not the same as the probability that the null is true, given that the null was rejected. Non-statistics example: the probability that a criminal is a male is very different from the probability that a male is a criminal.

False understanding: p can be calculated using the data that were used to form the hypothesis. Correct understanding: p values only have meaning for events that were predicted. It's important not to make a prediction about an event that you know has already happened. Non-statistics example: players must call their shots in pool. If a player makes a shot without calling it, it is assumed to be coincidence rather than skill, and it doesn't count.

Critical Values

If p is less than .05, the null hypothesis is rejected, no matter how far below .05 p falls. Therefore, rather than calculating p, it can be sufficient simply to identify the critical value at which p equals .05. For a one-tailed z test, this value is z0 = 1.64. For a t test, the critical value can be looked up in the t table based on the degrees of freedom. The region under the curve beyond the critical value is the critical region, and it has an area of .05. If the sample statistic has reached the critical value, then it is in the critical region, p is less than .05, the data are statistically significant, and the null hypothesis is rejected. For example, if a sample statistic z falls beyond z0 = 1.64, within the critical region of area .05, then the area beyond z must be under .05.
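Critical values can be computed as well as looked up: the critical value is the z score whose tail area equals α. A minimal sketch using the Python standard library's NormalDist (a t critical value would still require a t table or a statistics library, since the standard library has no t distribution):

```python
from statistics import NormalDist

alpha = 0.05
# One-tailed: put all of alpha in one tail
z_one_tailed = NormalDist().inv_cdf(1 - alpha)
# Two-tailed: split alpha between the two tails
z_two_tailed = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z_one_tailed, 2))  # 1.64
print(round(z_two_tailed, 2))  # 1.96
```

`inv_cdf` is the inverse of the z table lookup: given an area to the left, it returns the z score, which is exactly how a critical value is defined.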

Statistical Significance

The level of significance α (alpha) is how low the p value needs to be for the data to be statistically significant, allowing the researcher to reject H0. In social science research, this is conventionally set at α = .05. Each statement below implies the one below it, so they essentially all mean the same thing.

The sample statistic is greater* than the critical value — Meaning: z is greater than 1.64, or t or another sample statistic is greater than the critical value for that statistic for the given number of degrees of freedom. (*Or less, depending on the tails.)
The sample statistic is in the critical region — Meaning: the sample statistic (z, t, etc.) falls within the 5% of the curve that is shaded in the tail(s) past the critical value(s).
The p value is less than .05 — Meaning: if the null hypothesis is true, there is less than a 5% chance that another random sample of the same size would turn out as strongly as the current one did.
The data are statistically significant — Meaning: the sample size is large enough, the results are different enough from each other, and the variance is low enough for the researchers to be convinced that their findings are not merely a coincidence in their sample.
The null hypothesis is rejected — Meaning: the researchers claim that their prediction, which turned out true in their sample, is valid for the population overall. However, they may be making a type I error.

Between-Participants and Within-Participants Designs

Data collected from a sample can be compared to data from another sample, or to data collected again from the same sample under different conditions. When feasible, within-participants designs are better because they are more powerful.

Between-participants — Description: a different group of participants is used for condition B than was used for condition A. Example: some participants memorize a word list by reading it, and some participants memorize a word list by hearing it. Advantage: avoids sequence effects, since participants cannot be influenced by their earlier participation; also, it may be impractical, immoral, or impossible to have each participant take part in each condition.
Within-participants — Description: the same group of participants takes part in both conditions. Example: every participant memorizes one word list by reading it and another word list by hearing it. Advantage: more powerful. Since each participant is compared against himself or herself, an outlier in one condition will likely be an outlier in the other condition as well, making the difference not an outlier. This makes for lower variance, and thus a more powerful test.

Sequence Effects

Sequence effects are effects of one condition of a study on a later condition, such as those below (each paired with an example of a dependent variable that could be affected).

Fatigue — time to run a mile.
Practice — free-throw percentage.
Boredom — score on a timed math test.
Interference — number of words remembered.
Environmental changes — driving speeds (if the weather changes).
Knowledge about the procedure — whether or not the gorilla was noticed.

When possible, sequence effects are reduced by counterbalancing, in which some participants take part in condition A first and the others take part in condition B first. Counterbalancing eliminates some confounding variables from a within-participants design. If the participants are randomly assigned to which condition they will take part in first, then the within-participants design is a true experiment (rather than a quasi-experiment), eliminating additional possible confounding variables.

Two-Sample Tests

Z tests and t tests can be done for one or two samples. Between-participants designs use two-sample tests. Within-participants designs use a one-sample test even though there are two conditions, because only one mean (the mean difference) is being tested.

Mean, one-sample test: t = (x̄ − µ) / (s/√n)
Mean, two-sample test: t = (x̄1 − x̄2) / √(sp²(1/n1 + 1/n2)), where the pooled variance is sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
Proportion, one-sample test: z = (p̂ − p) / √(pq/n)
Proportion, two-sample test: z = (p̂1 − p̂2) / √(p̂1q̂1/n1 + p̂2q̂2/n2)

The two-sample t-test formula combines s1 and s2 into a pooled estimate of σ, and is intended for situations in which there is no reason to assume that σ1 and σ2 would be remarkably different. If σ is known, z can be calculated instead of t.
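The pooled two-sample t formula can be sketched directly in Python, using the reading-speed summary statistics from the "big difference" example earlier in the chapter (boys: x̄1 = 386, s1 = 54, n1 = 80; girls: x̄2 = 463, s2 = 48, n2 = 74):

```python
from math import sqrt

# Summary statistics from the reading-speed example
x1, s1, n1 = 386, 54, 80   # boys
x2, s2, n2 = 463, 48, 74   # girls

# Pooled estimate of the common variance
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

# Two-sample t statistic, with n1 + n2 - 2 degrees of freedom
t = (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))
print(round(t, 2))  # -9.32
```

With |t| ≈ 9.3 and 152 degrees of freedom, the statistic is far beyond any .05 critical value, consistent with the chapter calling this a significant example.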

Calculator Tests of Means

Z tests and t tests of one or two means can be done on the calculator.

One-sample test (including within-participants designs):
Test: Z-Test if σ is known; T-Test if σ is unknown.
Input: enter the data in L1 and choose Data, or choose Stats and enter the sample mean, the population mean being tested (for H0), the standard deviation, and the sample size.
Tails setting: µ ≠ µ0 for two-tailed; µ < µ0 for left-tailed; µ > µ0 for right-tailed.

Two-sample test:
Test: 2-SampZTest if both σ's are known; 2-SampTTest if not.
Input: enter the data in L1 and L2 and choose Data, or choose Stats and enter the sample means, the standard deviations, and the sample sizes.
Tails setting: µ1 ≠ µ2 for two-tailed; µ1 < µ2 for left-tailed (first sample mean predicted to be lower); µ1 > µ2 for right-tailed (first sample mean predicted to be higher).
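What the calculator's T-Test does for a within-participants design can be sketched in a few lines: compute the one-sample t statistic on the difference scores and compare it to the critical value from the t table. The difference scores below are hypothetical; 1.833 is the standard one-tailed .05 critical value for df = 9.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical difference scores (condition B minus condition A), 10 participants
d = [3, 5, 2, 6, 4, 3, 5, 4, 2, 6]
n = len(d)

# One-sample t statistic testing H0: mean difference = 0
t = mean(d) / (stdev(d) / sqrt(n))
print(round(t, 2))  # 8.49

# One-tailed critical value for df = n - 1 = 9 at alpha = .05 (from the t table)
t_critical = 1.833
print(t > t_critical)  # True: the statistic is in the critical region, so reject H0
```

Note that `stdev` computes the sample standard deviation (dividing by n − 1), which is what the t formula requires.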

Calculator Tests of Proportions

Z tests of one or two proportions can be done on the calculator.

One-sample test:
Test: 1-PropZTest.
Input: enter p0 for the population proportion being tested, n for the sample size, and x for the number of successes in the sample.
Tails setting: prop ≠ p0 for two-tailed; prop < p0 for left-tailed; prop > p0 for right-tailed.

Two-sample test:
Test: 2-PropZTest.
Input: enter n1 for the size of the first sample and x1 for the number of successes in that sample; then do the same with n2 and x2 for the second sample.
Tails setting: p1 ≠ p2 for two-tailed; p1 < p2 for left-tailed (first sample proportion predicted to be lower); p1 > p2 for right-tailed (first sample proportion predicted to be higher).
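A two-proportion z test can also be sketched by hand, using the two-sample proportion formula from the Two-Sample Tests section. The counts below are hypothetical; note that some calculators use a pooled proportion in the standard error, while the unpooled form shown here matches the formula given earlier in this chapter.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical counts: x successes out of n trials in each sample
x1, n1 = 40, 100
x2, n2 = 30, 100
p1, p2 = x1 / n1, x2 / n2

# Two-proportion z statistic (unpooled standard error)
z = (p1 - p2) / sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Two-tailed p value: area in both tails beyond |z|
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p_value, 3))  # 1.49 0.136
```

Here p = .136 > .05, so a two-tailed test would not reject H0: a 40% vs 30% split in samples of 100 could plausibly be a coincidence.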