Lecture 10: Comparing two populations: proportions


Problem: Compare two sets of sample data, e.g. is the proportion of As in 152 this semester the same as last Fall?

Methods: Extend the methods introduced for situations involving one sample to the new situation with two samples. We will learn how to use two sample proportions for:
(a) constructing a confidence interval estimate of the difference between the corresponding population proportions, and
(b) testing a claim made about the two population proportions.

Data requirements:
We have sample proportions from two independent simple random samples.
For each of the two samples, the number of successes is at least 5 and the number of failures is at least 5.
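The success/failure requirement above can be checked mechanically before running any test. The following is a minimal sketch; the function name `meets_requirements` and the counts in the usage lines are hypothetical and chosen only for illustration.

```python
def meets_requirements(successes: int, trials: int, minimum: int = 5) -> bool:
    """Return True if the sample has at least `minimum` successes and failures."""
    failures = trials - successes
    return successes >= minimum and failures >= minimum


# Hypothetical counts, for illustration only.
print(meets_requirements(12, 40))  # 12 successes, 28 failures
print(meets_requirements(3, 40))   # only 3 successes
```

The check has to pass for each of the two samples separately.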

COMPARING PROPORTIONS IN LARGE SAMPLES
Examples: Compare the probability of H (heads) on two coins. Compare the proportions of Republicans in two cities.
2 populations: p1 = proportion of S (successes) in population 1, p2 = proportion of S in population 2.
GOAL: Determine if p1 = p2 based on two samples.
Perform two Binomial experiments (one in each population):
1st sample: x successes in m ind. trials, get sample prop. of S: \( \hat{p}_1 = x/m \);
2nd sample: y successes in n ind. trials, get sample prop. of S: \( \hat{p}_2 = y/n \).
Test Ho: p1 = p2 vs Ha: p1 ≠ p2 or Ha: p1 > p2 or Ha: p1 < p2

TESTING HYPOTHESES PROCEDURE
Test on significance level α.
STEP 1. Ho: p1 = p2 vs Ha: p1 ≠ p2 or Ha: p1 > p2 or Ha: p1 < p2
STEP 2. Test statistic:
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{m} + \frac{1}{n}\right)}}, \qquad \text{where } \hat{p} = \frac{x + y}{m + n} \]
is the pooled or combined proportion. Under Ho, the test statistic has a standard normal distribution for large samples.
STEP 3. Critical value? For a one-sided test \( z_{\alpha} \), for a two-sided test \( z_{\alpha/2} \).
STEP 4. DECISION: the critical/rejection region(s) depend on Ha.
Ha: p1 ≠ p2: Reject Ho if \( |z| > z_{\alpha/2} \);
Ha: p1 > p2: Reject Ho if \( z > z_{\alpha} \);
Ha: p1 < p2: Reject Ho if \( z < -z_{\alpha} \).
STEP 5. Answer the question in the problem.
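The five steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the lecture: the function names `two_proportion_z` and `reject_h0`, the labels `"two-sided"`, `"greater"`, and `"less"`, and the counts in the usage lines are all hypothetical. Critical values come from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def two_proportion_z(x, m, y, n):
    """Pooled two-proportion z statistic for Ho: p1 = p2."""
    p1_hat = x / m
    p2_hat = y / n
    p_pool = (x + y) / (m + n)  # pooled (combined) proportion
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / m + 1 / n))
    return (p1_hat - p2_hat) / se


def reject_h0(z, alpha, alternative):
    """Apply the STEP 4 decision rule for the chosen alternative."""
    std_normal = NormalDist()
    if alternative == "two-sided":  # Ha: p1 != p2
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if alternative == "greater":    # Ha: p1 > p2
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":       # Ha: p1 < p2
        return z < -std_normal.inv_cdf(1 - alpha)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Hypothetical counts, for illustration only.
z = two_proportion_z(45, 100, 30, 100)
print(z, reject_h0(z, 0.05, "two-sided"))
```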

EXAMPLE
A sample of 180 college graduates was surveyed, 100 of them men and 80 women, and each was asked if they make more or less than $40,000 per year. The following data was obtained.

           > $40,000   < $40,000   Total
Men:          60          40        100
Women:        30          50         80
Total         90          90        180

Are men more likely to make more than $40,000 than women? Use α = 0.05.
Soln. Let p1 = true proportion of men making over $40k; p2 = true proportion of women making over $40k.

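EXAMPLE, contd.
STEP 1. Ho: p1 = p2 vs Ha: p1 > p2
\[ \hat{p}_1 = \frac{60}{100} = 0.6, \qquad \hat{p}_2 = \frac{30}{80} = 0.375, \qquad \hat{p} = \frac{60 + 30}{100 + 80} = 0.5. \]
STEP 2. Test statistic:
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{m} + \frac{1}{n}\right)}} = \frac{0.6 - 0.375}{\sqrt{0.5(0.5)\left(\frac{1}{100} + \frac{1}{80}\right)}} = 3. \]
STEP 3. Critical value = \( z_{\alpha} = z_{0.05} = 1.645 \).
STEP 4. DECISION. z = 3 > 1.645, reject Ho.
STEP 5. Men are more likely than women to make over $40k.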

EXAMPLE, contd. Find the p-value for the test.
P-value = P(Z > z) = P(Z > 3) = 0.0013.
Since the p-value is smaller than the significance level (0.05), we reject Ho.
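The p-value lookup above can be reproduced without a z table. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the helper name `z_p_value` and its `alternative` labels are hypothetical.

```python
from statistics import NormalDist


def z_p_value(z, alternative):
    """P-value of a z statistic under the standard normal distribution."""
    std_normal = NormalDist()
    if alternative == "greater":    # Ha: p1 > p2, upper tail
        return 1 - std_normal.cdf(z)
    if alternative == "less":       # Ha: p1 < p2, lower tail
        return std_normal.cdf(z)
    if alternative == "two-sided":  # Ha: p1 != p2, both tails
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Upper-tail p-value for the salary example, z = 3.
print(round(z_p_value(3.0, "greater"), 4))  # 0.0013
```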

Example: For the sample data listed in the Table below, use a 0.05 significance level to test the claim that the proportion of black drivers stopped by the police is greater than the proportion of white drivers who are stopped.

                  Stopped by police   Drivers in sample
White drivers:          147                1400
Black drivers:           24                 200

Soln. Let p1 = true proportion of white drivers stopped; p2 = true proportion of black drivers stopped.

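EXAMPLE, contd.
STEP 1. Ho: p1 = p2 vs Ha: p1 < p2
\[ \hat{p}_1 = \frac{147}{1400} = 0.105, \qquad \hat{p}_2 = \frac{24}{200} = 0.120, \qquad \hat{p} = \frac{147 + 24}{1400 + 200} = 0.1069. \]
STEP 2. Test statistic:
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{m} + \frac{1}{n}\right)}} = \frac{0.105 - 0.120}{\sqrt{0.1069(0.8931)\left(\frac{1}{1400} + \frac{1}{200}\right)}} = -0.64. \]
STEP 3. Critical value = \( -z_{\alpha} = -z_{0.05} = -1.645 \).
STEP 4. DECISION. z = -0.64 > -1.645, do not reject Ho.
STEP 5. There is not enough evidence to conclude that black drivers are more likely than white drivers to be stopped by the police.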

EXAMPLE, contd. Find the p-value for the test.
P-value = P(Z < z) = P(Z < -0.64) = 0.2611.
The p-value = 0.2611 > 0.05 (significance level), so we do not reject Ho.
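The lecture's stated goals include a confidence interval for p1 - p2, but the slides transcribed here do not give a formula for it. A common large-sample choice, assumed here and not taken from the slides, is the interval built on the unpooled standard error sqrt(p̂1(1 - p̂1)/m + p̂2(1 - p̂2)/n). The sketch below applies it to the traffic-stop counts; the function name `diff_proportion_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def diff_proportion_ci(x, m, y, n, confidence=0.95):
    """Large-sample CI for p1 - p2 using the unpooled standard error.

    Assumption: this formula is the standard textbook interval,
    not one stated in the lecture slides.
    """
    p1_hat = x / m
    p2_hat = y / n
    se = math.sqrt(p1_hat * (1 - p1_hat) / m + p2_hat * (1 - p2_hat) / n)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se


# White drivers: 147 of 1400 stopped; black drivers: 24 of 200 stopped.
low, high = diff_proportion_ci(147, 1400, 24, 200)
print(low, high)
```

An interval that contains 0 is consistent with the test's decision not to reject Ho, although the one-sided test at α = 0.05 and a two-sided 95% interval are not exactly equivalent procedures.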

Independent and dependent samples Two samples are independent if the sample values selected from one population are not related to or somehow paired or matched with the sample values selected from the other population. Examples: weights of students in different univ., test results of students in different towns, yields on different fields, etc. Two samples are dependent (or consist of matched pairs) if the members of one sample can be used to determine the members of the other sample. Examples: Test results for students before and after a study session, weight of a group of people before and after a weight loss program, predicted and true max temps for several days in a given month in Reno, etc.

COMPARING MEANS: INDEPENDENT SAMPLES
1st sample: x1, x2, ..., xm from a population with mean µx;
2nd sample: y1, y2, ..., yn from a population with mean µy.
GOAL: Determine if µx = µy based on the two samples.
Test Ho: µx = µy vs Ha: µx ≠ µy or Ha: µx > µy or Ha: µx < µy
The procedure depends on what we can assume about the variability of the populations: σx and σy.
CASE 1. σx and σy are known.
CASE 2. σx and σy are not known, but may be assumed equal: σx = σy.
CASE 3. σx and σy are not known, and cannot be assumed equal.
Test statistics are developed for each of the 3 cases.

COMPARING MEANS: INDEPENDENT SAMPLES
CASE 1: σx and σy known
Test on significance level α.
STEP 1. Ho: µx = µy vs Ha: µx ≠ µy or Ha: µx > µy or Ha: µx < µy
STEP 2. Test statistic:
\[ z = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_x^2}{m} + \frac{\sigma_y^2}{n}}}. \]
Under Ho, the test statistic has a standard normal distribution.
STEP 3. Critical value? For a one-sided test \( z_{\alpha} \), for a two-sided test \( z_{\alpha/2} \).
STEP 4. DECISION: the critical/rejection region(s) depend on Ha.
Ha: µx ≠ µy: Reject Ho if \( |z| > z_{\alpha/2} \);
Ha: µx > µy: Reject Ho if \( z > z_{\alpha} \);
Ha: µx < µy: Reject Ho if \( z < -z_{\alpha} \).
STEP 5. Answer the question in the problem.

COMPARING MEANS: INDEPENDENT SAMPLES
CASE 2: σx and σy not known, but assumed equal.
STEP 2. Test statistic:
\[ t = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{m} + \frac{1}{n}}}, \qquad \text{where } s_p^2 = \frac{1}{m + n - 2}\left\{ (m - 1)s_x^2 + (n - 1)s_y^2 \right\} \]
is a pooled estimate of the common variance. Under Ho, the test statistic has a t distribution with df = m + n - 2.
STEP 3. Critical value? One-sided test \( t_{\alpha} \), two-sided test \( t_{\alpha/2} \).
STEP 4. DECISION: the critical/rejection region(s) depend on Ha.
Ha: µx ≠ µy: Reject Ho if \( |t| > t_{\alpha/2} \);
Ha: µx > µy: Reject Ho if \( t > t_{\alpha} \);
Ha: µx < µy: Reject Ho if \( t < -t_{\alpha} \).

COMPARING MEANS: INDEPENDENT SAMPLES
CASE 3: σx and σy not known, and may not be assumed equal.
STEP 2. Test statistic:
\[ t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{m} + \frac{s_y^2}{n}}}. \]
Under Ho, the degrees of freedom for the t distribution may be approximated by df = min(m - 1, n - 1) (i.e. the smaller of m - 1 and n - 1).
STEP 3. Critical value? One-sided test \( t_{\alpha} \), two-sided test \( t_{\alpha/2} \).
STEP 4. DECISION: the critical/rejection region(s) depend on Ha.
Ha: µx ≠ µy: Reject Ho if \( |t| > t_{\alpha/2} \);
Ha: µx > µy: Reject Ho if \( t > t_{\alpha} \);
Ha: µx < µy: Reject Ho if \( t < -t_{\alpha} \).

EXAMPLE 1
A medication for blood pressure was administered to a group of 13 randomly selected patients with elevated blood pressure, while a group of 15 was given a placebo. At the end of 3 months, the following data was obtained on their Systolic Blood Pressure.
Control group, x: m = 15, sample mean = 180, s = 50.
Treated group, y: n = 13, sample mean = 150, s = 30.
Test if the treatment has been effective. Assume the variances are the same in both groups and use α = 0.01.
Soln. Let µx = mean blood pressure for the control group; µy = mean blood pressure for the treatment group.
Then m = 15, \( \bar{x} = 180 \), \( s_x = 50 \), n = 13, \( \bar{y} = 150 \), \( s_y = 30 \).
Assumed equality of variances/st. dev.: σx = σy.

EXAMPLE 1, contd.
STEP 1. Ho: µx = µy (medicine not effective) vs Ha: µx > µy (medicine effective)
STEP 2. Pooled variance:
\[ s_p^2 = \frac{(m - 1)s_x^2 + (n - 1)s_y^2}{m + n - 2} = \frac{(15 - 1)50^2 + (13 - 1)30^2}{15 + 13 - 2} = 1761.54. \]
Standard deviation: \( s_p = \sqrt{s_p^2} = \sqrt{1761.54} = 41.97 \).
Test statistic:
\[ t = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{m} + \frac{1}{n}}} = \frac{180 - 150}{41.97 \sqrt{\frac{1}{15} + \frac{1}{13}}} = 1.8863. \]
STEP 3. Critical value = \( t_{0.01} = 2.479 \), df = 26.
STEP 4. t = 1.8863 is not > 2.479, do not reject Ho.
STEP 5. Not enough evidence to conclude that the medicine is effective.
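The pooled-variance arithmetic above is easy to get wrong by hand, so here is a minimal sketch that recomputes it from the summary statistics. The function name `pooled_t` is hypothetical; the critical value 2.479 is copied from the example rather than computed, because Python's standard library has no t-distribution quantile function.

```python
import math


def pooled_t(mean_x, s_x, m, mean_y, s_y, n):
    """Two-sample t statistic assuming equal population variances (CASE 2)."""
    sp2 = ((m - 1) * s_x**2 + (n - 1) * s_y**2) / (m + n - 2)  # pooled variance
    t = (mean_x - mean_y) / (math.sqrt(sp2) * math.sqrt(1 / m + 1 / n))
    df = m + n - 2
    return t, sp2, df


# Blood pressure example: control (x) vs treated (y).
t, sp2, df = pooled_t(180, 50, 15, 150, 30, 13)
t_crit = 2.479  # t_{0.01} with df = 26, taken from the example
print(round(sp2, 2), round(t, 4), df, t > t_crit)
```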

Example 2. Sample statistics are shown for the distances of the home runs hit in record-setting seasons by Mark McGwire and Barry Bonds. Use a 0.05 significance level to test the claim that the distances come from populations with different means.

                   McGwire   Bonds
n                     70       73
sample mean         418.5    403.7
s                    45.5     30.6

Soln. Let µx = mean distance for McGwire; µy = mean distance for Bonds.
CASE 3. σx and σy are not known, and cannot be assumed equal.

EXAMPLE 2, contd.
STEP 1. Ho: µx = µy (same mean distances) vs Ha: µx ≠ µy (different mean distances)
STEP 2. Test statistic:
\[ t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{m} + \frac{s_y^2}{n}}} = \frac{418.5 - 403.7}{\sqrt{\frac{45.5^2}{70} + \frac{30.6^2}{73}}} = 2.273. \]
STEP 3. Critical value = \( t_{0.025} = 1.995 \), df = 69 (min(69, 72)).
STEP 4. t = 2.273 > 1.995, reject Ho.
STEP 5. There is enough evidence to conclude that the mean distances of the home runs for the two players are different.
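The CASE 3 computation above can be checked the same way. A minimal sketch, with the hypothetical function name `unpooled_t`; it returns the statistic together with the conservative df = min(m - 1, n - 1) used in the lecture.

```python
import math


def unpooled_t(mean_x, s_x, m, mean_y, s_y, n):
    """Two-sample t statistic without assuming equal variances (CASE 3)."""
    se = math.sqrt(s_x**2 / m + s_y**2 / n)
    t = (mean_x - mean_y) / se
    df = min(m - 1, n - 1)  # conservative approximation used in the lecture
    return t, df


# Home run distances: McGwire (x) vs Bonds (y).
t, df = unpooled_t(418.5, 45.5, 70, 403.7, 30.6, 73)
print(round(t, 3), df)
```

The decision step still needs a t critical value for the returned df, read from a t table or computed with a statistics package.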

Independent and dependent samples Recall: Two samples are independent if the sample values selected from one population are not related to or somehow paired or matched with the sample values selected from the other population. Two samples are dependent (or consist of matched pairs) if the members of one sample can be used to determine the members of the other sample.

PAIRED t-test: comparing dependent samples Observations come as matched pairs (X,Y). X and Y are NOT independent, X and Y are dependent. Examples. X is score on a test before studying hard; Y is score on the test after studying hard for the same student; X is score on a test or in sports before training program, Y score after training program; X is weight before weight loss program, Y is weight after the program; X and Y are heights of twins or siblings.

PAIRED t-test: HYPOTHESES
Hypotheses of interest: does training make a difference?
µx = score before training; µy = score after training.
Ho: µx = µy (no difference) vs Ha: µx < µy (score after training is higher)
Data are pairs of observations: (x1, y1), (x2, y2), ..., (xn, yn).
Typically, we work with differences, d = x - y, and phrase the hypotheses in terms of differences: µd = true mean difference.
In terms of differences: Hypotheses, e.g. Ho: µd = 0 vs Ha: µd < 0. Data: d1, d2, ..., dn.

obs   before   after   difference
1     x1       y1      d1 = x1 - y1
2     x2       y2      d2 = x2 - y2
...
n     xn       yn      dn = xn - yn

PAIRED t-test: TEST PROCEDURE
To test Ho, we do a one-sample t-test. We need the sample mean and standard deviation of the d's:
\[ \bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i \qquad \text{and} \qquad s_d^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (d_i - \bar{d})^2. \]
Compute the test statistic:
\[ t = \frac{\bar{d}}{s_d / \sqrt{n}}. \]
Under Ho the test statistic has a t(n-1) distribution.
Make the decision in exactly the same way as for the one-sample t-test.

PAIRED t-test: an example
The amount of lactic acid in the blood was examined for 10 men, before and after strenuous exercise, with the results in the following table.
(a) Test if exercise changes the level of lactic acid in blood. Use significance level α = 0.01.
(b) Find a 95% CI for the mean change in the blood lactic acid level.

Before   15  16  13  13  17  20  13  16  14  18
After    33  20  30  35  40  37  18  26  21  19

PAIRED t-test: lactic acid example, contd.
Solution. Take d = After level - Before level of lactic acid.
Data for d: 18, 4, 17, 22, 23, 17, 5, 10, 7, 1.
Sample stats: \( \bar{d} = 12.4 \) and \( s_d^2 = 63.156 \), so \( s_d = 7.95 \).
STEP 1. Ho: µd = 0 vs Ha: µd ≠ 0
STEP 2. Test statistic:
\[ t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{12.4}{7.95 / \sqrt{10}} = 4.93. \]
STEP 3. Critical value? df = n - 1 = 9, \( t_{\alpha/2} = t_{0.005} = 3.25 \).
STEP 4. DECISION: t = 4.93 > 3.25 = \( t_{0.005} \), so reject Ho.
STEP 5. There is enough evidence to conclude that exercise changes lactic acid level.
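The paired computation can be reproduced from the raw before/after data. The sketch below also outlines part (b), which the slides ask for but do not work out: it assumes the usual one-sample interval d̄ ± t_{0.025}·s_d/√n and takes t_{0.025} = 2.262 for df = 9 from a standard t table. Both that formula and that table value are assumptions not shown in the slides.

```python
import math
from statistics import mean, stdev

before = [15, 16, 13, 13, 17, 20, 13, 16, 14, 18]
after = [33, 20, 30, 35, 40, 37, 18, 26, 21, 19]

# Differences d = After - Before, one per subject.
d = [a - b for a, b in zip(after, before)]
n = len(d)

d_bar = mean(d)
s_d = stdev(d)                 # sample standard deviation (divides by n - 1)
se = s_d / math.sqrt(n)
t = d_bar / se                 # paired t statistic, df = n - 1

# Part (b): 95% CI for the mean change.
t_crit_95 = 2.262              # assumed t-table value for df = 9, two-sided 95%
ci = (d_bar - t_crit_95 * se, d_bar + t_crit_95 * se)

print(d_bar, round(s_d**2, 3), round(t, 2), ci)
```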

Example 2: Are Forecast Temperatures Accurate? The following Table consists of five actual low temperatures and the corresponding low temperatures that were predicted five days earlier. Use a 0.05 significance level to test the claim that there is a difference between the actual low temperatures and the low temperatures that were forecast five days earlier.

Example 2, contd.
Computed from the data: \( \bar{d} = -13.2 \), \( s_d = 10.7 \), n = 5.
µd = mean daily difference between the predicted and the observed min temperatures.
STEP 1. Ho: µd = 0 vs Ha: µd ≠ 0
STEP 2. Test statistic:
\[ t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{-13.2}{10.7 / \sqrt{5}} = -2.759. \]
STEP 3. Critical value? df = n - 1 = 4, \( t_{\alpha/2} = t_{0.025} = 2.776 \).
STEP 4. DECISION: |t| = 2.759 < 2.776, so do not reject Ho.
STEP 5. There is no significant difference between the mean predicted and observed min daily temperatures.