The Difference in Proportions Test

Similar documents
T-Test QUESTION T-TEST GROUPS = sex(1 2) /MISSING = ANALYSIS /VARIABLES = quiz1 quiz2 quiz3 quiz4 quiz5 final total /CRITERIA = CI(.95).

Chapter 7 Comparison of two independent samples

Sampling Distributions: Central Limit Theorem

χ test statistics of 2.5? χ we see that: χ indicate agreement between the two sets of frequencies.

Section 9.4. Notation. Requirements. Definition. Inferences About Two Means (Matched Pairs) Examples

PSY 305. Module 3. Page Title. Introduction to Hypothesis Testing Z-tests. Five steps in hypothesis testing

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015

The Empirical Rule, z-scores, and the Rare Event Approach

Hypothesis testing. Data to decisions

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

CHAPTER 9: HYPOTHESIS TESTING

Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance ECON 509. Dr.

16.3 One-Way ANOVA: The Procedure

Chapter 23: Inferences About Means

Table Probabilities and Independence

Psych 230. Psychological Measurement and Statistics

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies

Frequency Distribution Cross-Tabulation

Analysis of 2x2 Cross-Over Designs using T-Tests

An inferential procedure to use sample data to understand a population Procedures

T test for two Independent Samples. Raja, BSc.N, DCHN, RN Nursing Instructor Acknowledgement: Ms. Saima Hirani June 07, 2016

Hypothesis testing: Steps

Chapter 8 Student Lecture Notes 8-1. Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance

Chapter 9. Inferences from Two Samples. Objective. Notation. Section 9.2. Definition. Notation. q = 1 p. Inferences About Two Proportions

Inference for Distributions Inference for the Mean of a Population

ST505/S697R: Fall Homework 2 Solution.

Lecture 9. Selected material from: Ch. 12 The analysis of categorical data and goodness of fit tests

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

Two sample Hypothesis tests in R.

Rama Nada. -Ensherah Mokheemer. 1 P a g e

HYPOTHESIS TESTING. Hypothesis Testing

Ch. 7: Estimates and Sample Sizes

Factorial Analysis of Variance

An Analysis of College Algebra Exam Scores December 14, James D Jones Math Section 01

Student s t-distribution. The t-distribution, t-tests, & Measures of Effect Size

Chapter 10: STATISTICAL INFERENCE FOR TWO SAMPLES. Part 1: Hypothesis tests on a µ 1 µ 2 for independent groups

A discussion on multiple regression models

Business Statistics. Lecture 10: Course Review

Statistics for IT Managers

Relax and good luck! STP 231 Example EXAM #2. Instructor: Ela Jackiewicz

16.400/453J Human Factors Engineering. Design of Experiments II

This is a multiple choice and short answer practice exam. It does not count towards your grade. You may use the tables in your book.

MBA 605, Business Analytics Donald D. Conant, Ph.D. Master of Business Administration

ANOVA - analysis of variance - used to compare the means of several populations.

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides

Soc3811 Second Midterm Exam

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =

Section 9.5. Testing the Difference Between Two Variances. Bluman, Chapter 9 1

Hypothesis testing: Steps

Chapter 10: Chi-Square and F Distributions

Multiple Comparisons

9/28/2013. PSY 511: Advanced Statistics for Psychological and Behavioral Research 1

Two Sample Problems. Two sample problems

REVIEW 8/2/2017 陈芳华东师大英语系

STP 226 EXAMPLE EXAM #3 INSTRUCTOR:

Binary Logistic Regression

One sided tests. An example of a two sided alternative is what we ve been using for our two sample tests:

Chapter 23. Inferences About Means. Monday, May 6, 13. Copyright 2009 Pearson Education, Inc.

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong

Relating Graph to Matlab

Testing Research and Statistical Hypotheses

Introduction to Econometrics. Review of Probability & Statistics

Note that we are looking at the true mean, μ, not y. The problem for us is that we need to find the endpoints of our interval (a, b).

Soc 3811 Basic Social Statistics Second Midterm Exam Spring Your Name [50 points]: ID #: ANSWERS

Statistical Methods in Natural Resources Management ESRM 304

TA: Sheng Zhgang (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan (W 1:20) / 346 (Th 12:05) FINAL EXAM

BIO5312 Biostatistics Lecture 6: Statistical hypothesis testings

Quantitative Analysis and Empirical Methods

Chapter 26: Comparing Counts (Chi Square)

Probability and Probability Distributions. Dr. Mohammed Alahmed

Chapter 23. Inference About Means

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

The t-statistic. Student s t Test

Two-Sample Inferential Statistics

Two sided, two sample t-tests. a) IQ = 100 b) Average height for men = c) Average number of white blood cells per cubic millimeter is 7,000.

STA 101 Final Review

Confidence intervals

QUEEN S UNIVERSITY FINAL EXAMINATION FACULTY OF ARTS AND SCIENCE DEPARTMENT OF ECONOMICS APRIL 2018

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 9.1-1

Name: Exam 2 Solutions. March 13, 2017

The Chi-Square Distributions

Introduction to hypothesis testing

Purposes of Data Analysis. Variables and Samples. Parameters and Statistics. Part 1: Probability Distributions

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă

The Chi-Square Distributions

# of 6s # of times Test the null hypthesis that the dice are fair at α =.01 significance

Inferences About Two Population Proportions

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

One-Sample and Two-Sample Means Tests

Basics of Experimental Design. Review of Statistics. Basic Study. Experimental Design. When an Experiment is Not Possible. Studying Relations

Unit 1 Review of BIOSTATS 540 Practice Problems SOLUTIONS - Stata Users

Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing

Visual interpretation with normal approximation

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878

In order to carry out a study on employees wages, a company collects information from its 500 employees 1 as follows:

In ANOVA the response variable is numerical and the explanatory variables are categorical.

a Sample By:Dr.Hoseyn Falahzadeh 1

Acknowledge error Smaller samples, less spread

Transcription:

Overview The Difference in Proportions Test Dr Tom Ilvento Department of Food and Resource Economics A Difference of Proportions test is based on large sample only Same strategy as for the mean We calculate the difference in the two sample proportions Establish the sampling distribution for our estimator Calculate a standard error of this sampling distribution Conduct a test 2 Decision Table for Two Proportions Difference of two proportions Use this table to help in the Difference of Proportions Test Large sample problems require that the combination of p*n or q*n > 5 Targets Assumptions Test Statistic H0: p1 - p2 = D Independent Random Samples; Large sample sizes (n1 and n2 >50 when p or q >.10; Ho: D = 0 ndependent Random Samples; Large sample sizes (n1 and n2 >50 when p or q >.10; Ho: D! 0 standard normal for comparisons with z; Pool the estimate of P based on the Null Hypothesis standard normal for comparisons with z; 3 For the null hypothesis (P1-P2 = Do Most often, Do = 0 Since this is a large sample problem, we could use the sample estimates of p1 and p2 to estimate the standard error This is the standard error that we would use for a Confidence Interval for a difference of two proportions s ( p1 " p 2 = p 1 q 1 n 1 + p 2 q 2 n 2 %C.I. = ( p 1 " p 2 ± z # / 2 s ( p1 " p 2 4

When Do = 0 When Do = 0 Under the Null Hypothesis where P1 P2 = 0 The Null doesn t specify what P is, so we don t know! exactly But the Null Hypothesis does tell us the proportions are equal, and thus the variances are equal Thus it would be best to use information from both sample to estimate a single pooled variance We use a pooled average, Pp, based on adding the total number of successes and divide by the sum of the two sample sizes where x = # of successes for each group P p = (x 1 + x 2 (n 1 + n 2 5 In the case where the variances are assumed to be equal 1. We calculate the pooled proportion based on information from two samples 2. And then we use that pooled proportion in the calculation of the Standard Error P p = (x 1 + x 2 (n 1 + n 2 # 1 s ( p1 " p 2 = P p Q p + 1 & % ( $ n 1 n 2 ' Note: this doesn t apply if Do! 0, or when we calculate a Confidence Interval 6 Gender Differences in Shopping for Clothing A survey was taken of 500 people to determine various information concerning consumer behavior. One question asked, Do you enjoy shopping for clothing? The data are shown below by sex. For women 224 out of 260 indicated Yes For men, 136 out of 240 indicated yes Conduct a test that a higher proportion of women favor shopping using " =.01 Start by establishing what you know from the problem pw = 224/260 =.8615 pm = 136/240 =.5667 pw pm = 0 so we pool the variance Pp = (224+136/(260+240 = 360/500 =.72 Qp = (1-.72 =.28 And the Standard Error? SE = [.72*.28*(1/260+1/240].5 =.0402 7 8

Gender Differences in Shopping for Clothing Say it in words PW - PM = 0 PW - PM > 0 1-tailed test large samples; sigma known under Ho; pool z* = (.8615-.5667 0/.0402! =.01, z = 2.33 z* = 7.335 z* > z 7.335 > 2.33 Reject Ho: PW - PM = 0 There is a difference! 9 I conducted a test to determine if women indicated they enjoyed shopping for clothing more than men did. Because the Null Hypothesis was that the two groups are equal, I used a pooled estimate of the proportion who enjoyed shopping for clothing to calculate a pooled variance for the standard error of the test. For women, the proportion was.862 and for men it was.567 The results of my test indicate a highly significant difference between women and men, p <.001. I have evidence to suggest that women enjoy shopping for clothing more than men. 10 The Excel file, DifMeans.xls You enter values wherever it is red It also does difference of means tests Difference of Proportion Test Group 1 Group2 X 224 136 Total 260 240 Ho: Difference 0 p 0.862 0.567 q 0.138 0.433 Values in red need to be entered for the problem If we make no assumption about p1 and p2 being equal If we assume p1 and p2 being equal pooled p 0.7200 pooled q 0.2800 Standard Error for test 0.0385 Standard Error for test 0.0402 Hypothesis Test Difference 0.2949 Hypothesis Test Difference 0.2949 Std Error 0.0385 Std Error 0.0402 Confidence Interval 95 % Level z* = 7.660 z* = 7.337 1-tailed 0.0000 1-tailed 0.0000 2-tailed 0.0000 2-tailed 0.0000 Z-value 1.960 BOE 0.075 0.219 to 0.370 11 Try a Difference of Proportions Problem Geneticists have identified the E2F1 transcription factor as an important component of cell proliferation control. The researchers induced DNA synthesis in two batches of serum-starved cells. In one group of 92 cells (treatment, cells were microinjected with the E2F1 gene. A control group of 158 cells was not exposed to E2F1. After 30 hours, researchers determined the number of altered growth cells in each batch - use this to calculate a proportion. Test to see if the E2F1 treated cells had a higher proportion of altered cells that the control group. Use " =.01 12

What calculations do you need to do? Altered Not Altered Row Total E2F1 41 51 92 Control 15 143 158 Column Total 56 194 250 pe = 41/92 =.446 condition row proportion for E2F1 pc = 15/158 =.095 conditional row proportion for Control pe - pc = 0 pool for variance Pp = (41+15/250 = 56/250 =.224 row marginal Qp =.776 row marginal SE = [.224*.776(1/92 + 1/158.5 =.0547 13 Altered Growth Cells 14 Altered Growth Cells DifMeans.xls results Pe - Pc = 0 Pe - Pc > 0 1-tailed test large samples; sigma known under Ho; pool z* = (.446-.095 0/.0547! =.01, z = 2.33 z* = 6.414 z* > z 6.414 > 2.33 Reject Ho: Pe - Pc = 0 E2F1 has a higher proportion 15 Difference of Proportion Test Be sure you can find the information you need to conduct a test Group 1 Group2 X 41 15 Total 92 158 Ho: Difference 0 p 0.446 0.095 q 0.554 0.905 Values in red need to be entered for the problem If we make no assumption about p1 and p2 being equal If we assume p1 and p2 being equal pooled p 0.2240 pooled q 0.7760 Standard Error for test 0.0568 Standard Error for test 0.0547 Hypothesis Test Difference 0.3507 Hypothesis Test Difference 0.3507 Std Error 0.0568 Std Error 0.0547 Confidence Interval 99 % Level z* = 6.172 z* = 6.414 1-tailed 0.0000 1-tailed 0.0000 2-tailed 0.0000 2-tailed 0.0000 Z-value 2.576 BOE 0.146 0.204 to 0.497 16

Paired Difference Test Paired Difference Test When you have a situation where we record a pre and post test for the same individual, we cannot treat the samples as independent In these cases we can do a Paired Difference Test. It s called Matched Pairs Test in some books. Or Mean Difference in others The strategy is relatively simple We create a new variable which is the difference of the pre-test from the post test This new variable can be thought as a single random sample For this new variable we calculate sample estimates of the mean and standard deviation 17 We calculate C.I. or conduct a hypothesis test on this new variable Often times the mean difference is referred to as meand And the hypothesis is often: µd = 0 This is no different than any single mean test, large sample or small sample 18 Sample data set for Paired Difference Test Alzheimer s study: Paired Difference Test Patient Time 1 Time 2 Difference t1 - t2 1 5 5 0 2 1 3-2 3 0 0 0 4 1 1 0 5 0 1-1 6 2 1 1 19 Twenty Alzheimer patients were asked to spell 24 homophone pairs given in random order Homophones are words that have the same pronunciation as another word with a different meaning and different spelling Examples: nun and none; doe and dough The number of confusions were recorded The test was repeated one year later The researchers posed the following question: Do Alzheimer s patients show a significant increase in mean homophone confusion over time? Use an alpha value of.05 20

Here are the sample statistics Paired Difference Test This can be thought of as a repeated measures study The sample size is 20 Statistics Time 1 Time 2 Difference Mean 4.15 5.80 1.65 Standard Error 0.78 0.94 0.72 Median 5.00 5.50 1.00 Mode 5.00 3.00 0.00 Standard Deviation 3.50 4.21 3.20 Sample Variance 12.24 17.75 10.24 Kurtosis -0.85 0.13 0.08 Skewness 0.41 0.64 0.48 Range 11 16 12 Minimum 0 0-3 Maximum 11 16 9 Sum 83 116 33 Count 20 20 20 21 22 Paired Difference Test Difference of Means and Proportion tests µd = 0 µd > 0 1-tailed test small sample; sigma unknown, use t t* = (1.65 0/.72! =.05, t.05, 19 df = 1.729 t* = 2.29 t* > t 2.29 > 1.729 Reject Ho: µd = 0 We find support for an increase 23 First - a difference or means or proportions is an extension of the basic hypothesis test It may look more complicated But it is the same thing! Begin by confirming it involves two random samples Then decide if it is two means or two proportions If two means can you assume equal variances? If proportions is the Null Hypothesis that the two groups are equal? We often rely on software to do these problems 24