If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ

Similar documents
PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9

Common Large/Small Sample Tests 1/55

Properties and Hypothesis Testing

MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND.

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Data Analysis and Statistical Methods Statistics 651

Chapter 5: Hypothesis testing

STA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to:

LESSON 20: HYPOTHESIS TESTING

Frequentist Inference

Chapter 8: Estimating with Confidence

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.

Section 9.2. Tests About a Population Proportion 12/17/2014. Carrying Out a Significance Test H A N T. Parameters & Hypothesis

1036: Probability & Statistics

6 Sample Size Calculations

Hypothesis Testing (2) Barrow, Statistics for Economics, Accounting and Business Studies, 4 th edition Pearson Education Limited 2006

Math 152. Rumbos Fall Solutions to Review Problems for Exam #2. Number of Heads Frequency

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions

Statistics 511 Additional Materials

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

Chapter 6 Sampling Distributions

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

A statistical method to determine sample size to estimate characteristic value of soil parameters

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

Thomas Whitham Form Centre

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test.

1 Inferential Methods for Correlation and Regression Analysis

- E < p. ˆ p q ˆ E = q ˆ = 1 - p ˆ = sample proportion of x failures in a sample size of n. where. x n sample proportion. population proportion

This is an introductory course in Analysis of Variance and Design of Experiments.

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y.

Chapter 13: Tests of Hypothesis Section 13.1 Introduction

Class 23. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

Topic 9: Sampling Distributions of Estimators

Comparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading

Lecture 5: Parametric Hypothesis Testing: Comparing Means. GENOME 560, Spring 2016 Doug Fowler, GS

Economics Spring 2015

MA238 Assignment 4 Solutions (part a)

Statistical inference: example 1. Inferential Statistics

A quick activity - Central Limit Theorem and Proportions. Lecture 21: Testing Proportions. Results from the GSS. Statistics and the General Population

Sampling Distributions, Z-Tests, Power

Estimation of a population proportion March 23,

Chapter 11: Asking and Answering Questions About the Difference of Two Proportions

Expectation and Variance of a random variable

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9

Mathacle. PSet Stats, Concepts In Statistics Level Number Name: Date: Confidence Interval Guesswork with Confidence

Chapter 6 Principles of Data Reduction

Chapter 13, Part A Analysis of Variance and Experimental Design

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

2 1. The r.s., of size n2, from population 2 will be. 2 and 2. 2) The two populations are independent. This implies that all of the n1 n2

Chapter 22: What is a Test of Significance?

Power and Type II Error

April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE

(7 One- and Two-Sample Estimation Problem )

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6)

Statisticians use the word population to refer the total number of (potential) observations under consideration

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Chapter 8: STATISTICAL INTERVALS FOR A SINGLE SAMPLE. Part 3: Summary of CI for µ Confidence Interval for a Population Proportion p

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.

HYPOTHESIS TESTS FOR ONE POPULATION MEAN WORKSHEET MTH 1210, FALL 2018

Math 140 Introductory Statistics

Topic 18: Composite Hypotheses

KLMED8004 Medical statistics. Part I, autumn Estimation. We have previously learned: Population and sample. New questions

Estimation for Complete Data

Introduction to Econometrics (3 rd Updated Edition) Solutions to Odd- Numbered End- of- Chapter Exercises: Chapter 3

INSTRUCTIONS (A) 1.22 (B) 0.74 (C) 4.93 (D) 1.18 (E) 2.43

University of California, Los Angeles Department of Statistics. Hypothesis testing

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Lecture 2: Monte Carlo Simulation

Final Examination Solutions 17/6/2010

1 Review of Probability & Statistics

This chapter focuses on two experimental designs that are crucial to comparative studies: (1) independent samples and (2) matched pair samples.

Agreement of CI and HT. Lecture 13 - Tests of Proportions. Example - Waiting Times

Confidence Intervals for the Population Proportion p

Last Lecture. Wald Test

6.3 Testing Series With Positive Terms

Instructor: Judith Canner Spring 2010 CONFIDENCE INTERVALS How do we make inferences about the population parameters?

5. Likelihood Ratio Tests

(6) Fundamental Sampling Distribution and Data Discription

Lecture 6 Simple alternatives and the Neyman-Pearson lemma

5. A formulae page and two tables are provided at the end of Part A of the examination PART A

Random Variables, Sampling and Estimation

Chapter two: Hypothesis testing

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

Topic 9: Sampling Distributions of Estimators

1 Models for Matched Pairs

( θ. sup θ Θ f X (x θ) = L. sup Pr (Λ (X) < c) = α. x : Λ (x) = sup θ H 0. sup θ Θ f X (x θ) = ) < c. NH : θ 1 = θ 2 against AH : θ 1 θ 2

Chapter 23: Inferences About Means

Sampling Error. Chapter 6 Student Lecture Notes 6-1. Business Statistics: A Decision-Making Approach, 6e. Chapter Goals

Inferential Statistics. Inference Process. Inferential Statistics and Probability a Holistic Approach. Inference Process.

Chapter 4 Tests of Hypothesis

Topic 9: Sampling Distributions of Estimators

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

Sample Size Determination (Two or More Samples)

Mathematical Notation Math Introduction to Applied Statistics

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight)

MBACATÓLICA. Quantitative Methods. Faculdade de Ciências Económicas e Empresariais UNIVERSIDADE CATÓLICA PORTUGUESA 9. SAMPLING DISTRIBUTIONS

Statistics. Chapter 10 Two-Sample Tests. Copyright 2013 Pearson Education, Inc. publishing as Prentice Hall. Chap 10-1

STAT431 Review. X = n. n )

A Confidence Interval for μ

Transcription:

STATISTICAL INFERENCE INTRODUCTION Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I oesample testig, we essetially have to verify whether a populatio parameter could be equal to a certai proposed value. Sice othig is kow about the parameter, iformatio has to be gathered from the populatio by selectig a sample which is as ubiased as possible. This sample should, as far as possible, cotai most, if ot all, the characteristics of its paret populatio. I other words, it must be the best possible represetative of the populatio from which it comes. From samplig theory, we lear that the most appropriate method of samplig depeds etirely o the structure of the populatio. We will assume that, to the best of our ability, we choose the most ubiased sample. It is worth metioig, however, that it is very hard, if ot impossible, to obtai the ideal sample because we ca ever elimiate samplig errors completely. ONE-SAMPLE TESTING If, for istace, we were required to test whether the populatio mea μ could be equal to a certai value μ, it would be quite atural to select a sample ad determie the value of the poit estimate of the populatio mea, that is, the sample mea x, so as to have a approximate idea of the value of the populatio mea (read the chapter o Estimatio. The whole problem the boils dow to checkig how far x is from μ. Far is subjective, that is, if μ 5, the someoe may fid that x 47 is far from 5 but someoe else may fid that 47 is relatively close to 5. We have to remember that if x 47, we caot automatically coclude that the populatio mea ca defiitely ot be equal to 5 just because 47 is ot umerically equal to 5. There exist samplig errors which could have caused the sample mea to deviate from the true value of the populatio mea. The factor which determies how far or ear x is from μ is the sigificace level α of the test. Before goig ito the details of the problem, let us first become familiar with some terms ad their respective otatios. Whe we have to test whether a parameter could be equal to a proposed value, the proposal is formulated as a hypothesis kow as the ull hypothesis deoted by. For example, if we have to test whether the populatio mea is equal to 5, we would write : μ 5. Thus, a ull hypothesis is just a formal statemet where the parameter is equated to the proposed value.

I ay testig procedure, the priciple is to assume that the ull hypothesis is true. O the basis of iformatio obtaied from a sample, we shall later o decide to accept or reject it. I the case of rejectio, it is importat that we give a more precise aswer tha the populatio mea is ot equal to 5. Statistically speakig, oe will be more satisfied if the aswer is the populatio mea is less tha (or more tha 5 i the case whe is rejected. This is the reaso why every ull hypothesis should always be accompaied by a alterative hypothesis, deoted by. It must be esured that ad be mutually exclusive so that the acceptace of oe implies the automatic rejectio of the other. To every ull hypothesis, there exist three possible alteratives. For example, if : μ 5, the. : μ < 5. : μ > 5 3. : μ 5 are the possible alteratives. The choice of the correct alterative is made accordig to the formulatio of the problem. The sigificace level α of the test is also kow as the critical regio or the regio of rejectio of ad its locatio depeds o the choice of. I most cases, we select large samples for the sake of accuracy i our estimatio of the populatio parameter. Cosequetly, we may make a extesive use of the ormal distributio theory as postulated by the Cetral Limit Theorem. I the above example, alteratives ( ad ( are kow as oe-sided or oetailed alteratives whereas the third oe is called a two-sided or a two-tailed alterative. The term tailed obviously comes from the lower ad upper tails of the ormal distributio. We ow show how the locatio of the critical regio fluctuates with the choice of the alterative hypothesis. : μ : μ < 5 5 α μ Fig..

: μ 5 : μ 5 > α μ Fig.. : μ 5 : μ 5 α α μ Fig..3 I the followig diagram, the acceptace ad rejectio regios are show for a oe-tailed alterative to the right. Note that the critical value is foud at the boudary of these two regios. For a two-tailed alterative, there will be two critical regios ad hece two critical values. α (critical regio or regio of rejectio of Accept μ 5 : μ 5 : μ 5 > Fig..4 critical value 3

. Testig procedure The followig steps may be used as a guidelie durig ay testig procedure. owever, we have to bear i mid that differet sample statistics are required whe testig for differet populatio parameters.. Formulate the ull ad alterative hypotheses. Depedig o the alterative hypothesis ad the sigificace level, defie the 3. Perform the test-statistic, icludig ay other relevat calculatios 4. Compare the test-statistic value with the critical value(s i order to decide whether to accept or reject the ull hypothesis. Write dow a coclusio i the cotext of the problem. The test-statistic varies accordig to the parameter for which we are testig. I geeral, it is iterestig to kow that, wheever we use the z-test, that is, the ormal distributio, the test-statistic is of the form z X E[ X ] var[ X ] where X is the ubiased poit estimator of the parameter uder ivestigatio.. Testig for the populatio mea the z-test Whe testig for the populatio mea μ, we first have to check whether the populatio variace is kow. This is because the test-statistic depeds o this vital factor. If is kow, the, o matter how large the sample size is, we use the z-test. If is ukow, we have to take the sample size ito cosideratio. If is large (greater tha 3, we still use the z-test. The flowchart i Fig... below illustrates the procedure o how to use the appropriate test-statistic for a give situatio i oe-sample mea testig. Is kow? Y x μ z N Is large? Y x μ z ˆ Fig... 4

Example (Populatio variace kow The legth of strigs i the balls of strig made by a particular maufacturer has mea μ m ad variace 7.4 m. The maufacturer claims that μ 3. A radom sample of balls of strig is take ad the sample mea is foud to be 99. m. Test whether this provides sigificat evidece, at the 3% level, that the maufacturer s claim overstates the value of μ. Solutio The claim is that the populatio mea is 3. Thus, we start by formulatig our ull hypothesis accordig to the maufacturer s claim. Now, we have to check whether this is a overstatemet, that is, whether the true value of the mea is i fact less tha what is stated i the claim. ece, we are testig : μ 3 agaist : μ 3 < This is a oe-tailed alterative to the left ad the sigificace level is 3%. Sice the populatio variace is kow, we will use the z-test..3.88.58 3 Fig... The diagram above shows the critical regio (shaded uder the ull hypothesis that the populatio mea is 3. The critical z-value is.88 as obtaied from the stadard ormal table. We kow that the sample size is ad that the sample mea x is 99. m. By usig the test-statistic x μ 99. 3 z, we obtai z. 58. 7.4 Sice.58 >.88 (idicated by a ecircled cross i Fig..., we accept ad coclude that the maufacturer was ot overstatig the value of the populatio mea legth of strig. 5

Example (Populatio variace ukow large sample A supermarket maager ivestigated the legths of time that customers spet shoppig i the store. The time, x miutes, spet by each of a radom sample of 5 customers was measured, ad it was foud that x 87 ad x 69. Test, at the 5% level of sigificace, the hypothesis that the mea time spet shoppig by customers is miutes agaist the alterative that it is less tha this. Solutio The statemet is that the populatio mea is miutes ad we have bee give the alterative hypothesis that the mea is less tha miutes. ece, we are testig : μ agaist : μ < This is a oe-tailed alterative to the left ad the sigificace level is 5%. Sice the populatio variace is ukow, we have to check the size of the sample. The value of is 5, which is statistically cosidered to be large ( > 3, so that will use the z-test..5.84.645 Fig...3 The diagram above shows the critical regio (shaded uder the ull hypothesis that the populatio mea is. The critical z-value is.645 as obtaied from the stadard ormal table. We start by calculatig the values of x ad ˆ as required i the teststatistic sice, this time, the populatio variace is ukow. From formulae, 87 x 9.4 ad 5 69 (9.4 ˆ 34. 8. 5 49 5 6

x μ 9.4 Usig the test-statistic z, we obtai z. 84. ˆ 34.8 5 Sice.84 <.645, (idicated by a ecircled cross i Fig...3, we reject ad coclude that the mea time spet shoppig by customers is less tha miutes..3 Testig for the populatio proportio We ofte wat to kow the proportio of idividuals i a populatio which satisfy a certai characteristic. For example, it would be iterestig to kow the percetage of left-haded people i Mauritius or the proportio of books i a library which cotai more tha 5 pages. As usual, it will be assumed that the populatio is ifiite so that iformatio may oly be obtaied by selectig a sample. The populatio proportio is deoted by p. I geeral, whe we select idividuals, they either satisfy or do ot satisfy the characteristic uder ivestigatio. If it ever happes that a idividual falls i both categories simultaeously (for example, someoe ambidextrous, the that idividual is automatically discarded for the sake of calculatios. It is thus quite atural to use the biomial distributio because each idividual will either be labelled as success or failure, depedig o whether it satisfies the characteristic or ot. If we wat to have a idea of the value of p, we select a sample of size ad cout the umber, x, of idividuals satisfyig the required characteristic. We have leart from the chapter o Estimatio that the sample proportio x p ( p is a ubiased estimator of the populatio p ad has variace. By the p( p Cetral Limit Theorem, for large samples, ~ N p,. I this course, we will ot cosider testig for oe-sample proportios by meas of the biomial or Poisso distributios but rather their approximatios by the ormal distributio (without cotiuity correctio. The test-statistic to be used will therefore be z p p( p The testig procedure will be idetical to that used for testig for a oesample mea except, precisely, for the test-statistic. 7

Example I a public opiio poll, radomly chose electors were asked whether they would vote for the Purple Party at the ext electio ad 357 replied Yes. The leader of the Purple Party believes that the true percetage of electors who would vote for his party is.4. Test at the 8% level whether he is overestimatig his support. Solutio The leader s belief is formulated as the ull hypothesis : p. 4. Sice we wat to kow whether he is exaggeratig, we have to check if, i fact, the percetage of the populatio supportig his party is less tha 4%. Thus, the alterative hypothesis is : p. 4. < Sice the sample size is large (, we use the ormal approximatio to the biomial distributio with p.4 4 ad variace p ( p.4.6 4. Furthermore, the sample proportio 357 p ˆ.357..8.743.46.4 Fig..3. p.357.4 The statistic value is z.743. p( p (.4(.6 Sice.743 <.46, reject ad coclude that the Purple Party leader was ideed overestimatig his support. 8

3 TWO-SAMPLE TESTING The approach to two-sample hypothesis testig is idetical to its oesample couterpart except that, i this case, there is o proposed value. Compariso is made directly betwee the values of two populatio parameters - i geeral, we test for equality betwee these parameters (meas or proportios. Oe very importat aspect to be cosidered from the statisticia s poit of view is that, wheever we test for the differece betwee two populatio meas, it is compulsory to test for equality of the populatio variaces first. This is because the choice of the test-statistic depeds o the fact that the variaces are equal or ot. This coditio, however, is ot applicable whe we test for equality of populatio proportios. 3. TESTING FOR EQUALITY BETWEEN MEANS It could sometimes prove to be essetial to compare the meas of two distributios before makig a importat decisio. For example, we might wish to verify whether the mea lifespa of wome is loger tha that of me i geeral. Otherwise, we may be tempted to check whether there has bee a improvemet i the umber of marks of studets who have bee through a itesive traiig for a certai period. As will be see later, differet types of testig, ad hece, test-statistics, would be used for these two cases. But first, let us examie each case i detail ad the illustrate it by meas of a example. Large idepedet samples Whe testig for equality of meas for two idepedet populatios, we start by selectig a sample from each populatio. For the purpose of this course, we will assume that the variaces of the two populatios are equal. It is the just a matter of testig whether the differece betwee the two sample meas is statistically sigificat. If the samples are large ( > 3, the, accordig to the Cetral Limit Theorem, we may use the ormal distributio theory oce more i determiig the test statistic to be used. If X ad X are two idepedet ormal variables such that X ~ N( μ, ad X ~ N( μ,, the, usig the laws of expectatio ad variace, the differece betwee these variables, X X, will also follow a ormal distributio with expectatio ad variace calculated as follows: E [ X μ μ var[. X ] E[ X ] E[ X ] X X ] var[ X ] var[ X ] 9

We thus write ( X X ~ N( μ μ,. Sice the respective sample meas x ad x, accordig to the Cetral Limit Theorem, are ormally distributed such that X ~ N μ, ad X ~ N μ,, followig the same procedure as i oe-sample testig, we deduce that the test-statistic should be ( x z μ ( x ( x x ( μ μ μ where ad are the sample sizes (ot ecessarily of the same size from each populatio respectively. If the populatio variaces ad are ukow, we replace them by their respective ubiased estimates ˆ ad ˆ. Example A compay has two regioal head offices i Machester ad Glasgow. Workers i the Glasgow office claim that they are paid less tha the workers i the Machester office. To test this claim, a researcher takes a radom sample of workers from each office. The followig set of data is recorded: Machester Glasgow Sample size Mea salary 5 7 5 Stadard deviatio Table 3.. Usig a 5% level of sigificace, test the claim that the Glasgow workers are paid lower salaries o average. Solutio Let us deote Machester ad Glasgow as populatios ad respectively, hece the subscripts for meas, variaces ad sample sizes. We formulate the ull ad alterative hypotheses as follows: μ μ : μ μ ( ( μ μ > agaist : μ > μ

Sice the sample sizes ( are statistically cosidered as large, we use the ormal distributio. The critical value correspodig to a sigificace level of.5 is.645 from the stadard ormal table. ****************************************************************** Note that, sice the populatio variaces are ukow, we shall replace them by s their ubiased estimators. It is iterestig to ote that ˆ is equivalet to ˆ s. This relatioship is very useful i the sese that we o more have to compute the ubiased estimates but ca use the sample stadard deviatios themselves directly i the test-statistic. ****************************************************************** Accept.5 Fig. 3...645.4 Usig the iformatio give i Table 3.., the test-statistic value is ( x z x ( μ μ ( x x ( μ μ s s ( ( (57 5 ( ( 99 ( 99.4 Sice.4 >.645 (see Fig. 3.., we reject ad coclude that Machester workers ideed get paid higher salaries tha their Glasgow couterparts.

3. TESTING FOR EQUALITY BETWEEN PROPORTIONS We recall that, for a oe-sample test for the populatio proportio p, its ubiased estimator (sample proportio followed a ormal distributio with mea p ( p p ad variace accordig to the Cetral Limit Theorem. If we exted this theory to two-sample testig, that is, whe testig for equality of two populatio proportios p ad p, a sample will be selected from each populatio ad its sample proportio determied. Assumig that samples of sizes ad are chose from populatios ad ad that x ad x observatios are foud that satisfy the characteristic uder ivestigatio from the respective populatios, the the sample proportios will be x x p ˆ ad p ˆ. Thus, p p( p ˆ ~ N p, ad p ( p p ˆ ~ N p, accordig to the Cetral Limit Theorem. Sice the liear combiatio of two ormal distributios is also a ormal distributio, we determie the distributio of ˆ as follows: ( p E[ ˆ p ] p p ad p( p var[ ˆ p ] p ( p Therefore, ( p ˆ ~ N p p, p( p p ( p The test-statistic for two-sample testig for equality for proportios is z ( ( p ˆ ( p p ( ( ˆ ( p ( sice. The ull hypothesis is give by p p ( p p :. The values of p ad p are ukow, hece estimatios beig used for the variaces.

Example To verify the percetages of me ad wome who are IV positive i a certai commuity, two samples were selected ad the followig iformatio was recorded: Me Wome Sample size 55 7 Number of IV positive 86 396 Table 3.. Ca we coclude, at the 5% sigificace level, that the proportios of IV positive me ad wome i the commuity are equal? Solutio Sice we are oly testig whether the proportios are equal, there is o specific directio, that is, we use a two-tailed alterative hypothesis. : p p ( p p p p ( p p : Deotig the me ad wome populatios by ad respectively, the recorded data may be summarised as 55 x 86 86. 5 p ˆ 55 7 396 396. 55 p ˆ 7.5.5.96.6.96 Fig. 3.. 3

The test-statistic value is z ( ( ˆ p ( (.5.55 (.5(.48 55 (.55(.45 7.6 Sice.96 <.6 <.96 (see Fig. 3.. above, we do ot have eough evidece to reject. We coclude that the proportios of IV positive me ad wome i that commuity are equal. 4