Topic 3: Sampling Distributions, Confidence Intervals & Hypothesis Testing

ECO220Y5Y: Quantitative Methods in Economics
Dr. Nick Zammit, University of Toronto, Department of Economics, Room KN3272, n.zammit@utoronto.ca
November 22, 2017

Road Map: Sampling Distributions, Confidence Intervals & Hypothesis Testing
Key Concepts:
1. Definitions (Sample Proportions, Sample Means, Standard Error, Confidence Intervals, Hypothesis Testing, Margin of Error, p-values)
2. Tables/Plots (Normal Critical Values, Student-t Critical Values, χ²-Critical Values, Confidence Regions vs. Rejection Regions)
3. Test Statistics/Intervals (One Proportion Intervals, One Mean Intervals, Difference of Proportions/Means)
4. Ideas (Central Limit Theorem, Type I & Type II Errors, Level of Confidence/Significance, Power of Tests, Setting Sample Size)

What are Sampling Distributions?
Sampling Distribution: the distribution of a sample statistic (such as a proportion or a mean) over many independent samples of the same size drawn from the same population.
Sampling Error: the variability of that statistic from one sample to another. The larger the sample size, the smaller the sampling error.

Empirical Laws
Central Limit Theorem: whatever the distribution of X (provided it has a finite variance), as the number of terms in the sum becomes large, the distribution of the sample mean $\bar X$ tends to a normal distribution.
This applies whether the underlying distribution is continuous and symmetric like the uniform distribution, continuous and asymmetric like the chi-squared distribution, or even discrete like the Binomial distribution.
The CLT implies that when sufficiently large random samples are taken from a population, the sampling distribution of statistics such as $\hat p$ and $\bar x$ is approximately normal.

The Central Limit Theorem Example
[Figure: Random Dice Rolls vs. Mean Dice Rolls. The histograms (percent) of individual random dice rolls are spread over the values 1 to 6. The histogram of the means of dice rolls is concentrated between about 3.3 and 3.8, around E(X) = 3.5.]

Characteristics of Sampling Distributions
Sampling Distributions for Proportions: provided several assumptions/conditions are satisfied, the sampling distribution of $\hat p$ is modelled by a Normal distribution with $\mu(\hat p) = p$ and $SD(\hat p) = \sqrt{pq/n}$.
Sampling Distributions for Means: provided several assumptions/conditions are satisfied, the sampling distribution of $\bar x$ is modelled by a Normal distribution with $\mu(\bar x) = \mu$ and $\sigma(\bar x) = SD(\bar x) = \sigma/\sqrt{n}$.
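A short simulation can reproduce the dice example. This is a sketch that assumes a fair six-sided die and uses only Python's standard library. The helper name `simulate_dice_means`, the sample sizes (5 and 50 rolls), and the 2,000 repetitions are illustrative choices, not the settings used on the slide. The sketch compares the spread of the simulated means with $\sigma/\sqrt{n}$, using the fair-die values E(X) = 3.5 and Var(X) = 35/12.

```python
import random
import statistics

# Assumed population: one fair six-sided die, E(X) = 3.5 and Var(X) = 35/12
DIE_MEAN = 3.5
DIE_SD = (35 / 12) ** 0.5


def simulate_dice_means(n_rolls, n_samples, seed=0):
    """Return the means of n_samples independent samples, each of n_rolls fair-die rolls."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        statistics.fmean(rng.randint(1, 6) for _ in range(n_rolls))
        for _ in range(n_samples)
    ]


for n in (5, 50):
    means = simulate_dice_means(n_rolls=n, n_samples=2000)
    # The centre of the sampling distribution should sit near E(X) = 3.5, and its
    # spread should be near the theoretical SD(x-bar) = sigma / sqrt(n).
    print(
        f"n={n:>2}: mean of means={statistics.fmean(means):.3f}, "
        f"SD of means={statistics.stdev(means):.3f}, "
        f"sigma/sqrt(n)={DIE_SD / n ** 0.5:.3f}"
    )
```

Increasing n shrinks the spread of the means roughly in proportion to $1/\sqrt{n}$. This is the sampling-error point made above. A histogram of `means` for the larger n would look close to normal even though a single roll is uniform on 1 to 6.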

Standard Error
Standard Error of Proportions: an estimate of the standard deviation of a sampling distribution. For a sample proportion $\hat p$ the standard error is $SE(\hat p) = \sqrt{\hat p \hat q / n}$.
Standard Error of the Mean: an estimate of the standard deviation of a sampling distribution. For a sample mean $\bar x$ the standard error is $SE(\bar x) = s/\sqrt{n}$.

Assumptions and Conditions for Normality
Assumptions:
1. Independence Assumption: the sampled values must be independent of each other.
Conditions:
1. Randomization Condition: the data values must be sampled randomly.
2. The 10% Condition: the sample size n should be no more than 10% of the population.
3. Large-Sample Condition: if the underlying population distribution is not unimodal and symmetric, the sample size should be large (n ≥ 50). If it is unimodal and symmetric, the sample size can be small (n < 50).
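The two standard-error formulas and the 10% condition translate directly into code. This is a minimal sketch using only Python's standard library. The function names are hypothetical, and the 10% check encodes only the rule of thumb stated above; it is not a formal test.

```python
import math

# Sketch only: the function names below are illustrative, not from the course.


def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    return math.sqrt(p_hat * q_hat / n)


def se_mean(s, n):
    """Standard error of a sample mean: s / sqrt(n), with s the sample SD."""
    return s / math.sqrt(n)


def ten_percent_condition(n, population_size):
    """True if the sample is no more than 10% of the population."""
    return n <= 0.10 * population_size


# Hypothetical examples: 40% of a sample of 100 answer "yes"; another sample has s = 12, n = 36
print(round(se_proportion(0.40, 100), 4))   # sqrt(0.24 / 100)
print(se_mean(12, 36))                      # 12 / 6
print(ten_percent_condition(100, 5000))     # is 100 <= 500?
```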

Graphical Checks for Normality (Large-Sample Condition)
The Normal Probability Plot: a scatter plot of the sorted X values on the vertical axis against Normal Scores on the horizontal axis. Deviations from a straight line indicate non-normality.
Normal Scores: first solve for the Order Statistic Medians (OSMs): $U_i = \frac{i - a}{n + 1 - 2a}$ for $i = 1, 2, \ldots, n$, where $a = 3/8$ if $n \le 10$ and $a = 0.5$ if $n > 10$. The Normal Scores are the inverse normal function values of the OSMs: $N_i = G(U_i)$, where $G$ is the inverse of the standard normal CDF.

Graphical Checks for Normality
[Figure: Normal Probability Plot of Student Grades, with grade (about 40 to 90) plotted against Normal Score (-2 to 2).]
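The normal-score recipe above can be computed with `statistics.NormalDist`, whose `inv_cdf` method plays the role of G. This sketch assumes Python 3.8 or later. `normal_scores` and `probability_plot_points` are illustrative names, the grades are made-up data, and plotting the returned pairs is left to whatever graphing tool is at hand.

```python
from statistics import NormalDist


def normal_scores(n):
    """Normal scores N_i = G(U_i) with U_i = (i - a) / (n + 1 - 2a)."""
    a = 3 / 8 if n <= 10 else 0.5
    inv_cdf = NormalDist().inv_cdf  # G: inverse of the standard normal CDF
    return [inv_cdf((i - a) / (n + 1 - 2 * a)) for i in range(1, n + 1)]


def probability_plot_points(data):
    """Pair each normal score (horizontal axis) with the sorted data value (vertical axis)."""
    xs = sorted(data)
    return list(zip(normal_scores(len(xs)), xs))


grades = [72, 55, 81, 64, 90, 68, 77, 59]  # hypothetical student grades
for score, grade in probability_plot_points(grades):
    print(f"{score:6.3f}  {grade}")
```

If the printed pairs lie close to a straight line, the normality check passes. Systematic curvature at the ends suggests skewness or heavy tails.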

What is a Confidence Interval?
Confidence Interval: provides a range of plausible values for the true but unknown population parameter (such as a proportion or a mean). The margin of error, set by the chosen level of confidence, determines which guesses of the true parameter are plausible.
Confidence intervals take the form: Estimate ± Margin of Error (ME)

How is Margin of Error Determined?
Margin of Error: ME = Standard Error (SE) × Critical Value (CV)
Critical Value: choose a level of confidence, which corresponds to a probability from the sampling distribution. Use that distribution (e.g., Normal, Student-t, Chi-Squared) to look up the CV from the inverse CDF. This can be done using a table or Stata.
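The recipe ME = SE × CV, with the CV taken from the inverse CDF, can be sketched with `statistics.NormalDist` standing in for a printed table or Stata. The sketch assumes a normal sampling distribution, and its function names are hypothetical. For a two-sided interval with confidence c, the critical value is the (1 + c)/2 quantile of the standard normal.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided normal critical value: the (1 + c) / 2 quantile of N(0, 1)."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


def confidence_interval(estimate, se, confidence=0.95):
    """Estimate +/- ME, where ME = SE x CV (normal CV assumed)."""
    me = se * z_critical(confidence)
    return estimate - me, estimate + me


# Reproduce the usual table of critical values
for c in (0.75, 0.80, 0.85, 0.90, 0.95, 0.98, 0.99):
    print(f"{c:.2f}  {z_critical(c):.3f}")

print(confidence_interval(estimate=0.52, se=0.02))  # hypothetical estimate and SE
```

The printed values agree with the common critical-value table up to its rounding, for example 1.645 at 90% and 1.960 at 95%.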

Critical Values for Common Confidence Levels

Level of Confidence (c = 1 − α)    Critical Value z_c
0.75                               1.15
0.80                               1.28
0.85                               1.44
0.90                               1.645
0.95                               1.96
0.98                               2.33
0.99                               2.58

Estimating Confidence Intervals for One Parameter
One Proportion z-interval: $\hat p \pm z_c\, SE(\hat p)$, where $SE(\hat p) = \sqrt{\hat p \hat q / n}$, provided $n\hat p > 10$ and $n\hat q > 10$.
One Mean z-interval: $\bar x \pm z_c\, SE(\bar x)$, where $SE(\bar x) = s/\sqrt{n}$, provided the independence, randomization, and 10% conditions hold.
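Here is a sketch of both one-parameter z-intervals using Python's standard library. The function names and example counts are hypothetical. The proportion version refuses to compute when the success/failure condition ($n\hat p > 10$ and $n\hat q > 10$) fails, because the normal model is then doubtful.

```python
import math
from statistics import NormalDist

# Sketch only: names and example numbers are illustrative.


def z_critical(confidence):
    """Two-sided normal critical value z_c."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


def one_proportion_z_interval(successes, n, confidence=0.95):
    """p_hat +/- z_c * sqrt(p_hat * q_hat / n), after checking success/failure counts."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    if n * p_hat <= 10 or n * q_hat <= 10:
        raise ValueError("need n*p_hat > 10 and n*q_hat > 10")
    me = z_critical(confidence) * math.sqrt(p_hat * q_hat / n)
    return p_hat - me, p_hat + me


def one_mean_z_interval(x_bar, s, n, confidence=0.95):
    """x_bar +/- z_c * s / sqrt(n)."""
    me = z_critical(confidence) * s / math.sqrt(n)
    return x_bar - me, x_bar + me


print(one_proportion_z_interval(120, 200))              # p_hat = 0.6
print(one_mean_z_interval(x_bar=68.0, s=10.0, n=100))
```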

How to Set Sample Size?
Deciding Sample Size for a one proportion z-interval: $ME = z_c \sqrt{\frac{\hat p \hat q}{n}} \;\Rightarrow\; n = \left(\frac{z_c}{ME}\right)^2 \hat p \hat q$
Setting Sample Size for a one mean z-interval: $ME = z_c \frac{s}{\sqrt{n}} \;\Rightarrow\; n = \left(\frac{z_c}{ME}\right)^2 s^2$
(Round n up to the next whole number.)

Estimating Confidence Intervals for Finite Samples
One Mean t-interval: $\bar x \pm t_c\, SE(\bar x)$, where $SE(\bar x) = s/\sqrt{n}$, $t_c = t_{\alpha/2,\,n-1}$ is the upper α/2 critical value of the t-distribution with n − 1 degrees of freedom, and the independence, randomization, and 10% conditions all hold.
When do you use t instead of z? For confidence intervals for the mean in small samples (strictly, in any finite sample), use the t-distribution rather than the standard normal. Whenever σ is unknown and estimated by s, so that $SE(\bar x) = s/\sqrt{n}$ replaces $\sigma/\sqrt{n}$, use the t-distribution.
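Solving for n and building the t-interval can be sketched as below. Python's standard library has no t quantile function, so this sketch passes the t critical value in from a table (for example $t_{0.025,4} = 2.776$). The default `p_hat=0.5` is a conventional conservative assumption that the slide does not specify. The function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided normal critical value z_c."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


def sample_size_proportion(me, confidence=0.95, p_hat=0.5):
    """n = (z_c / ME)^2 * p_hat * q_hat, rounded up; p_hat = 0.5 is an assumed worst case."""
    z = z_critical(confidence)
    return math.ceil((z / me) ** 2 * p_hat * (1 - p_hat))


def sample_size_mean(me, s, confidence=0.95):
    """n = (z_c / ME)^2 * s^2, rounded up."""
    z = z_critical(confidence)
    return math.ceil((z / me) ** 2 * s ** 2)


def one_mean_t_interval(data, t_crit):
    """x_bar +/- t_c * s / sqrt(n); t_crit is t_{alpha/2, n-1} read from a t table."""
    n = len(data)
    x_bar = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    return x_bar - t_crit * se, x_bar + t_crit * se


print(sample_size_proportion(me=0.03))                      # +/- 3 points at 95%
print(sample_size_mean(me=2.0, s=10.0))
print(one_mean_t_interval([1, 2, 3, 4, 5], t_crit=2.776))  # hypothetical data, 95%, 4 df
```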

Confidence Interval for Difference in Two Parameters
Difference in Two Proportions z-interval: $(\hat p_1 - \hat p_2) \pm z_c\, SE(\hat p_1 - \hat p_2)$, where $SE(\hat p_1 - \hat p_2) = \sqrt{SE(\hat p_1)^2 + SE(\hat p_2)^2} = \sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}}$
Difference in Two Means t-interval: $(\bar x_1 - \bar x_2) \pm t_c\, SE(\bar x_1 - \bar x_2)$, where $SE(\bar x_1 - \bar x_2) = \sqrt{SE(\bar x_1)^2 + SE(\bar x_2)^2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

What is Hypothesis Testing?
Hypothesis testing assesses the validity of a hypothesis (called the null hypothesis, $H_0$) about an unknown population parameter (θ).
It requires the formulation of an alternative hypothesis ($H_A$) against which to test the null hypothesis.
It forces the investigator to choose a level of significance based on the trade-off between Type I and Type II errors.
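Both difference intervals rest on adding variances: the SE of a difference of independent estimates is the square root of the sum of the squared SEs. Here is a sketch using Python's standard library, with hypothetical names and counts. For the means interval, the t critical value is supplied from a table. The value used below is only a placeholder, since choosing the degrees of freedom depends on the difference test used.

```python
import math
from statistics import NormalDist

# Sketch only: names, data, and the t critical value are illustrative.


def se_difference(se1, se2):
    """SE of a difference of independent estimates: sqrt(SE1^2 + SE2^2)."""
    return math.sqrt(se1 ** 2 + se2 ** 2)


def two_proportion_z_interval(x1, n1, x2, n2, confidence=0.95):
    """(p1_hat - p2_hat) +/- z_c * SE(p1_hat - p2_hat)."""
    p1, p2 = x1 / n1, x2 / n2
    se = se_difference(math.sqrt(p1 * (1 - p1) / n1), math.sqrt(p2 * (1 - p2) / n2))
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


def two_mean_t_interval(x1_bar, s1, n1, x2_bar, s2, n2, t_crit):
    """(x1_bar - x2_bar) +/- t_c * sqrt(s1^2/n1 + s2^2/n2); t_crit comes from a table."""
    se = se_difference(s1 / math.sqrt(n1), s2 / math.sqrt(n2))
    diff = x1_bar - x2_bar
    return diff - t_crit * se, diff + t_crit * se


print(two_proportion_z_interval(60, 100, 45, 100))
# t_crit = 2.086 is t_{0.025, 20} from a t table, used here as a placeholder
print(two_mean_t_interval(52.0, 3.0, 9, 50.0, 4.0, 16, t_crit=2.086))
```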

Approach to Hypothesis Testing
The Five (Actual) Steps of Hypothesis Testing
Plan and Do:
1. Hypothesis (formulate $H_0$ and $H_A$)
2. Level of Significance (choose α)
3. Assumptions (check the independence, randomization, and 10% conditions)
4. Data (summarize it)
5. Statistical test (calculate the test statistic or its associated p-value)
Report:
6. Statistical significance (report how significant)
7. Conclusion (reject the null or fail to reject the null)
8. Implications (interpret the conclusion)

How to Formulate a Hypothesis (Step 1)
$H_0$ vs. $H_A$: the null hypothesis ($H_0$) is the maintained hypothesis, assumed to be true until shown otherwise. The alternative hypothesis ($H_A$) is a composite hypothesis (encompassing many outcomes) that must be true if the null is not.
Two-sided test: $H_0: \theta = \theta_0$ vs. $H_A: \theta \ne \theta_0$
One-sided tests: $H_0: \theta \le \theta_0$ vs. $H_A: \theta > \theta_0$, or $H_0: \theta \ge \theta_0$ vs. $H_A: \theta < \theta_0$

How to Choose α? (Step 2)
Level of Significance: the risk you are willing to accept of rejecting a null hypothesis that is actually true. It is the flip side of your level of confidence (α = 1 − c), that is, the probability of making a Type I error.
Error Types: a Type I error rejects a true $H_0$ (probability α). A Type II error fails to reject a false $H_0$ (probability β).

Data to Summarize? (Step 3)
Data for Hypothesis Testing:
1. Do you know the population distribution? Let's assume yes.
2. Do you know the population variance? Let's assume yes.
3. What is the Critical Value? Get this from the appropriate distribution.

Standard Normal Critical Values
Level of Significance (α)    One-Sided CV (z_α)    Two-Sided CV (z_α/2)
0.10                         1.282                 1.645
0.05                         1.645                 1.96
0.01                         2.326                 2.58
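Two claims on these slides can be checked numerically: the critical-value table, and the statement that α is the probability of a Type I error. The sketch below uses Python's standard library and assumes a normal population with known σ. The settings `mu0`, `sigma`, `n`, and `reps` are arbitrary illustrative values, and the function names are hypothetical. It draws many samples with $H_0$ true and counts how often a two-sided z test rejects.

```python
import random
from statistics import NormalDist, fmean


def critical_values_for_test(alpha):
    """Return (one-sided z_alpha, two-sided z_{alpha/2}) for significance level alpha."""
    inv_cdf = NormalDist().inv_cdf
    return inv_cdf(1 - alpha), inv_cdf(1 - alpha / 2)


def type_one_error_rate(alpha, mu0=0.0, sigma=1.0, n=25, reps=10000, seed=1):
    """Share of simulated samples (drawn with H0: mu = mu0 true) that a two-sided z test rejects."""
    rng = random.Random(seed)
    _, z_crit = critical_values_for_test(alpha)
    se = sigma / n ** 0.5
    rejections = 0
    for _ in range(reps):
        x_bar = fmean(rng.gauss(mu0, sigma) for _ in range(n))
        if abs((x_bar - mu0) / se) > z_crit:
            rejections += 1  # a Type I error, because H0 is true by construction
    return rejections / reps


for alpha in (0.10, 0.05, 0.01):
    one_sided, two_sided = critical_values_for_test(alpha)
    print(f"alpha={alpha:.2f}  z_alpha={one_sided:.3f}  z_alpha/2={two_sided:.3f}")

print(type_one_error_rate(0.05))  # should be close to 0.05
```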

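Calculate Test Statistic (Step 4)
What is the test statistic?
One-sided: assume $H_0: \mu \le \mu_0$ and $H_A: \mu > \mu_0$, where under $H_0$ (at the boundary μ = μ_0) $\bar X \sim N(\mu_0, \sigma^2/n)$. Then $\Pr(\bar X > \bar x) = \Pr\left(Z > \frac{\bar x - \mu_0}{\sigma/\sqrt{n}}\right)$, and the test statistic is $Z = \frac{\bar x - \mu_0}{\sigma/\sqrt{n}}$.
Two-sided: assume $H_0: \mu = \mu_0$ and $H_A: \mu \ne \mu_0$, where under $H_0$ $\bar X \sim N(\mu_0, \sigma^2/n)$. Then $\Pr(|\bar X - \mu_0| > |\bar x - \mu_0|) = \Pr\left(|Z| > \left|\frac{\bar x - \mu_0}{\sigma/\sqrt{n}}\right|\right)$, and the test statistic is again $Z = \frac{\bar x - \mu_0}{\sigma/\sqrt{n}}$.
Note: we are assuming we know σ². Otherwise we need to use s² and the t-distribution.

How do we conclude? (Step 5)
Critical Value Approach: take the test statistic calculated in Step 4, $Z = \frac{\bar x - \mu_0}{\sigma/\sqrt{n}}$, and the critical value found in Step 3 ($z_\alpha$ or $z_{\alpha/2}$).
Reject $H_0$ if $Z > z_\alpha$ (one-sided, $H_A: \mu > \mu_0$) or $|Z| > z_{\alpha/2}$ (two-sided). Otherwise, fail to reject $H_0$.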
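Steps 4 and 5 (critical-value approach) for a one-sample z test with known σ can be sketched as follows in Python's standard library. The function names and the `alternative` strings are hypothetical conventions for this sketch, and the numbers in the example are made up.

```python
import math
from statistics import NormalDist


def z_statistic(x_bar, mu0, sigma, n):
    """Z = (x_bar - mu0) / (sigma / sqrt(n)), assuming sigma is known."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))


def z_test_critical_value(x_bar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Return (Z, reject) using the critical-value approach (hypothetical helper)."""
    z = z_statistic(x_bar, mu0, sigma, n)
    inv_cdf = NormalDist().inv_cdf
    if alternative == "greater":      # H0: mu <= mu0 vs HA: mu > mu0
        reject = z > inv_cdf(1 - alpha)
    elif alternative == "less":       # H0: mu >= mu0 vs HA: mu < mu0
        reject = z < inv_cdf(alpha)
    elif alternative == "two-sided":  # H0: mu = mu0 vs HA: mu != mu0
        reject = abs(z) > inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, reject


print(z_test_critical_value(103, 100, sigma=15, n=36))                         # Z = 1.2, |Z| < 1.96
print(z_test_critical_value(105, 100, sigma=15, n=36, alternative="greater"))  # Z = 2.0 > 1.645
```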

How do we conclude? (Step 5, Alternative)
P-Value Approach: compare the probability value (p-value) associated with the test statistic with the level of significance, instead of comparing the statistic with a critical value.
Reject $H_0$ if p-value ≤ α. Fail to reject $H_0$ if p-value > α.
[Figure: flow diagram summarizing the p-value method for a t-test.]

How effective is our test?
Calculating the power of a test (the probability of rejecting $H_0$ when it is false, 1 − β): if we knew the true population mean, we could calculate the exact power of our test. If we don't know the true population mean, we can still discuss the relative power of tests.
The power of a test will increase if:
1. The true mean is farther from the hypothesized mean $\mu_0$
2. The significance level (α) of the test increases
3. We correctly use a one-sided test instead of a two-sided test (the true mean lies in the direction of $H_A$)
4. We increase our sample size
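The p-value rule and the four power claims can both be checked with a short sketch for the z test with known σ. It uses Python's standard library, and the names are hypothetical. The power formulas are the standard ones for a z test, evaluated at an assumed true mean `mu_true`. Only the upper one-sided and the two-sided tests are handled.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def p_value(z, alternative="two-sided"):
    """p-value of an observed z statistic; reject H0 when p-value <= alpha."""
    if alternative == "greater":
        return 1 - STD_NORMAL.cdf(z)
    if alternative == "less":
        return STD_NORMAL.cdf(z)
    if alternative == "two-sided":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


def power(mu_true, mu0, sigma, n, alpha=0.05, alternative="greater"):
    """Probability of rejecting H0 when the true mean is mu_true (z test, known sigma)."""
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))  # true mean measured in SE units
    if alternative == "greater":
        z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
        return 1 - STD_NORMAL.cdf(z_alpha - shift)
    if alternative == "two-sided":
        z_half = STD_NORMAL.inv_cdf(1 - alpha / 2)
        return (1 - STD_NORMAL.cdf(z_half - shift)) + STD_NORMAL.cdf(-z_half - shift)
    raise ValueError("alternative must be 'greater' or 'two-sided'")


base = dict(mu_true=103, mu0=100, sigma=15, n=36)  # hypothetical scenario
print(round(power(**base), 3))                           # baseline
print(round(power(**{**base, "mu_true": 106}), 3))       # 1: true mean farther from mu0
print(round(power(**base, alpha=0.10), 3))               # 2: larger alpha
print(round(power(**base, alternative="two-sided"), 3))  # 3: two-sided is less powerful here
print(round(power(**{**base, "n": 144}), 3))             # 4: larger sample
```

When `mu_true` equals `mu0`, the power equals α. This ties power back to the Type I error rate.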

Let's Return to Step 3
Alternative situations in hypothesis testing:
1. Do you know the population distribution? If no, invoke the CLT.
2. Do you know the population variance? If no, calculate $SE(\bar x)$ and use the t-distribution.
3. Do you know neither of the above? Then do both: invoke the CLT, calculate $SE(\bar x)$, and use the t-distribution.

When do I need a t-distribution?
If you know σ², you should use the normal distribution.
If you do not know σ², you should use the t-distribution.
If you have a large sample and don't know σ², you can approximate the t-distribution with the normal.

Normal compared with the t-distribution (upper-tail probabilities)
Distribution     a        Pr(X > a)    b        Pr(X > b)
t (10 df)        2.326    0.021        1.645    0.065
t (20 df)        2.326    0.016        1.645    0.058
t (30 df)        2.326    0.014        1.645    0.055
t (50 df)        2.326    0.012        1.645    0.053
t (100 df)       2.326    0.011        1.645    0.052
Normal           2.326    0.010        1.645    0.050
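The t-versus-normal table can be reproduced without a statistics package by integrating the Student-t density numerically. This sketch uses Python's standard library and applies Simpson's rule (a swapped-in numerical method, not something from the slides) to compute $\Pr(T > a) = 0.5 - \int_0^a f(t)\,dt$ for $a \ge 0$. The function names are hypothetical. In practice a statistics library such as SciPy would normally be used instead.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Student-t density with df degrees of freedom (log-gamma avoids overflow)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(a, df, steps=2000):
    """Pr(T > a) for a >= 0, as 0.5 minus Simpson's rule on [0, a] (steps must be even)."""
    if a < 0 or steps % 2:
        raise ValueError("need a >= 0 and an even number of steps")
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2  # Simpson weights 4, 2, 4, ..., 4
        total += weight * t_pdf(k * h, df)
    return 0.5 - total * h / 3


for df in (10, 20, 30, 50, 100):
    print(f"t ({df} df): {t_upper_tail(2.326, df):.3f}  {t_upper_tail(1.645, df):.3f}")
normal = NormalDist()
print(f"Normal:      {1 - normal.cdf(2.326):.3f}  {1 - normal.cdf(1.645):.3f}")
```

The tail probabilities shrink toward the normal values as the degrees of freedom grow. This is why the normal approximation is acceptable in large samples.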

Test for the Difference in Proportions/Means
Formal Steps:
Confirm the independence of the samples.
If the samples are independent and σ² is known, proceed with the appropriate difference test (equal or unequal variances).
If the samples are independent but σ² is unknown, test $\sigma_1^2 = \sigma_2^2$:
  If $\sigma_1^2 = \sigma_2^2$, use the difference test with equal variances.
  If $\sigma_1^2 \ne \sigma_2^2$, use the difference test with unequal variances.
If the samples are not independent, use the difference test for matched pairs.

Difference of Independent Means (Known, Unequal Variances)
What is the test statistic? Assume $H_0: \mu_1 = \mu_2$ (i.e., $\mu_1 - \mu_2 = 0$) and $H_A: \mu_1 \ne \mu_2$, where under $H_0$, $\bar X_1 - \bar X_2 \sim N\left(0, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$.
Test Stat: $Z = \frac{\bar x_1 - \bar x_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$, with p-value $\Pr(|Z| > |z|)$ for the observed value z.
Reject $H_0$ if $|Z| > z_{\alpha/2}$.
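The formal steps and the known-variance test can be sketched together in Python's standard library. All names here are hypothetical. `choose_difference_test` only encodes the decision list above as returned labels. The equal-variance flag is assumed to come from a separate test of $\sigma_1^2 = \sigma_2^2$.

```python
import math
from statistics import NormalDist


def choose_difference_test(independent, variances_known, equal_variances=None):
    """Return a label for the difference test suggested by the decision steps (hypothetical helper)."""
    if not independent:
        return "matched pairs t test"
    if variances_known:
        return "two-sample z test (known variances)"
    if equal_variances is None:
        raise ValueError("test sigma1^2 = sigma2^2 first")
    if equal_variances:
        return "pooled t test (equal variances)"
    return "Welch t test (unequal variances)"


def two_sample_z_test(x1_bar, x2_bar, sigma1, sigma2, n1, n2, alpha=0.05):
    """Two-sided z test of H0: mu1 = mu2 with known population variances."""
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (x1_bar - x2_bar) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    reject = abs(z) > NormalDist().inv_cdf(1 - alpha / 2)
    return z, p, reject


print(choose_difference_test(independent=True, variances_known=True))
print(two_sample_z_test(54.0, 50.0, sigma1=6.0, sigma2=8.0, n1=36, n2=64))  # hypothetical data
```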

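Testing the Variance of a Distribution
What is the test statistic? Assume $H_0: \sigma^2 \le \sigma_0^2$ and $H_A: \sigma^2 > \sigma_0^2$, where $X_i \sim N(\mu, \sigma^2)$ and, at $\sigma^2 = \sigma_0^2$, $\frac{(n-1)s_X^2}{\sigma_0^2} \sim \chi^2_{n-1}$.
Test Stat: $\chi^2 = \frac{(n-1)s_X^2}{\sigma_0^2}$
Reject $H_0$ if $\chi^2 > \chi^2_{\alpha,(n-1)}$.

Difference of Independent Means (Unknown, Equal Variances)
What is the test statistic? Assume $H_0: \mu_1 = \mu_2$ (i.e., $\mu_1 - \mu_2 = 0$) and $H_A: \mu_1 \ne \mu_2$. Under $H_0$, the difference $\bar X_1 - \bar X_2$ is centered at 0 with estimated variance $s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$, and the standardized difference follows a t-distribution with
DoF: $v = n_1 + n_2 - 2$, where the pooled variance is $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
Test Stat: $t = \frac{\bar x_1 - \bar x_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
Reject $H_0$ if $|t| > t_{\alpha/2,(v)}$.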
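The variance test and the pooled two-sample t test both reduce to a few sample summaries. Here is a sketch using Python's `statistics` module; the function names and data are hypothetical. The standard library has no χ² or t quantile function, so the critical values ($\chi^2_{0.05,7} = 14.067$ and $t_{0.025,8} = 2.306$) are assumed to come from a table.

```python
import math
import statistics


def variance_chi2_statistic(data, sigma0_sq):
    """chi^2 = (n - 1) s^2 / sigma0^2, returned with its n - 1 degrees of freedom."""
    n = len(data)
    return (n - 1) * statistics.variance(data) / sigma0_sq, n - 1


def pooled_t_statistic(sample1, sample2):
    """t = (x1_bar - x2_bar) / (s_p * sqrt(1/n1 + 1/n2)), with n1 + n2 - 2 df."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = statistics.variance(sample1)
    s2_sq = statistics.variance(sample2)
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)  # pooled variance
    diff = statistics.fmean(sample1) - statistics.fmean(sample2)
    return diff / math.sqrt(sp_sq * (1 / n1 + 1 / n2)), n1 + n2 - 2


# Hypothetical data; critical values are read from tables
chi2, df = variance_chi2_statistic([2, 4, 4, 4, 5, 5, 7, 9], sigma0_sq=4.0)
print(chi2, df, "reject H0" if chi2 > 14.067 else "fail to reject H0")  # chi^2_{0.05,7}

t, df = pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df, "reject H0" if abs(t) > 2.306 else "fail to reject H0")  # t_{0.025,8}
```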

Difference of Independent Means (Unknown, Unequal Variances)
What is the test statistic? Assume $H_0: \mu_1 = \mu_2$ (i.e., $\mu_1 - \mu_2 = 0$) and $H_A: \mu_1 \ne \mu_2$. Under $H_0$, the difference $\bar X_1 - \bar X_2$ is centered at 0 with estimated variance $\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$, and the standardized difference approximately follows a t-distribution with
DoF: $v = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
Test Stat: $t = \frac{\bar x_1 - \bar x_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
Reject $H_0$ if $|t| > t_{\alpha/2,(v)}$.

Testing the Difference of Matched Pair Sample Means
How does this test work?
Let n be the number of pairs, with differences $d_i = x_{1i} - x_{2i}$.
Calculate the mean of the differences: $\bar d = \frac{1}{n}\sum_{i=1}^{n} d_i$
Calculate the standard deviation of the differences: $s_d = \sqrt{\frac{\sum_{i=1}^{n}(d_i - \bar d)^2}{n - 1}}$
Choose the null, e.g., $H_0: \mu_d = 0$ and $H_A: \mu_d \ne 0$.
Calculate the test statistic for the difference: $t = \frac{\bar d}{s_d/\sqrt{n}}$
Reject $H_0$ if $|t| > t_{\alpha/2,(n-1)}$.
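The unequal-variance (Welch) statistic with its approximate degrees of freedom, and the matched-pairs statistic, can be sketched the same way in Python's `statistics` module. Names and data are hypothetical. The critical values $t_{\alpha/2,(v)}$ would again come from a table. When v is not a whole number, rounding it down is a common conservative convention.

```python
import math
import statistics


def welch_t_statistic(sample1, sample2):
    """Unequal-variance t statistic and its approximate (Welch) degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # s1^2 / n1
    v2 = statistics.variance(sample2) / n2  # s2^2 / n2
    t = (statistics.fmean(sample1) - statistics.fmean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def paired_t_statistic(first, second):
    """Matched-pairs t = d_bar / (s_d / sqrt(n)) with n - 1 df, n = number of pairs."""
    if len(first) != len(second):
        raise ValueError("matched pairs need samples of equal length")
    d = [x1 - x2 for x1, x2 in zip(first, second)]
    n = len(d)
    d_bar = statistics.fmean(d)
    s_d = statistics.stdev(d)  # uses the n - 1 denominator
    return d_bar / (s_d / math.sqrt(n)), n - 1


print(welch_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
print(paired_t_statistic([10, 12, 9, 11], [8, 11, 9, 8]))  # hypothetical before/after pairs
```

With equal sample sizes and equal sample variances, the Welch degrees of freedom reduce to $2(n - 1)$, the same as the pooled test.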
