Math 124: Modules Overall Goal. Point Estimation. Interval Estimation.

David Meredith, Department of Mathematics, San Francisco State University, October 22, 2009

Decision support
The goal of statistics is to optimize decisions made with partial information.

Political decisions
Example: a political party wants to appeal to voters. It cannot ask every voter what they think, so it asks a sample. Statistics tells us how closely the sample is likely to match the population.

Medical innovations
Example: medical researchers think they have an effective drug. They conduct a careful test with randomization of patients, placebos, etc. Statistics helps plan the test (deciding the number of subjects, etc.). It also tells us whether the test provides good evidence of the drug's effectiveness.

Using sample statistics to estimate population parameters
Population parameter: the percentage of all voters who favor a single-payer health-care system.
Sample statistic: the percentage of a sample of voters who favor a single-payer health-care system.
The sample percentage is a point estimate for the population percentage. In general, any sample statistic is a point estimate for the corresponding population parameter.

Example point estimates
If we measure heights, the mean height of a sample is a point estimate for the mean height of the population.
The standard deviation of the sample heights is a point estimate for the standard deviation of the heights in the population.
The proportion of people favoring single-payer in a sample of voters is a point estimate for the proportion favoring single-payer in the population.
The slope of the regression line through a sample of heights and weights is a point estimate for the slope of the regression line through the heights and weights of the entire population.

The big question
The important question is: how accurate is a point estimate? The rest of this course will study that question.

Three answers for a quantitative variable
Question: how effective is this medicine? Population: sick people; variable: blood pressure (quantitative).
1. The medicine lowers blood pressure by about 20 points on average (point estimate).
2. I'm highly confident that the medicine lowers blood pressure by between 15 and 25 points on average (confidence interval).
3. I'm almost certain the medicine lowers blood pressure (hypothesis test).

Three answers for a categorical variable
Question: how effective is this medicine? Population: sick people; variable: disease, present or absent (categorical).
1. The medicine cures about 80% of cases (point estimate).
2. I'm highly confident that the medicine cures between 70% and 90% of cases (confidence interval).
3. I'm almost certain the medicine cures some people (hypothesis test).

Confidence intervals
Suppose you want to estimate some parameter for a population, like average height, percentage with type A blood, percentage of Republicans, average lifespan, etc. A sample will give you a point estimate p for the parameter. We do not expect p to be exactly correct.
A confidence interval is an interval (a, b) that you are somewhat confident contains the true population parameter. That is, we are somewhat confident that $a < \pi < b$, where $\pi$ is the true parameter.
Your point estimate p is right in the middle of (a, b):
$p = \frac{a + b}{2}, \quad a = p - e, \quad b = p + e,$
where e is the margin of error. You need to learn how to compute the margin of error e for different levels of confidence.

Levels of confidence
What does it mean to be 95% confident? If you think your chance of being correct is low, you are only weakly confident (not good). You are 95% confident if you think there is a 95% chance that you are correct (pretty high). In real life you are seldom certain. Statistics is a rigorous way of dealing with uncertainty.

The key idea
To calculate a 95% confidence interval for a sample with mean m, standard deviation s and size n:
1. Pretend your sample represents the population perfectly.
2. Let $\bar{x}$ be the sampling distribution of the mean for your variable and sample size. Its sample size is n, its mean is m, and its standard deviation is $s/\sqrt{n}$.
3. Find the central range that contains 95% of all possible samples: solve $P(\bar{x} < a) = 0.025$ and $P(\bar{x} < b) = 0.975$. Then $P(a < \bar{x} < b) = P(\bar{x} < b) - P(\bar{x} < a) = 0.975 - 0.025 = 0.95$.
4. (a, b) is your 95% confidence interval.

Example 1
Suppose we wanted a 95% confidence interval for male heights, and we took a random sample of n men. The sample average height was 69.2" with a sample standard deviation of 3.1".
Related problem: the population average is 69.2" and the population standard deviation is 3.1". Let H be the sampling distribution of the mean height for samples of size n. The mean of H is 69.2 and the standard deviation of H is $3.1/\sqrt{n}$. We find a and b such that $P(H < a) = 0.025$ and $P(H < b) = 0.975$. Then $P(a < H < b) = P(H < b) - P(H < a) = 0.975 - 0.025 = 0.95$.
a = qnorm(0.025, 69.2, 3.1/sqrt(n)) = 68.34
b = qnorm(0.975, 69.2, 3.1/sqrt(n)) = 70.06
The 95% confidence interval is (68.34, 70.06).

Example 2
Suppose we wanted a 90% confidence interval for male heights from the same random sample of n men: sample average height 69.2", sample standard deviation 3.1".
Related problem: the population average is 69.2" and the population standard deviation is 3.1". Again the mean of H is 69.2 and its standard deviation is $3.1/\sqrt{n}$. We find a and b such that $P(H < a) = 0.05$ and $P(H < b) = 0.95$. Then $P(a < H < b) = P(H < b) - P(H < a) = 0.95 - 0.05 = 0.90$.
c(a, b) = qnorm(c(0.05, 0.95), 69.2, 3.1/sqrt(n)) = 68.47, 69.92
The 90% confidence interval is (68.47, 69.92).
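
The qnorm recipe above can be sketched in Python with the standard library's `statistics.NormalDist`. Its `inv_cdf` plays the role of R's `qnorm`. This is a sketch under one loud assumption: the transcript lost the sample size, and n = 50 is a guess that reproduces the quoted 95% interval. With n = 50 the 90% lower endpoint comes out as 68.48, one unit off the slide's 68.47 in the last digit. `mean_confidence_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(m, s, n, level):
    """Central `level` interval for a sample mean, treating the sample SD s
    as if it were the population SD (the lecture's approach)."""
    tail = (1 - level) / 2                             # 0.025 for 95%, 0.05 for 90%
    sampling_dist = NormalDist(mu=m, sigma=s / sqrt(n))  # distribution of the sample mean
    a = sampling_dist.inv_cdf(tail)                    # solves P(xbar < a) = tail
    b = sampling_dist.inv_cdf(1 - tail)                # solves P(xbar < b) = 1 - tail
    return a, b

# ASSUMPTION: n = 50 is not in the transcript; it is chosen to match the slides.
n = 50
lo95, hi95 = mean_confidence_interval(69.2, 3.1, n, 0.95)
lo90, hi90 = mean_confidence_interval(69.2, 3.1, n, 0.90)
print(f"95%: ({lo95:.2f}, {hi95:.2f})")
print(f"90%: ({lo90:.2f}, {hi90:.2f})")
```

The 90% interval sits inside the 95% one because its tail probabilities are larger.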

Compare confidence intervals
Two confidence intervals for heights:
95% confidence interval: (68.34, 70.06)
90% confidence interval: (68.47, 69.92)
The sample average 69.2" is right in the middle of both.
95% margin of error: e = 69.2 - 68.34 = 70.06 - 69.2 = 0.86
90% margin of error: e = 69.2 - 68.47 = 0.73 (and 69.92 - 69.2 = 0.72, the same up to rounding)
The 95% interval is wider because we are more confident of a less precise statement. With 90% we are less confident of a more precise statement.

Convenient approximation for a 95% CI
If your sample has mean m and standard deviation s, a frequently used approximation for the 95% confidence interval replaces 1.96 by 2:
$\left(m - 2\frac{s}{\sqrt{n}},\; m + 2\frac{s}{\sqrt{n}}\right)$
In the previous example that would be $\left(69.2 - 2\frac{3.1}{\sqrt{n}},\; 69.2 + 2\frac{3.1}{\sqrt{n}}\right) = (68.32, 70.08)$. The actual answer was (68.34, 70.06).

Sample sizes
Picking a sample size is an important question for researchers at the beginning of a project. With too small a sample, your results might not be statistically significant. With too big a sample, your research might be too expensive.
Suppose you wanted to estimate a quantitative variable like men's heights with a margin of error of e = 0.1 with 95% confidence. How big a sample do you need? Let n be the sample size and s the standard deviation of the sample we will measure. We have to guess or estimate s to find n; here we guess s = 3. Sometimes researchers do a small (cheap) study to estimate s.
Solve $P(Z < z) = 0.975$: z = qnorm(0.975, 0, 1) = 1.96. The margin of error is $e = z\,s/\sqrt{n}$, so
$n = \left(\frac{s z}{e}\right)^2 = 3457.31$
The minimal sample size is 3458.
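
The 2-SD approximation and the sample-size formula $n = (sz/e)^2$ can be sketched as follows, again using `statistics.NormalDist` in place of R's `qnorm`. The helper names are hypothetical. n = 50 in the approximation is the same assumed stand-in for the sample size missing from the transcript. The guessed s = 3 comes from the lecture.

```python
from math import ceil, sqrt
from statistics import NormalDist

def approx_95_interval(m, s, n):
    """Rule-of-thumb 95% interval: m +/- 2 s / sqrt(n), with 2 standing in for 1.96."""
    e = 2 * s / sqrt(n)
    return m - e, m + e

def sample_size_for_mean(s, e, level):
    """Smallest whole n with z * s / sqrt(n) <= e, i.e. n >= (s z / e)^2."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # qnorm(0.975, 0, 1) for a 95% level
    return ceil((s * z / e) ** 2)                  # round UP: a fractional person is not enough

# ASSUMPTION: n = 50 stands in for the sample size lost from the transcript.
lo, hi = approx_95_interval(69.2, 3.1, 50)
print(f"approximate 95%: ({lo:.2f}, {hi:.2f})")
print(sample_size_for_mean(s=3, e=0.1, level=0.95))
print(sample_size_for_mean(s=3, e=0.2, level=0.90))
```

Rounding up with `ceil` matters: rounding 3457.31 to the nearest integer would give a margin of error slightly above the target.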

Sample sizes, the sequel
Suppose you wanted to estimate a quantitative variable like men's heights with a margin of error of e = 0.2 with 90% confidence. How big a sample do you need? Let n be the sample size and s the standard deviation of the sample we will measure. Assume s = 3.
Solve $P(Z < z) = 0.95$: z = qnorm(0.95, 0, 1) = 1.64. Then
$n = \left(\frac{s z}{e}\right)^2 = 608.75$
The minimal sample size is 609.

The key idea
To calculate a 95% confidence interval for a sample with proportion p and size n:
1. Pretend your sample represents the population perfectly.
2. Let $\hat{p}$ be the sampling distribution of the proportion for your variable and sample size. Its sample size is n, its mean is p, and its standard deviation is $\sqrt{p(1-p)/n}$.
3. Find the central range that contains 95% of all possible samples: solve $P(\hat{p} < a) = 0.025$ and $P(\hat{p} < b) = 0.975$.
4. Then (a, b) is your 95% confidence interval.

Example 1
Suppose we wanted a 95% confidence interval for voters' preferences, and we took a random sample of n voters. 61% wanted Congress to pass a health plan (CBS radio news, Sunday, October 18, 2009).
Related problem: the population proportion is 0.61. Let $\hat{p}$ be the sampling distribution of voter preferences for samples of size n. The mean of $\hat{p}$ is 0.61 and its standard deviation is $\sqrt{0.61 \cdot 0.39 / n} = 0.069$. We find a and b such that $P(\hat{p} < a) = 0.025$ and $P(\hat{p} < b) = 0.975$. Then $P(a < \hat{p} < b) = P(\hat{p} < b) - P(\hat{p} < a) = 0.975 - 0.025 = 0.95$.
a = qnorm(0.025, 0.61, sqrt(.61*.39/n)) = 0.47
b = qnorm(0.975, 0.61, sqrt(.61*.39/n)) = 0.75
The 95% confidence interval is (0.47, 0.75).

Example 2
Suppose we wanted a 90% confidence interval for voters' preferences from the same random sample of n voters, in which 61% wanted Congress to pass a health plan.
Related problem: the population proportion is 0.61. The mean of $\hat{p}$ is 0.61 and its standard deviation is $\sqrt{0.61 \cdot 0.39 / n} = 0.069$. We find a and b such that $P(\hat{p} < a) = 0.05$ and $P(\hat{p} < b) = 0.95$. Then $P(a < \hat{p} < b) = P(\hat{p} < b) - P(\hat{p} < a) = 0.95 - 0.05 = 0.90$.
a = qnorm(0.05, 0.61, sqrt(.61*.39/n)) = 0.50
b = qnorm(0.95, 0.61, sqrt(.61*.39/n)) = 0.72
The 90% confidence interval is (0.50, 0.72).
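
The same qnorm recipe for a proportion is sketched below. This is the simple normal-approximation (often called "Wald") interval, with `statistics.NormalDist.inv_cdf` standing in for `qnorm`. ASSUMPTION: the number of voters is missing from the transcript. n = 50 is the value consistent with the quoted standard deviation 0.069, since $\sqrt{0.61 \cdot 0.39/50} \approx 0.069$. `proportion_confidence_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p, n, level):
    """Normal-approximation interval for a proportion, treating the sample
    proportion p as if it were the population proportion."""
    se = sqrt(p * (1 - p) / n)            # standard deviation of p-hat
    tail = (1 - level) / 2
    dist = NormalDist(mu=p, sigma=se)     # approximate sampling distribution of p-hat
    return se, dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

# ASSUMPTION: n = 50 voters, inferred from the quoted SD of 0.069.
se, lo95, hi95 = proportion_confidence_interval(0.61, 50, 0.95)
_, lo90, hi90 = proportion_confidence_interval(0.61, 50, 0.90)
print(f"SD of p-hat: {se:.3f}")
print(f"95%: ({lo95:.2f}, {hi95:.2f})")
print(f"90%: ({lo90:.2f}, {hi90:.2f})")
```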

Compare confidence intervals
Two confidence intervals for voter preferences:
95% confidence interval: (0.47, 0.75)
90% confidence interval: (0.50, 0.72)
The sample proportion 0.61 is right in the middle of both.
95% margin of error: e = 0.61 - 0.47 = 0.75 - 0.61 = 0.14
90% margin of error: e = 0.61 - 0.50 = 0.72 - 0.61 = 0.11
The 95% interval is wider because we are more confident of a less precise statement. With 90% we are less confident of a more precise statement.

Convenient approximation for a 95% CI
If your sample has proportion p, a frequently used approximation for the 95% confidence interval is
$\left(p - 2\sqrt{\frac{p(1-p)}{n}},\; p + 2\sqrt{\frac{p(1-p)}{n}}\right)$
In the previous example that would be $\left(0.61 - 2\sqrt{\frac{0.61 \cdot 0.39}{n}},\; 0.61 + 2\sqrt{\frac{0.61 \cdot 0.39}{n}}\right) = (0.47, 0.75)$. The actual answer was the same.

Sample sizes
Suppose you wanted to estimate a categorical variable like voter preferences with a margin of error of e = 0.03 with 95% confidence. How big a sample do you need? Let n be the sample size. Since $p(1-p) \le 1/4$ (with equality at p = 0.5), the standard deviation of $\hat{p}$ is at most $1/(2\sqrt{n})$. So it suffices that $z/(2\sqrt{n}) \le e$.
Solve $P(Z < z) = 0.975$: z = qnorm(0.975, 0, 1) = 1.96. Then
$n = \left(\frac{z}{2e}\right)^2 = 1067.11$
The minimal sample size is 1068. Statisticians often replace 1.96 with 2 and calculate the sample size for a 95% CI with margin of error e to be $n = 1/e^2$.

Suppose you wanted to estimate voter preferences with a margin of error of e = 0.03 with 90% confidence. How big a sample do you need? Solve $P(Z < z) = 0.95$: z = qnorm(0.95, 0, 1) = 1.64. Then
$n = \left(\frac{z}{2e}\right)^2 = 747.11$
The minimal sample size is 748. This uses the rounded z = 1.64; with the unrounded z ≈ 1.645, n ≈ 751.5, i.e. 752.
Using the rule of thumb, a margin of error of 3% requires a sample size of $1/0.03^2 = 1111.11$, i.e. at least 1112.
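
The worst-case sample-size formula $n = (z/(2e))^2$ can be sketched as below. It shows both the exact and the rounded z, to make the effect of rounding z visible. The helper names `z_for` and `sample_size_for_proportion` are hypothetical. `statistics.NormalDist().inv_cdf` again stands in for `qnorm(..., 0, 1)`.

```python
from math import ceil
from statistics import NormalDist

def z_for(level):
    """Two-sided critical value, e.g. about 1.96 for 95% (like qnorm(0.975, 0, 1))."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def sample_size_for_proportion(e, z):
    """Worst case p = 0.5, so p(1 - p) = 1/4 and we need n >= (z / (2e))^2."""
    return ceil((z / (2 * e)) ** 2)

n95 = sample_size_for_proportion(0.03, z_for(0.95))   # unrounded z ~ 1.95996
n90_rounded = sample_size_for_proportion(0.03, 1.64)  # rounded z, as on the slide
n90_exact = sample_size_for_proportion(0.03, z_for(0.90))  # unrounded z ~ 1.64485
n_rule = sample_size_for_proportion(0.03, 2)          # z replaced by 2, i.e. 1 / e^2
print(n95, n90_rounded, n90_exact, n_rule)
```

Passing z = 2 reproduces the $1/e^2$ rule of thumb, because $(2/(2e))^2 = 1/e^2$.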

Why the lecture is better than the textbook
The text often assumes you know the standard deviation of the population but only the mean of the sample. The lecture used both the mean and the standard deviation derived from the sample, which is more realistic.