Interval estimation (October 3). Outline: basic ideas; CLT and CI; CI for a population mean; CI for a population proportion; CI for a normal mean.


Interval estimation. October 3, 2018. STAT 151 Class 7, Slide 1

Pandemic data

Treatment outcome, $X$, from $n = 100$ patients in a pandemic (1 = recovered, 0 = not recovered):

1 1 1 0 0 0 1 1 1 0 0 1 0 1 0 0 1 1 1 1 1 1 1 0 1 1 0 0 1 1 1 1 0 1 1 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1 1 1 1 0 1 0 1 1 1 0 1 0 1 0 1 1 0 0 0 0 1 1 1 0 1 0 0 0 1 1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 0 0 1 0 1 1

A probability model for treatment outcome:

Outcome                 Probability
1 (recovers)            $p$
0 (does not recover)    $1 - p$

The maximum likelihood estimate of $p$ is $\hat p = 60/100 = 0.6$, a point estimate.
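For Bernoulli outcomes the MLE of $p$ is the sample proportion: the number of 1s divided by $n$. The short Python sketch below recomputes $\hat p$ from the 100 outcomes listed above (copied in rows of 10); the variable names are illustrative only.

```python
# The 100 treatment outcomes from the slide: 1 = recovered, 0 = not recovered.
outcomes = [
    1, 1, 1, 0, 0, 0, 1, 1, 1, 0,
    0, 1, 0, 1, 0, 0, 1, 1, 1, 1,
    1, 1, 1, 0, 1, 1, 0, 0, 1, 1,
    1, 1, 0, 1, 1, 0, 0, 1, 1, 0,
    0, 1, 0, 1, 1, 0, 1, 0, 0, 1,
    1, 1, 1, 0, 1, 0, 1, 1, 1, 0,
    1, 0, 1, 0, 1, 1, 0, 0, 0, 0,
    1, 1, 1, 0, 1, 0, 0, 0, 1, 1,
    1, 0, 1, 1, 1, 1, 0, 0, 1, 1,
    1, 1, 0, 1, 0, 0, 1, 0, 1, 1,
]

n = len(outcomes)            # sample size
recovered = sum(outcomes)    # number of 1s (patients who recovered)
p_hat = recovered / n        # Bernoulli MLE of p: the sample proportion

print(n, recovered, p_hat)   # 100 60 0.6
```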

Interval estimation

The MLE $\hat p$ comes from a sample of $n = 100$ patients. Recall that any estimate incurs sampling error [a], defined by

sampling error = point estimate − unknown = $\hat p - p$.

To account for the uncertainty due to sampling error, we estimate that $p$ lies within $(\hat p - a, \hat p + a)$, $a > 0$, where $a$ is called the margin of error. The interval width can be adjusted: a large $a$ (wider interval) makes us more certain that $p \in (\hat p \pm a)$, while a small $a$ (shorter interval) makes us less certain that $p \in (\hat p \pm a)$.

[a] Sampling error arises because we use a sample (only a part of the population) to infer about the entire population.

Confidence interval (CI)

Our new type of estimate is called a confidence interval estimate [a]. A confidence interval estimate has two basic components:
(a) the level of confidence, a measure of our level of belief;
(b) the margin of error, a measure of the precision of our estimate.

For the pandemic example, we wish to say something like: "we are 95% confident that the population proportion $p$ lies within $0.6 \pm a$." In that case, (a) the level of confidence is 95% and (b) the margin of error is $a$. How do we determine $a$?

[a] Sometimes simply called an interval estimate.

Sampling distribution

Some facts:
- Estimates [a] from different random samples form a sampling distribution around the unknown $p$ (gray x-marks in the figure).
- $\hat p = 0.6$ (green x-mark), from the observed data, behaves like the gray x-marks.
- Anything in gray is not observed.

[Figure: estimates from different samples (gray x-marks) scattered around the unknown $p = ?$; the estimate from the observed sample, $\hat p = 0.6$, is the green x-mark.]

[a] Assuming an unbiased or consistent estimator, so there is no systematic over- or under-estimation.
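The gray x-marks can never be observed in a real study, but a simulation can generate them if we pretend to know $p$. The sketch below assumes a hypothetical true $p = 0.6$ (an assumption made only for illustration), draws 5000 samples of $n = 100$ patients, and checks that the resulting estimates centre on $p$.

```python
import random
import statistics

random.seed(151)   # fixed seed so the run is reproducible

P_TRUE = 0.6       # hypothetical "unknown" p, assumed only for this simulation
N = 100            # patients per sample, as in the pandemic data
REPS = 5000        # number of repeated samples (the gray x-marks)

def sample_p_hat(p, n):
    """Draw one sample of n Bernoulli(p) outcomes and return its proportion."""
    return sum(1 if random.random() < p else 0 for _ in range(n)) / n

estimates = [sample_p_hat(P_TRUE, N) for _ in range(REPS)]

# The estimates scatter around p with no systematic over- or under-estimation.
print(round(statistics.mean(estimates), 3))
print(min(estimates), max(estimates))
```

Any single run of `sample_p_hat` plays the role of the green x-mark: it is just one draw from the same distribution.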

Central Limit Theorem (CLT)

Let $\hat\theta$ be the sample estimate of a population characteristic $\theta$. If $\hat\theta$ is obtained using a well-behaved estimator and the sample is sufficiently large (of $n$ independent, randomly drawn observations), then the sampling distribution of $\hat\theta$ is approximately normal with mean $\theta$ and variance $\mathrm{var}(\hat\theta)$.

[Figure: estimates from different samples and $\hat p$ from the observed sample, distributed approximately normally around $p$.]

Using the empirical rule, we can be 95% certain that $\hat p$ is no more than $2\sqrt{\mathrm{var}(\hat p)} = 2\,\mathrm{SE}(\hat p)$ [a] away from $p$, i.e., that $\hat p$ lies within $p \pm 2\,\mathrm{SE}(\hat p)$.

[a] 2 is an approximation; a more exact value is 1.96.
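For the pandemic setting, the 95% statement can be checked exactly rather than by simulation: with a known $p$, the count $n\hat p$ is Binomial$(n, p)$, so the probability that $\hat p$ falls within $p \pm 1.96\,\mathrm{SE}(\hat p)$ is a finite sum. The sketch assumes a hypothetical true $p = 0.6$; the result is close to, but not exactly, 0.95, because the binomial is discrete and the normal curve is only an approximation.

```python
from math import comb, sqrt

P = 0.6     # hypothetical true p (assumed; in practice p is unknown)
N = 100     # sample size

se = sqrt(P * (1 - P) / N)   # SE(p_hat) = sqrt(p(1-p)/n) under the assumed p

def binom_pmf(k, n, p):
    """P(number of recoveries = k) for a Binomial(n, p) count."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exact P(|p_hat - p| <= 1.96 SE), summing over counts k with p_hat = k / N.
prob = sum(binom_pmf(k, N, P) for k in range(N + 1) if abs(k / N - P) <= 1.96 * se)
print(round(se, 4), round(prob, 4))
```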

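Large-sample 95% confidence interval for $p$

We can be 95% certain that $\hat p$ is no more than $1.96\,\mathrm{SE}(\hat p)$ [a] away from $p$. In notation, subtracting $p + \hat p$ throughout and then multiplying by $-1$ (which reverses the inequalities):

$$
\begin{aligned}
p - 1.96\,\mathrm{SE}(\hat p) &< \hat p < p + 1.96\,\mathrm{SE}(\hat p) \\
p - 1.96\,\mathrm{SE}(\hat p) - p - \hat p &< \hat p - p - \hat p < p + 1.96\,\mathrm{SE}(\hat p) - p - \hat p \\
-\hat p - 1.96\,\mathrm{SE}(\hat p) &< -p < -\hat p + 1.96\,\mathrm{SE}(\hat p) \\
\hat p - 1.96\,\mathrm{SE}(\hat p) &< p < \hat p + 1.96\,\mathrm{SE}(\hat p)
\end{aligned}
$$

We are 95% certain that $p$ lies within $\hat p \pm 1.96\,\mathrm{SE}(\hat p)$. The level of confidence is 95% and the margin of error is $1.96\,\mathrm{SE}(\hat p)$.

[a] The more exact value of 1.96 is used here.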
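The rearrangement above is an identity: "$\hat p$ is within $1.96\,\mathrm{SE}$ of $p$" and "$p$ is within $1.96\,\mathrm{SE}$ of $\hat p$" are the same event. A quick numerical sanity check, assuming arbitrary random values for $p$, $\hat p$ and the SE (none of them come from the slides):

```python
import random

random.seed(7)
Z = 1.96

def p_hat_near_p(p, p_hat, se):
    """Starting statement: p_hat lies within p ± 1.96 SE."""
    return p - Z * se < p_hat < p + Z * se

def p_in_interval(p, p_hat, se):
    """Rearranged statement: p lies within p_hat ± 1.96 SE."""
    return p_hat - Z * se < p < p_hat + Z * se

# Random (p, p_hat, SE) triples; the two statements should agree on every one.
trials = [(random.random(), random.random(), random.uniform(0.01, 0.2))
          for _ in range(10_000)]
agree = all(p_hat_near_p(*t) == p_in_interval(*t) for t in trials)
print(agree)   # True
```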

Large-sample 95% confidence interval for any quantity

Given a sufficiently large random sample (of $n$ independent observations) from a population, let $\hat\theta$ be the sample estimate of a population characteristic $\theta$. If $\hat\theta$ is obtained from a well-behaved estimator, then an approximate 95% confidence interval for $\theta$ is

$$\hat\theta \pm 1.96\,\mathrm{SE}(\hat\theta).$$

Why 95%? From the normal distribution, we can also be 90% certain that $\hat p$ is no more than $1.64\,\mathrm{SE}(\hat p)$ away from $p$, so $\hat p \pm 1.64\,\mathrm{SE}(\hat p)$ is a 90% confidence interval. Many confidence levels can be formed from the same data, but a study should report one conclusion. A meaningful interval should have (a) a high level of confidence and (b) a width that is not too wide. Balancing (a) and (b), we often use a 95% confidence interval.
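The multipliers 1.64 and 1.96 are two-sided critical values of the standard normal distribution. A small sketch using Python's standard-library `statistics.NormalDist` shows where they come from; `z_critical` is a hypothetical helper name, not part of any library.

```python
from statistics import NormalDist

def z_critical(level):
    """Two-sided normal critical value z such that P(-z < Z < z) = level."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} -> z = {z_critical(level):.3f}")
# 90% -> z = 1.645
# 95% -> z = 1.960
# 99% -> z = 2.576
```

The values 1.64 and 1.96 used on the slides are these critical values to two decimal places; a higher confidence level always needs a larger multiplier and hence a wider interval.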

Interpretation of a confidence level

A confidence interval (CI) is a method for finding a plausible range for $p$. Each time a CI is calculated from a new random sample, we obtain a different interval. A 95% CI, for example, has the following property: if the method is used repeatedly, 95% of the intervals will include $p$.

However, once a particular 95% CI has been calculated, the chance that $p$ lies in that interval is NOT 95%; it is either 0% ($p$ not inside the CI: a wrong estimate) or 100% ($p$ inside the CI: a correct estimate). Our confidence in an interval therefore rests on the fact that it may be one of the 95 (out of 100) intervals that include the unknown $p$.
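The repeated-use interpretation can be illustrated by simulation. The sketch below assumes a hypothetical true $p = 0.6$ and $n = 100$, builds the large-sample interval $\hat p \pm 1.96\sqrt{\hat p(1-\hat p)/n}$ from each of 2000 simulated samples, and counts how many intervals contain $p$.

```python
import random
from math import sqrt

random.seed(2018)

P_TRUE = 0.6   # hypothetical true p, known here only because we simulate
N = 100
REPS = 2000
Z = 1.96

def one_interval(p, n):
    """Draw one sample and return its large-sample 95% CI for p."""
    x = sum(1 if random.random() < p else 0 for _ in range(n))
    p_hat = x / n
    margin = Z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

intervals = [one_interval(P_TRUE, N) for _ in range(REPS)]
hits = [lo < P_TRUE < hi for lo, hi in intervals]   # each interval: contains p or not

coverage = sum(hits) / REPS
print(f"{coverage:.3f} of {REPS} intervals contain p")
```

Each entry of `hits` is simply True or False (the 100% or 0% on the slide); only the long-run fraction is close to 95%.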

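Large-sample 95% confidence interval for a population mean

If $\bar X$ is a point estimate of $\mu$ (that is, $\hat\mu = \bar X$), an approximate 95% confidence interval is

$$\hat\mu \pm 1.96\,\mathrm{SE}(\hat\mu) = \bar X \pm 1.96\,\mathrm{SE}(\bar X).$$

The interval estimate can be completed by working out $\mathrm{SE}(\bar X)$, the spread of $\bar X$ between samples:

$$
\begin{aligned}
\mathrm{var}(\bar X) &= \mathrm{var}\!\left(\frac{X_1 + \cdots + X_n}{n}\right) = \frac{1}{n^2}\,\mathrm{var}(X_1 + \cdots + X_n) \\
&= \frac{1}{n^2}\left[\mathrm{var}(X_1) + \cdots + \mathrm{var}(X_n)\right] && (X_1, \ldots, X_n \text{ are independent}) \\
&= \frac{1}{n^2}\, n\,\mathrm{var}(X) && (\mathrm{var}(X_1) = \cdots = \mathrm{var}(X_n) = \mathrm{var}(X)) \\
&= \frac{\mathrm{var}(X)}{n}.
\end{aligned}
$$

Hence $\mathrm{SE}(\bar X) = \sqrt{\mathrm{var}(\bar X)} = \mathrm{SD}(X)/\sqrt{n}$, which depends on (1) $\mathrm{SD}(X)$, how different the values of $X$ are in the population, and (2) $n$, the sample size.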
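The result $\mathrm{var}(\bar X) = \mathrm{var}(X)/n$ can be checked by simulation. The sketch assumes a Uniform(0, 1) population, chosen only because its variance is known exactly ($1/12$), and compares the spread of 5000 sample means ($n = 25$) with the formula.

```python
import random
import statistics

random.seed(10)

N = 25            # sample size n
REPS = 5000       # number of repeated samples
POP_VAR = 1 / 12  # var(X) for a Uniform(0, 1) population (assumed for illustration)

# Sample means from many independent samples of size N
means = [statistics.fmean(random.random() for _ in range(N)) for _ in range(REPS)]

empirical = statistics.variance(means)   # spread of X-bar between samples
theory = POP_VAR / N                     # var(X) / n from the derivation

print(round(empirical, 5), round(theory, 5))
```

Quadrupling `N` should roughly halve $\mathrm{SE}(\bar X)$, reflecting the $1/\sqrt{n}$ factor.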

Women's wage data example

Suppose we wish to estimate $\mu$ = mean hours of work for all working white married women in the US in 1975–1976. Available data: $\bar X = 1303$, $s = \sqrt{\sum_{i=1}^{n}(X_i - \bar X)^2/(n-1)} = 776.2744$, $n = 428$.

Since $\mathrm{SD}(X)$ is unknown, we replace it by $s$; an approximate 95% confidence interval for $\mu$ is

$$\bar X \pm 1.96\,\frac{s}{\sqrt n}.$$

Using the data gives

$$1303 \pm 1.96 \times \frac{776.2744}{\sqrt{428}} = 1303 \pm 74 = (1229, 1377).$$

In this approximate 95% confidence interval, 1229 and 1377 hours are, respectively, the lower and upper confidence limits; the margin of error is 74 hours.
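The arithmetic can be reproduced from the three summary statistics alone. `mean_ci_95` is a hypothetical helper written for this sketch; it plugs $s$ in for $\mathrm{SD}(X)$ exactly as the slide does.

```python
from math import sqrt

def mean_ci_95(xbar, s, n, z=1.96):
    """Large-sample 95% CI for a mean, with SD(X) replaced by the sample SD s."""
    margin = z * s / sqrt(n)
    return margin, (xbar - margin, xbar + margin)

margin, (lo, hi) = mean_ci_95(1303, 776.2744, 428)
print(round(margin, 2), round(lo), round(hi))   # 73.54 1229 1377
```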

What is a proportion? Pandemic example (2)

$p$ is the proportion, in the population of $N$ patients, who would recover. Let $X = 1$ if a patient recovers and $X = 0$ if not. Suppose the values of $X$ in the population are $X_1 = 1$ (recovers), $X_2 = 0$ (does not recover), $X_3 = 0, \ldots, X_N = 1$, a collection of 1s and 0s. Then

$$p = \frac{\#\text{1s}}{N} = \frac{1 + 0 + 0 + \cdots + 1}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \mu.$$

Hence a proportion is a special case of $\mu$ in which $X$ takes only the values 1 and 0.

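Sampling to estimate a proportion: Pandemic example (3)

We take a sample $X_1, \ldots, X_n$ and estimate $p$ ($= \mu$) using $\bar X$:

$$\hat p = \bar X = \frac{X_1 + \cdots + X_n}{n}, \qquad X_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p. \end{cases}$$

As a special case of $\mu$, an approximate 95% confidence interval for $p$ is

$$\hat p \pm 1.96\,\mathrm{SE}(\hat p) = \bar X \pm 1.96\,\mathrm{SE}(\bar X) = \bar X \pm 1.96\,\frac{\mathrm{SD}(X)}{\sqrt n},$$

where

$$\mathrm{var}(X) = E(X^2) - [E(X)]^2 = (1)^2 p + (0)^2 (1 - p) - p^2 = p - p^2 = p(1 - p),$$

since $[E(X)]^2 = \mu^2 = p^2$. Hence an approximate 95% confidence interval for $p$ is

$$\hat p \pm 1.96\sqrt{\frac{p(1-p)}{n}}.$$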
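The variance step $E(X^2) - [E(X)]^2 = p(1-p)$ can be confirmed with exact rational arithmetic for a few values of $p$ (chosen arbitrarily for illustration), so that no floating-point rounding hides a mistake.

```python
from fractions import Fraction

def bernoulli_var(p):
    """var(X) = E(X^2) - [E(X)]^2 for X = 1 w.p. p and X = 0 w.p. 1 - p."""
    e_x = 1 * p + 0 * (1 - p)          # E(X) = p, i.e. mu = p
    e_x2 = 1**2 * p + 0**2 * (1 - p)   # E(X^2) = p, since 1^2 = 1 and 0^2 = 0
    return e_x2 - e_x**2

for p in (Fraction(1, 10), Fraction(1, 2), Fraction(3, 5)):
    assert bernoulli_var(p) == p * (1 - p)   # exact comparison, no rounding
    print(p, bernoulli_var(p))
# 1/10 9/100
# 1/2 1/4
# 3/5 6/25
```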

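Pandemic example (4)

Available data: $\hat p = \bar X = 60/100 = 0.6$, estimated $\mathrm{SE}(\hat p) = \sqrt{\hat p(1 - \hat p)/n} = \sqrt{0.6(1 - 0.6)/100}$, $n = 100$.

Since $p(1-p)$ is unknown, we replace it by $\hat p(1-\hat p)$; an approximate 95% confidence interval for $p$ is

$$\hat p \pm 1.96\sqrt{\frac{\hat p(1-\hat p)}{n}},$$

which, using the data, gives

$$0.6 \pm 1.96\sqrt{\frac{0.6(0.4)}{100}} = 0.6 \pm 0.096 = (0.504, 0.696).$$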
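The same calculation as a short Python sketch; `proportion_ci_95` is a hypothetical helper name, and it uses the plug-in estimate $\hat p(1-\hat p)$ in place of the unknown $p(1-p)$, as on the slide.

```python
from math import sqrt

def proportion_ci_95(successes, n, z=1.96):
    """Large-sample 95% CI for p, with p(1 - p) replaced by p_hat(1 - p_hat)."""
    p_hat = successes / n
    se_hat = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * se_hat
    return p_hat, margin, (p_hat - margin, p_hat + margin)

p_hat, margin, (lo, hi) = proportion_ci_95(60, 100)
print(p_hat, round(margin, 3), round(lo, 3), round(hi, 3))   # 0.6 0.096 0.504 0.696
```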

A normal population mean

Interval estimates that rely on the CLT require a large sample size $n$. There is no general expression for small $n$, except when estimating a normal population mean $\mu$. If $\mathrm{SD}(X)$ is known, the following is still valid:

$$\bar X \pm 1.96\,\mathrm{SE}(\bar X) = \bar X \pm 1.96\,\frac{\mathrm{SD}(X)}{\sqrt n}.$$

$\mathrm{SD}(X)$ is often unknown and is replaced by $s$ to give

$$\bar X \pm t\,\frac{s}{\sqrt n}, \qquad t \ge 1.96.$$

The value $t$ stretches the interval to compensate for the extra uncertainty from estimating $\mathrm{SD}(X)$ by $s$, which can be a poor estimate when $n$ is small. The amount of stretching depends on $n$: a smaller $n$ requires more stretching.
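Why the stretching matters can be seen by simulation. The sketch assumes a hypothetical standard normal population and samples of size $n = 10$, and compares how often $\bar X \pm 1.96\,s/\sqrt n$ and $\bar X \pm 2.262\,s/\sqrt n$ (2.262 being the 0.975 quantile of $t$ with df $= 9$) contain $\mu$. With $s$ in place of $\mathrm{SD}(X)$, the 1.96 version falls noticeably short of 95%.

```python
import random
import statistics
from math import sqrt

random.seed(42)

MU, SIGMA = 0.0, 1.0   # hypothetical normal population (assumed for the simulation)
N = 10                 # small sample size
T_9 = 2.262            # 0.975 quantile of t with df = n - 1 = 9
REPS = 4000

def covers(mult, sample):
    """Does xbar ± mult * s / sqrt(n) contain MU?"""
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)          # sample SD with divisor n - 1
    margin = mult * s / sqrt(len(sample))
    return xbar - margin < MU < xbar + margin

samples = [[random.gauss(MU, SIGMA) for _ in range(N)] for _ in range(REPS)]
cover_z = sum(covers(1.96, smp) for smp in samples) / REPS
cover_t = sum(covers(T_9, smp) for smp in samples) / REPS
print(f"1.96 with s: {cover_z:.3f}, t = 2.262: {cover_t:.3f}")
```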

Example

Suppose we wish to estimate the average household expenditure in a population, with available data $\bar X = 1924.9$, $s = \sqrt{\sum_{i=1}^{n}(X_i - \bar X)^2/(n-1)} = 223.1021$, $n = 10$.

Since $n$ is small, we assume household expenditure follows a normal distribution. To find a 95% confidence interval, we need a $t$ value that depends on the degrees of freedom (df), defined as df $= n - 1$:

df = n − 1   6      7      8      9      10     20     120    >120
t value      2.447  2.365  2.306  2.262  2.228  2.086  1.98   1.96

In this example, $n = 10$, which gives df $= 10 - 1 = 9$, so we use the value 2.262 from the table in place of 1.96:

$$1924.9 \pm 2.262 \times \frac{223.1021}{\sqrt{10}} = 1924.9 \pm 159.6 = (1765.3, 2084.5).$$
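A sketch of the same $t$-interval in Python, using the $t$ values from the slide's table (a dictionary lookup stands in for a $t$ quantile function, so only the tabulated df values are supported). The margin works out to about 159.59.

```python
from math import sqrt

# t values (0.975 quantiles) from the slide's table, keyed by df
T_TABLE = {6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 20: 2.086, 120: 1.98}

def t_interval_95(xbar, s, n):
    """95% CI for a normal mean with unknown SD, using the tabulated t value."""
    df = n - 1
    t = T_TABLE[df]            # raises KeyError for df values not in the table
    margin = t * s / sqrt(n)
    return margin, (xbar - margin, xbar + margin)

margin, (lo, hi) = t_interval_95(1924.9, 223.1021, 10)
print(round(margin, 1), round(lo, 1), round(hi, 1))   # 159.6 1765.3 2084.5
```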