Basic Concepts of Inference


Basic Concepts of Inference Corresponds to Chapter 6 of Tamhane and Dunlop Slides prepared by Elizabeth Newton (MIT) with some slides by Jacqueline Telford (Johns Hopkins University) and Roy Welsch (MIT). 1

Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write. - H. G. Wells
Statistical Inference deals with methods for making statements about a population based on a sample drawn from the population:
Point estimation: estimate an unknown population parameter.
Confidence interval estimation: find an interval that contains the parameter with a preassigned probability.
Hypothesis testing: test hypotheses about an unknown population parameter. 2

Examples Point Estimation: estimate the mean package weight of a cereal box filled during a production shift Confidence Interval Estimation: Find an interval [L,U] based on the data that includes the mean weight of the cereal box with a specified probability Hypothesis testing: Do the cereal boxes meet the minimum mean weight specification of 16 oz? 3

Two Levels of Statistical Inference Informal, using summary statistics (may only be descriptive statistics) Formal, which uses methods of probability and sampling distributions to develop measures of statistical accuracy 4

Estimation Problems Point estimation: estimation of an unknown population parameter by a single statistic calculated from the sample data. Confidence interval estimation: calculation of an interval from sample data that includes the unknown population parameter with a pre-assigned probability. 5

Point Estimation Terminology
Estimator = the random variable (r.v.) θ̂, a function of the X_i's (the general formula or rule to be computed from the data).
Estimate = the numerical value of θ̂ calculated from the observed sample data X_1 = x_1, ..., X_n = x_n (the specific value calculated from the data).
Example: X_i ~ N(µ, σ²).
Estimator: X̄ = (1/n) Σ X_i is an estimator of µ, i.e., µ̂.
Estimate: x̄ = (1/n) Σ x_i (= 10.2) is an estimate of µ.
Other estimators of µ? 6

Methods of Evaluating Estimators: Bias and Variance
Bias(θ̂) = E(θ̂) − θ
- The bias measures the accuracy of an estimator.
- An estimator whose bias is zero is called unbiased.
- An unbiased estimator may, nevertheless, fluctuate greatly from sample to sample.
Var(θ̂) = E{[θ̂ − E(θ̂)]²}
- The lower the variance, the more precise the estimator.
- A low-variance estimator may be biased.
- Among unbiased estimators, the one with the lowest variance should be chosen. Best = minimum variance. 7

Accuracy and Precision accurate and precise accurate, not precise precise, not accurate not accurate, not precise Diagram courtesy of MIT OpenCourseWare 8

Mean Squared Error
- To choose among all estimators (biased and unbiased), minimize a measure that combines both bias and variance.
- A good estimator should have low bias (accurate) AND low variance (precise).
MSE(θ̂) = E{[θ̂ − θ]²} = Var(θ̂) + [Bias(θ̂)]² (eqn 6.2)
MSE = expected squared error loss function, with Bias(θ̂) = E(θ̂) − θ and Var(θ̂) = E{[θ̂ − E(θ̂)]²} as before. 9

Example: Estimators of Variance
Two estimators of variance:
S₁² = Σ (X_i − X̄)² / (n − 1) is unbiased (Example 6.3).
S₂² = Σ (X_i − X̄)² / n is biased but has smaller MSE (Example 6.4).
In spite of its larger MSE, we almost always use S₁². 10
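The bias/MSE tradeoff between the two estimators is easy to see in a small simulation. The course materials used S-PLUS; the following is a standard-library Python sketch, with hypothetical values n = 10, σ = 2 (true variance 4), and 20,000 replications:

```python
import random
import statistics

random.seed(1)
n, true_var, reps = 10, 4.0, 20000   # hypothetical: sigma = 2, so sigma^2 = 4

s1_vals, s2_vals = [], []
for _ in range(reps):
    x = [random.gauss(0, 2) for _ in range(n)]
    xbar = statistics.fmean(x)
    ss = sum((xi - xbar) ** 2 for xi in x)
    s1_vals.append(ss / (n - 1))   # S1^2: divide by n - 1 (unbiased)
    s2_vals.append(ss / n)         # S2^2: divide by n (biased, smaller MSE)

def mse(vals):
    return statistics.fmean((v - true_var) ** 2 for v in vals)

print(f"mean of S1^2 = {statistics.fmean(s1_vals):.3f}, MSE = {mse(s1_vals):.3f}")
print(f"mean of S2^2 = {statistics.fmean(s2_vals):.3f}, MSE = {mse(s2_vals):.3f}")
```

With these settings, S₁² averages close to the true variance 4 while S₂² is biased low, yet S₂² comes out with the smaller MSE, as the slide states.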

Example - Poisson (See example in Casella & Berger, page 308) 11

Standard Error (SE)
- The standard deviation of an estimator is called the standard error of the estimator (SE).
- The estimated standard error is also called the standard error (se).
- The precision of an estimator is measured by the SE.
Examples for the normal and binomial distributions:
1. X̄ is an unbiased estimator of µ. SE(X̄) = σ/√n and se(X̄) = s/√n are called the standard error of the mean.
2. p̂ is an unbiased estimator of p, with se(p̂) = √(p̂(1 − p̂)/n). 12

Precision and Standard Error
A precise estimate has a small standard error, but exactly how are the precision and standard error related? If the sampling distribution of an estimator is normal with mean equal to the true parameter value (i.e., unbiased), then we know that about 95% of the time the estimator will be within two SEs of the true parameter value. 13

Methods of Point Estimation Method of Moments (Chapter 6) Maximum Likelihood Estimation (Chapter 15) Least Squares (Chapter 10 and 11) 14

Method of Moments
Equate sample moments to population moments (as we did with the Poisson). Example: for the continuous uniform distribution, f(x | a, b) = 1/(b − a), a ≤ x ≤ b, we have E(X) = (a + b)/2 and Var(X) = (b − a)²/12. Set X̄ = (a + b)/2 and S² = (b − a)²/12, then solve for a and b (can be a bit messy). 15
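For the uniform example, the two moment equations do have a closed-form solution: b − a = √(12 S²), so â = X̄ − √(3 S²) and b̂ = X̄ + √(3 S²). A Python sketch on simulated data (the true endpoints a = 2, b = 8 and the sample size are hypothetical illustration values):

```python
import math
import random
import statistics

random.seed(2)
# Hypothetical sample from Uniform(a = 2, b = 8)
x = [random.uniform(2, 8) for _ in range(5000)]

xbar = statistics.fmean(x)        # matches (a + b)/2 in expectation
s2 = statistics.pvariance(x)      # matches (b - a)^2 / 12 in expectation

# Solve xbar = (a + b)/2 and s2 = (b - a)^2 / 12 for a and b:
half_width = math.sqrt(3 * s2)    # (b - a)/2
a_hat = xbar - half_width
b_hat = xbar + half_width
print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")
```

The estimates land near the true endpoints 2 and 8 for a sample this large.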

Maximum Likelihood Parameter Estimation
By far the most popular estimation method! (Casella & Berger). The MLE is the parameter point for which the observed data are most likely under the assumed probability model.
Likelihood function: L(θ|x) = f(x|θ), where x is the vector of sample values and θ is possibly also a vector. When we consider f(x|θ), we consider θ as fixed and x as the variable. When we consider L(θ|x), we consider x to be the fixed observed sample point and θ to be varying over all possible parameter values. 16

MLE (continued)
If X_1, ..., X_n are iid, then L(θ|x) = f(x_1, ..., x_n | θ) = Π f(x_i | θ).
The MLE of θ is the value which maximizes the likelihood function (assuming it has a global maximum). It is found by differentiating when possible. We usually work with the log of the likelihood function, ln L(θ|x). The equations obtained by setting the partial derivatives of ln L(θ) to 0 are called the likelihood equations. See text page 616 for the normal distribution example. 17
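For the normal example mentioned above, the likelihood equations give the closed-form solutions µ̂ = x̄ and σ̂² = (1/n) Σ (x_i − x̄)². A Python sketch (the simulated sample with µ = 10, σ = 2 is hypothetical) that evaluates the log-likelihood and checks that the closed-form MLE beats nearby parameter values:

```python
import math
import random
import statistics

random.seed(3)
x = [random.gauss(10, 2) for _ in range(500)]   # hypothetical sample
n = len(x)

def log_lik(mu, sigma2):
    # ln L(mu, sigma^2 | x) for an i.i.d. N(mu, sigma^2) sample
    return (-n / 2) * math.log(2 * math.pi * sigma2) \
           - sum((xi - mu) ** 2 for xi in x) / (2 * sigma2)

# Closed-form solutions of the likelihood equations:
mu_hat = statistics.fmean(x)
sigma2_hat = sum((xi - mu_hat) ** 2 for xi in x) / n

# The closed-form MLE is at least as likely as nearby parameter values:
best = log_lik(mu_hat, sigma2_hat)
for dmu, ds in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert best >= log_lik(mu_hat + dmu, sigma2_hat + ds)
print(f"mu_hat = {mu_hat:.3f}, sigma2_hat = {sigma2_hat:.3f}")
```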

Confidence Interval Estimation
We want an interval [L, U], where L and U are two statistics calculated from X_1, X_2, ..., X_n, such that P[L ≤ θ ≤ U] = 1 − α regardless of the true value of θ. Note: L and U are random, while θ is fixed but unknown. [L, U] is called a 100(1 − α)% confidence interval (CI), and 1 − α is called the confidence level of the interval. After the data are observed, X_1 = x_1, ..., X_n = x_n, the confidence limits L = l and U = u can be calculated. 18

95% Confidence Interval: Normal, σ² Known
Consider a random sample X_1, X_2, ..., X_n ~ N(µ, σ²), where σ² is assumed known and µ is an unknown parameter to be estimated. Then
P(−1.96 ≤ (X̄ − µ)/(σ/√n) ≤ 1.96) = 0.95.
By the CLT, even if the sample is not normal, this result is approximately correct. Equivalently,
P(L = X̄ − 1.96 σ/√n ≤ µ ≤ X̄ + 1.96 σ/√n = U) = 0.95,
so l = x̄ − 1.96 σ/√n ≤ µ ≤ x̄ + 1.96 σ/√n = u is a 95% CI for µ (two-sided).
See Example 6.7, Airline Revenues, p. 204. 19
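The two-sided 95% interval on this slide is a one-liner once x̄ is in hand. A Python sketch on simulated data (the true mean 50, the known σ = 5, and n = 40 are hypothetical):

```python
import math
import random
import statistics

random.seed(4)
sigma, n = 5.0, 40                    # sigma assumed known (hypothetical values)
x = [random.gauss(50, sigma) for _ in range(n)]

xbar = statistics.fmean(x)
half = 1.96 * sigma / math.sqrt(n)    # 1.96 = z_{0.025}
lo, hi = xbar - half, xbar + half
print(f"95% CI for mu: [{lo:.2f}, {hi:.2f}]")
```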

Normal Distribution: 95% of the area under the curve is between −1.96 and 1.96. [Figure: plot of the standard normal density dnorm(x) for x from −3 to 3, with the central 95% region marked.] This graph was created using S-PLUS(R) Software. S-PLUS(R) is a registered trademark of Insightful Corporation. 20

Frequentist Interpretation of CI s In an infinitely long series of trials in which repeated samples of size n are drawn from the same population and 95% CI s for µ are calculated using the same method, the proportion of intervals that actually include µ will be 95% (coverage probability). However, for any particular CI, it is not known whether or not the CI includes µ, but the probability that it includes µ is either 0 or 1, that is, either it does or it doesn t. It is incorrect to say that the probability is 0.95 that the true µ is in a particular CI. See Figure 6.2, p. 205 21

95% CIs for 50 samples from the unit normal distribution. [Figure: 50 confidence intervals plotted over the range −1.0 to 1.0, indexed 1 to 50; most cover the true mean 0.] This graph was created using S-PLUS(R) Software. S-PLUS(R) is a registered trademark of Insightful Corporation. 22
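The coverage-probability interpretation can be checked by simulation in the spirit of this figure: draw many samples from the unit normal, form the 95% CI each time, and count how often the interval contains the true µ = 0. A Python sketch (the sample size and replication count are hypothetical):

```python
import math
import random
import statistics

random.seed(5)
n, reps = 20, 10000          # hypothetical sample size and number of repetitions
half = 1.96 / math.sqrt(n)   # sigma = 1 is known for the unit normal
covered = 0
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]   # true mu = 0
    xbar = statistics.fmean(x)
    if xbar - half <= 0 <= xbar + half:
        covered += 1
print(f"empirical coverage: {covered / reps:.3f}")
```

The empirical coverage comes out close to the nominal 0.95, matching the frequentist interpretation.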

Arbitrary Confidence Level for CI: σ² Known
The 100(1 − α)% two-sided CI for µ based on the observed sample mean x̄ is
x̄ − z_{α/2} σ/√n ≤ µ ≤ x̄ + z_{α/2} σ/√n.
For 99% confidence, z_{α/2} = 2.576. The price paid for a higher confidence level is a wider interval. For large samples, these CIs can be used for data from any distribution, since by the CLT X̄ is approximately N(µ, σ²/n). 23

One-sided Confidence Intervals
Lower one-sided CI: µ ≥ x̄ − z_α σ/√n.
Upper one-sided CI: µ ≤ x̄ + z_α σ/√n.
For 95% confidence, z_α = 1.645 vs. z_{α/2} = 1.96. One-sided CIs are tighter for the same confidence level. 24

Hypothesis Testing
The objective of hypothesis testing is to assess the validity of a claim against a counterclaim using sample data. The claim to be proved is the alternative hypothesis (H1). The competing claim is called the null hypothesis (H0). One begins by assuming that H0 is true. If the data fail to contradict H0 beyond a reasonable doubt, then H0 is not rejected. However, failing to reject H0 does not mean that we accept it as true. It simply means that H0 cannot be ruled out as a possible explanation for the observed data. A proof by insufficient data is not a proof at all. 25

Testing Hypotheses The process by which we use data to answer questions about parameters is very similar to how juries evaluate evidence about a defendant. from Geoffrey Vining, Statistical Methods for Engineers, Duxbury, 1st edition, 1998. For more information, see that textbook. 26

Hypothesis Tests
A hypothesis test is a data-based rule to decide between H0 and H1. A test statistic calculated from the data is used to make this decision. The values of the test statistic for which the test rejects H0 comprise the rejection region of the test. The complement of the rejection region is called the acceptance region. The boundaries of the rejection region are defined by one or more critical constants (critical values). See Examples 6.13 (acceptance sampling) and 6.14 (SAT coaching), pp. 210-211. 27

Hypothesis Testing as a Two-Decision Problem
Framework developed by Neyman and Pearson in 1933. When a hypothesis test is viewed as a decision procedure, two types of errors are possible:

Decision          | Reality: H0 True                     | Reality: H0 False
Do not reject H0  | Correct Decision (Confidence, 1 − α) | Type II Error (Failure to Detect, β)
Reject H0         | Type I Error (Significance Level, α) | Correct Decision (Prob. of Detection, 1 − β)
Column total      | 1                                    | 1

28

Probabilities of Type I and II Errors
α = P{Type I error} = P{Reject H0 when H0 is true} = P{Reject H0 | H0}, also called the α-risk, producer's risk, or false alarm rate.
β = P{Type II error} = P{Fail to reject H0 when H1 is true} = P{Fail to reject H0 | H1}, also called the β-risk, consumer's risk, or probability of not detecting.
π = 1 − β = P{Reject H0 | H1} is the probability of detection, or power of the test.
We would like to have low α and low β (or equivalently, high power). α and 1 − β are directly related: one can increase power by increasing α. These probabilities are calculated using the sampling distributions under either the null hypothesis (for α) or the alternative hypothesis (for β). 29

Example 6.17 (SAT Coaching) See Example 6.17, SAT Coaching, in the course textbook. 30

Power Function and OC Curve
The operating characteristic (OC) function of a test is the probability that the test fails to reject H0, as a function of the test parameter θ: OC(θ) = P{test fails to reject H0 | θ}. For θ values included in H1, the OC function is the β-risk. The power function is π(θ) = P{test rejects H0 | θ} = 1 − OC(θ).
Example: In SAT coaching, for the test that rejects the null hypothesis when the mean change is 25 or greater, the power is 1 - pnorm(25, mean = 0:50, sd = 40/sqrt(20)). 31
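The S-PLUS expression above translates directly to standard-library Python. This sketch tabulates the power function at a few values of the true mean change, with σ = 40 and n = 20 as in the SAT coaching example:

```python
import math
from statistics import NormalDist

# SAT coaching example: reject H0: mu = 0 when the sample mean change
# is 25 or more; sigma = 40 and n = 20 as in the slides.
se = 40 / math.sqrt(20)

def power(mu):
    # pi(mu) = P(Xbar >= 25 | mu) = 1 - Phi((25 - mu) / se)
    return 1 - NormalDist(mu, se).cdf(25)

for mu in (0, 10, 25, 40, 50):
    print(f"mu = {mu:2d}  power = {power(mu):.3f}")
```

Note that power(0) is the α of this test, and the power rises monotonically toward 1 as the true mean change grows, tracing out the power curve (1 minus the OC curve).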

Level of Significance
The practice of hypothesis testing is to put an upper bound on P(Type I error) and, subject to that constraint, find a test with the lowest possible P(Type II error). The upper bound on P(Type I error) is called the level of significance of the test and is denoted by α (usually some small number such as 0.01, 0.05, or 0.10). The test is required to satisfy P{Type I error} = P{Test rejects H0 | H0} ≤ α. Note that α is now used to denote an upper bound on P(Type I error). This is motivated by the fact that the Type I error is usually the more serious. A hypothesis test with significance level α is called an α-level test. 32

Choice of Significance Level
What α level should one use? Recall that as P(Type I error) decreases, P(Type II error) increases. A proper choice of α should take into account the relative costs of Type I and Type II errors. (These costs may be difficult to determine in practice, but must be considered!) Fisher said: α = 0.05. Today α = 0.10, 0.05, or 0.01 is used, depending on how much proof against the null hypothesis we want to have before rejecting it. P-values have become popular with the advent of computer programs. 33

Observed Level of Significance or P-value
Simply rejecting or not rejecting H0 at a specified α level does not fully convey the information in the data. Example: H0: µ = 15 vs. H1: µ > 15 (with σ = 40, n = 20) is rejected at α = 0.05 when x̄ > 15 + 1.645 (40/√20) = 29.71. Is a sample with a mean of 30 equivalent to a sample with a mean of 50? (Note that both lead to rejection at the α level of 0.05.) It is more useful to report the smallest α level for which the data would reject; this is called the observed level of significance or P-value. Reject H0 if P-value < α. 34
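The P-values for the two sample means compared on this slide can be computed with standard-library Python (µ0 = 15, σ = 40, n = 20 as in the example):

```python
import math
from statistics import NormalDist

# H0: mu = 15 vs H1: mu > 15, with sigma = 40 and n = 20 assumed known
se = 40 / math.sqrt(20)

def p_value(xbar):
    # smallest alpha at which xbar would lead to rejection:
    # P(Xbar >= observed value | mu = 15)
    z = (xbar - 15) / se
    return 1 - NormalDist().cdf(z)

print(f"p-value at xbar = 30: {p_value(30):.4f}")   # just under 0.05
print(f"p-value at xbar = 50: {p_value(50):.6f}")   # far smaller
```

Both sample means reject at α = 0.05, but the P-values show how different the strength of evidence is; note also that the cutoff 29.71 gives a P-value of essentially 0.05.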

Example 6.23 (SAT Coaching: P-Value) See Example 6.23, SAT Coaching, on page 220 of the course textbook. 35

One-sided and Two-sided Tests
H0: µ = 15 can have three possible alternative hypotheses: H1: µ > 15 (upper one-sided), H1: µ < 15 (lower one-sided), or H1: µ ≠ 15 (two-sided).
Example 6.27 (SAT Coaching: Two-sided testing) See Example 6.27 in the course textbook. 36

Example 6.27 continued See Example 6.27, SAT Coaching, on page 223 of the course textbook. 37

Relationship Between Confidence Intervals and Hypothesis Tests
An α-level two-sided test rejects a hypothesis H0: µ = µ0 if and only if the 100(1 − α)% confidence interval does not contain µ0.
Example 6.7 (Airline Revenues) See Example 6.7, Airline Revenues, on page 207 of the course textbook. 38

Use/Misuse of Hypothesis Tests in Practice
- Difficulties of interpreting tests on non-random samples and observational data
- Statistical significance versus practical significance
- Statistical significance is a function of sample size
- Perils of searching for significance
- Ignoring lack of significance
- Confusing confidence (1 − α) with probability of detecting a difference (1 − β) 39

Jerzy Neyman (1894-1981) and Egon Pearson (1895-1980) carried on a decades-long feud with Fisher over the foundations of statistics (hypothesis testing and confidence limits). Fisher never recognized the Type II error and developed fiducial limits. 40