Summary: the confidence interval for the mean ($\sigma^2$ known) with Gaussian assumption


Summary: the confidence interval for the mean ($\sigma^2$ known) with Gaussian assumption on $X$. Let $X$ be a Gaussian r.v. with mean $\mu$ and variance $\sigma^2$. If $X_1, X_2, \dots, X_n$ is a random sample drawn from $X$, then the confidence interval for $\mu$ of level $1-\alpha$ can be written as follows:

$$\mu \in \left(\bar X_n \pm z_{1-\alpha/2}\,\frac{\sigma}{\sqrt n}\right)$$

The same result holds approximately, by the Central Limit Theorem, if $X$ is any other random variable, provided that the sample size is not too small.
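As a minimal sketch, the interval above can be computed with Python's standard library, where `statistics.NormalDist().inv_cdf` returns the quantile $z_{1-\alpha/2}$. The function name `z_interval` is our own, and the inputs in the usage line (sample mean 1.685 and $n = 12$ from the dust-grain example below, with $\sigma = 0.92$ assumed known) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int, alpha: float) -> tuple[float, float]:
    """Level 1 - alpha confidence interval for mu when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # quantile z_{1 - alpha/2}
    half = z * sigma / sqrt(n)               # half-length of the interval
    return xbar - half, xbar + half


# Hypothetical inputs: sigma = 0.92 is assumed known for illustration only.
print([round(v, 2) for v in z_interval(1.685, 0.92, 12, 0.05)])  # [1.16, 2.21]
```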

Confidence intervals. Confidence interval for the mean ($\sigma^2$ unknown) under Gaussian assumption. If the value of the variance $\sigma^2$ is not known, we estimate it using $S_n^2$. If $X_1, X_2, \dots, X_n$ is a random sample i.i.d. as $X$, then the confidence interval for $\mu$ with confidence level $1-\alpha$ can be rewritten in the form

$$\mu \in \left(\bar X_n \pm t^{(n-1)}_{1-\alpha/2}\,\frac{S_n}{\sqrt n}\right)$$

Student-t random variable. The value $t^{(n-1)}_{1-\alpha/2}$ in the previous formula for the confidence interval for the mean with unknown variance is the analogue of $z_{1-\alpha/2}$, but computed for the Student-t random variable. That value can be found in an appropriate statistical table. The Student-t distribution is quite similar to the Gaussian in shape, but it has heavier tails and a lower peak, so it is less concentrated around the mean. If we denote by $T_{n-1}$ the Student-t random variable with $n-1$ degrees of freedom, we have that

$$P\left(T_{n-1} < t^{(n-1)}_{1-\alpha/2}\right) = 1-\alpha/2$$

Example: calculate $t$ such that $P(T_8 < t) = 0.90$; $t$ such that $P(T_6 > t) = 0.99$; the probability $P(T_{11} > 2.5)$; $t$ such that $P(T_3 < t) = 0.05$.

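Student-t random variable. [Figure: density $f$ of the Student-t distribution with $g$ degrees of freedom; the shaded area to the left of $t^{(g)}_p$ equals $p$.]

$$p = P\left(T_g < t^{(g)}_p\right) = \int_{-\infty}^{t^{(g)}_p} f(x)\,dx$$

The shaded area corresponds to $p = P(T_g < t^{(g)}_p)$, where $g$ is the number of degrees of freedom and $p$ is the probability.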
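Statistical tables, or a library routine such as SciPy's `scipy.stats.t.ppf`, give these quantiles directly. As a dependency-free sketch, the code below integrates the Student-t density numerically with Simpson's rule and inverts the resulting CDF by bisection; the function names `t_cdf` and `t_quantile` and the numerical method are our own choices, not the slides'. It answers the four questions of the example above.

```python
import math


def t_cdf(x: float, nu: int, steps: int = 2000) -> float:
    """P(T_nu < x), integrating the Student-t density with Simpson's rule."""
    # Normalising constant of the t density, via lgamma to avoid overflow.
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)

    def pdf(u: float) -> float:
        return c * (1 + u * u / nu) ** (-(nu + 1) / 2)

    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 < T_nu < |x|)
    # The density is symmetric around 0.
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p: float, nu: int) -> float:
    """Value t with P(T_nu < t) = p, found by bisection on t_cdf."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if p < 0.5:
        return -t_quantile(1 - p, nu)  # symmetry: t_p = -t_{1-p}
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < p:  # widen the bracket until it contains the quantile
        lo, hi = hi, 2 * hi
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# The four questions of the example.
print(t_quantile(0.90, 8))   # t with P(T_8 < t) = 0.90
print(t_quantile(0.01, 6))   # t with P(T_6 > t) = 0.99
print(1 - t_cdf(2.5, 11))    # P(T_11 > 2.5)
print(t_quantile(0.05, 3))   # t with P(T_3 < t) = 0.05
```

With 2000 Simpson steps the results match the usual three-decimal table values for moderate probabilities; a library routine is faster and more robust for extreme tail probabilities.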

Example: the weight, in grams, of some grains of dust on silicon circuits is assumed to be normally distributed with parameters $\mu$ and $\sigma^2$. The data are reported below:

0.39 0.68 0.82 1.35 1.38 1.62 1.70 1.71 1.85 2.14 2.89 3.69

After computing an estimate of $\mu$, build confidence intervals with confidence levels 95% and 99%, assuming $\sigma$ unknown. Let us calculate $\bar x_n$:

$$\bar x_n = \frac{1}{n}\sum_{i=1}^n x_i = \frac{0.39 + 0.68 + \dots + 3.69}{12} = 1.685$$

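As $\sigma$ is unknown, we use the statistic $s_n$, where

$$s_n^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x_n)^2 = 0.85$$

The confidence interval can be obtained from the formula

$$\mu \in \left(\bar x_n \pm t^{(n-1)}_{1-\alpha/2}\,\frac{s_n}{\sqrt n}\right)$$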

In the two cases the values of the Student-t quantiles are

$$t^{(11)}_{1-0.05/2} = t^{(11)}_{0.975} = 2.201, \qquad t^{(11)}_{1-0.01/2} = t^{(11)}_{0.995} = 3.106$$

Substituting these values we obtain:

$\mu \in (1.10, 2.27)$ at level 95% [$(1.16, 2.21)$ with $\sigma$ known]
$\mu \in (0.86, 2.51)$ at level 99% [$(1.00, 2.37)$ with $\sigma$ known]

Warning 1: for fixed $n$, the length of the interval grows with the confidence level $1-\alpha$.
Warning 2: for given $n$ and $1-\alpha$, the interval is typically longer when the variance has to be estimated, since $t^{(n-1)}_{1-\alpha/2} > z_{1-\alpha/2}$.
Warning 3: the length of a confidence interval depends on two quantities that we can choose: the sample size $n$ and the confidence level $1-\alpha$.
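The computations of this example can be reproduced with a short Python sketch. The two Student-t quantiles are hard-coded from the table values quoted above rather than computed, and `statistics.stdev` uses the $n-1$ divisor, matching $s_n$.

```python
from math import sqrt
from statistics import mean, stdev

# Grain weights (grams) from the example.
data = [0.39, 0.68, 0.82, 1.35, 1.38, 1.62, 1.70, 1.71, 1.85, 2.14, 2.89, 3.69]
n = len(data)
xbar = mean(data)  # sample mean, 1.685
s = stdev(data)    # sample standard deviation, divisor n - 1

# Student-t quantiles with 11 degrees of freedom, read from a table.
t_table = {0.95: 2.201, 0.99: 3.106}

for level, t in t_table.items():
    half = t * s / sqrt(n)
    print(f"{level:.0%}: ({xbar - half:.2f}, {xbar + half:.2f})")
# 95%: (1.10, 2.27)
# 99%: (0.86, 2.51)
```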

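The right choice of the sample size $n$. When $\sigma$ is known, the length of the confidence interval is

$$L(n,\alpha) = \left(\bar X_n + z_{1-\alpha/2}\frac{\sigma}{\sqrt n}\right) - \left(\bar X_n - z_{1-\alpha/2}\frac{\sigma}{\sqrt n}\right) = 2\,z_{1-\alpha/2}\frac{\sigma}{\sqrt n}$$

When $\sigma$ is not known, the length of the interval is

$$L(n,\alpha) = \left(\bar X_n + t^{(n-1)}_{1-\alpha/2}\frac{s_n}{\sqrt n}\right) - \left(\bar X_n - t^{(n-1)}_{1-\alpha/2}\frac{s_n}{\sqrt n}\right) = 2\,t^{(n-1)}_{1-\alpha/2}\frac{s_n}{\sqrt n}$$

which also depends on $s_n$. The length of the first interval ($\sigma$ known) does not change from one sample to another, whereas the length of the second one varies with the observed value of $s_n$.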

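We want to find $n$ such that $L(n,\alpha) < C$, and then

$$L(n,\alpha) = 2\,z_{1-\alpha/2}\frac{\sigma}{\sqrt n} < C \iff \sqrt n > \frac{2\,z_{1-\alpha/2}\,\sigma}{C} \iff n > \left(\frac{2\,z_{1-\alpha/2}\,\sigma}{C}\right)^2$$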
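A minimal sketch of this rule in Python, assuming $\sigma$ is known: the smallest admissible $n$ is the first integer strictly above $\left(2 z_{1-\alpha/2}\sigma/C\right)^2$. The function names and the values $\sigma = 1$, $C = 0.5$ in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def ci_length(sigma: float, n: int, alpha: float) -> float:
    """Length 2 * z_{1-alpha/2} * sigma / sqrt(n) of the sigma-known interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * z * sigma / math.sqrt(n)


def min_sample_size(sigma: float, C: float, alpha: float) -> int:
    """Smallest n with ci_length(sigma, n, alpha) < C."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    bound = (2 * z * sigma / C) ** 2
    return math.floor(bound) + 1  # strict inequality n > bound


# Hypothetical requirement: sigma = 1, 95% level, interval shorter than 0.5.
n = min_sample_size(sigma=1.0, C=0.5, alpha=0.05)
print(n, ci_length(1.0, n, 0.05))
```

Using `floor(bound) + 1` instead of `ceil(bound)` keeps the inequality strict even when the bound happens to be an integer.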

Suppose now that we are interested in a confidence interval for the proportion $p$ in a Bernoulli scheme. If the $X_i$ are i.i.d. Bernoulli random variables with unknown parameter $p$, we have seen that $\sum_{i=1}^n X_i \sim \mathrm{Bin}(n,p)$. We know that we can use the Gaussian approximation for the binomial random variable in the framework of large samples. The natural estimator for $p$ is $\hat p_n = \frac{1}{n}\sum_{i=1}^n X_i$, which is essentially a binomial random variable rescaled by the factor $1/n$. For large $n$ we then have

$$Z = \frac{\hat p_n - p}{\sqrt{p(1-p)/n}} \approx N(0,1)$$

Following the same steps as in the previous cases, we get the interval

$$p \in \left(\hat p_n \pm z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right)$$

but we cannot calculate it, since $p$ is unknown.

We can bypass this problem by plugging in $\hat p_n$ in place of $p$ in the previous interval, which gives an asymptotic, approximate confidence interval for $p$.

Confidence interval for the proportion. Let $X$ be a Bernoulli random variable with mean $p$. If $X_1, X_2, \dots, X_n$ is an i.i.d. random sample drawn from $X$, then the asymptotic (and approximate) confidence interval for $p$ of level $1-\alpha$ can be written in this form:

$$p \in \left(\hat p_n \pm z_{1-\alpha/2}\sqrt{\frac{\hat p_n(1-\hat p_n)}{n}}\right)$$
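The plug-in interval can be sketched in a few lines of Python; the function name `proportion_interval` and the sample of 40 successes in 100 trials are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes: int, n: int, alpha: float) -> tuple[float, float]:
    """Asymptotic level 1 - alpha interval for p, with p_hat plugged into the variance."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error times z
    return p_hat - half, p_hat + half


# Hypothetical sample: 40 successes out of 100 Bernoulli trials.
print(proportion_interval(40, 100, 0.05))
```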

Hypothesis testing. If we have some idea of a possible value of the unknown parameter, we can submit this hypothesis to a test which, on the basis of empirical evidence, allows us to accept or reject it.

Hypothesis testing for the mean. Assume a Gaussian model $X$ with unknown mean $\mu$ and known variance $\sigma^2$. We would like to verify the hypothesis that the true value of the mean is $\mu_0$. We denote this hypothesis, which is also called the null hypothesis, by

$$H_0: \mu = \mu_0$$

A test is a decision rule that leads to only two alternatives: either we reject the null hypothesis $H_0$ or we accept it (better: there is no evidence for rejecting it). The decision is based on the observation of an i.i.d. sample.

Given that the decision is based on a random sample, there is always the possibility of taking the wrong decision. The possible outcomes are:

If $H_0$ is true: rejecting it is a Type I error (probability $\alpha$); not rejecting it is correct (probability $1-\alpha$).
If $H_0$ is false: rejecting it is correct (probability $1-\beta$); not rejecting it is a Type II error (probability $\beta$).

We then have

$$\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true}), \qquad \beta = P(\text{do not reject } H_0 \mid H_0 \text{ is false})$$

In general, we will observe a value of $\bar x_n$ different from $\mu_0$. A decision rule will then look at the distance of $\bar x_n$ from $\mu_0$ to decide whether it is small or large. Switching to random variables: the test considers the distance between $\bar X_n$ and $\mu_0$ and checks that it is not too large (in probability).

To decide when to reject $H_0$ we always need to specify an alternative hypothesis $H_1$, which can be of any kind but must differ from $H_0$. We start by considering a two-sided (bilateral) alternative hypothesis, denoted by

$$H_1: \mu \neq \mu_0$$

We can deduce a decision rule of the following kind: if $|\bar X_n - \mu_0|$ is greater than some fixed value $k$, we reject the null hypothesis $H_0: \mu = \mu_0$ in favour of the alternative hypothesis $H_1: \mu \neq \mu_0$.

How do we choose $k$? We fix $\alpha$ and choose $k$ so that the probability of a Type I error is at most $\alpha$. The value of $k$ must satisfy

$$P\left(|\bar X_n - \mu_0| > k \mid H_0 \text{ true}\right) = \alpha$$

The value $k = k_\alpha$ is called the test threshold. How do we determine it?

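If $H_0$ is true, then $\bar X_n \sim N(\mu_0, \sigma^2/n)$, from which it follows that

$$Z = \frac{\bar X_n - \mu_0}{\sigma/\sqrt n} \sim N(0,1)$$

Therefore

$$\alpha = P\left(|\bar X_n - \mu_0| > k \mid H_0\right) = P\left(|Z| > \frac{k\sqrt n}{\sigma}\right) = P\left(Z < -\frac{k\sqrt n}{\sigma}\right) + P\left(Z > \frac{k\sqrt n}{\sigma}\right)$$

By the symmetry of $Z$, the value $k_\alpha$ is such that

$$\frac{k_\alpha\sqrt n}{\sigma} = z_{1-\alpha/2}, \quad\text{that is,}\quad k_\alpha = z_{1-\alpha/2}\,\frac{\sigma}{\sqrt n}$$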
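The defining property $P(|\bar X_n - \mu_0| > k_\alpha \mid H_0) = \alpha$ can be checked empirically. The Monte Carlo sketch below is an illustration that is not part of the original derivation: it draws many samples under $H_0$ with hypothetical values $\mu_0 = 5$, $\sigma = 2$, $n = 10$, applies the rule with $k_\alpha = z_{1-\alpha/2}\,\sigma/\sqrt n$, and returns the fraction of (wrong) rejections, which should be close to $\alpha$.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_one_error_rate(mu0: float, sigma: float, n: int, alpha: float,
                        reps: int = 20_000, seed: int = 1) -> float:
    """Fraction of samples drawn under H0 for which the two-sided rule rejects H0."""
    rng = random.Random(seed)
    k = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)  # threshold k_alpha
    rejections = 0
    for _ in range(reps):
        # Sample mean of n Gaussian observations generated with the true mean mu0.
        xbar = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        if abs(xbar - mu0) > k:
            rejections += 1
    return rejections / reps


print(type_one_error_rate(mu0=5.0, sigma=2.0, n=10, alpha=0.05))  # should be close to 0.05
```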

Summary: let us denote by $Z$ the test statistic

$$Z = \frac{\bar X_n - \mu_0}{\sigma/\sqrt n}$$

Assume that for a particular sample we obtain $z$ as the realization of $Z$:

$$z = \frac{\bar x_n - \mu_0}{\sigma/\sqrt n}$$

The test tells us to reject the null hypothesis $H_0: \mu = \mu_0$ in favour of $H_1: \mu \neq \mu_0$ if $z$ lies in the rejection region, i.e. outside the interval $(-z_{1-\alpha/2}, z_{1-\alpha/2})$, which is called the acceptance region of the test.

[Figure: standard normal density for the two-sided test; acceptance region between $-z_{1-\alpha/2}$ and $z_{1-\alpha/2}$, rejection regions in both tails.]

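There can be other types of alternative hypotheses. Suppose we test the null hypothesis against a one-sided alternative:

$$H_0: \mu = \mu_0, \qquad H_1: \mu > \mu_0$$

Then $k$ must satisfy

$$\alpha = P\left(\bar X_n - \mu_0 > k \mid H_0\right) = P\left(Z > \frac{k\sqrt n}{\sigma}\right)$$

so the test will reject the null hypothesis if

$$z = \frac{\bar x_n - \mu_0}{\sigma/\sqrt n} > z_{1-\alpha}$$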

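[Figure: standard normal density for the test against $H_1: \mu > \mu_0$; acceptance region to the left of $z_{1-\alpha}$, rejection region to the right.]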

Of course, if the alternative hypothesis is $H_1: \mu < \mu_0$, the test will reject for values of $z$ that are too small, in particular if $z < z_\alpha = -z_{1-\alpha}$.

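[Figure: standard normal density for the test against $H_1: \mu < \mu_0$; rejection region to the left of $z_\alpha$, acceptance region to the right.]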

Let us summarize in a global scheme.

Test on the mean ($\sigma^2$ known). Let $X$ be a Gaussian random variable with unknown mean $\mu$ and known variance $\sigma^2$. If $X_1, X_2, \dots, X_n$ is an i.i.d. sample drawn from $X$, then the level-$\alpha$ test of the hypothesis $H_0: \mu = \mu_0$ has the following rejection region, depending on the alternative:

if $H_1: \mu \neq \mu_0$, reject $H_0$ if $|z| > z_{1-\alpha/2}$;
if $H_1: \mu > \mu_0$, reject $H_0$ if $z > z_{1-\alpha}$;
if $H_1: \mu < \mu_0$, reject $H_0$ if $z < z_\alpha$;

where $z = \frac{\bar x_n - \mu_0}{\sigma/\sqrt n}$.
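The scheme can be sketched as a single Python function. The `alternative` labels `"two-sided"`, `"greater"` and `"less"` are our own naming convention, and the sample summary in the usage loop (mean 10.7 from $n = 25$ observations, $\sigma = 2$, $\mu_0 = 10$) is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar: float, mu0: float, sigma: float, n: int, alpha: float,
           alternative: str = "two-sided") -> tuple[float, bool]:
    """Level-alpha z test of H0: mu = mu0 with sigma known; returns (z, reject H0?)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    q = NormalDist().inv_cdf  # standard normal quantile function
    if alternative == "two-sided":    # H1: mu != mu0
        reject = abs(z) > q(1 - alpha / 2)
    elif alternative == "greater":    # H1: mu > mu0
        reject = z > q(1 - alpha)
    elif alternative == "less":       # H1: mu < mu0
        reject = z < q(alpha)         # z_alpha = -z_{1-alpha}
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, reject


# Hypothetical sample summary; z = 1.75 in every case.
for alt in ("two-sided", "greater", "less"):
    print(alt, z_test(10.7, 10.0, 2.0, 25, 0.05, alt))
```

With $z = 1.75$ only the one-sided test against $H_1: \mu > \mu_0$ rejects, since $1.645 < 1.75 < 1.96$.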

If the variance is unknown, we estimate it using $s_n^2$.

Test on the mean ($\sigma^2$ unknown). Let $X$ be a Gaussian random variable with unknown parameters $\mu$ and $\sigma^2$. If $X_1, X_2, \dots, X_n$ is an i.i.d. random sample drawn from $X$, then the level-$\alpha$ test of $H_0: \mu = \mu_0$ has the following rejection regions:

if $H_1: \mu \neq \mu_0$, reject $H_0$ if $|t| > t^{(n-1)}_{1-\alpha/2}$;
if $H_1: \mu > \mu_0$, reject $H_0$ if $t > t^{(n-1)}_{1-\alpha}$;
if $H_1: \mu < \mu_0$, reject $H_0$ if $t < t^{(n-1)}_{\alpha} = -t^{(n-1)}_{1-\alpha}$;

where $t = \frac{\bar x_n - \mu_0}{s_n/\sqrt n}$ and $s_n = \sqrt{s_n^2}$.
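A minimal Python sketch of the $t$ test, applied to the dust-grain data of the earlier example. Because the standard library has no Student-t quantile function, the critical value is passed in from a table (`t_crit`, a parameter of our own design); the hypothesised means $\mu_0 = 1.0$ and $\mu_0 = 1.5$ are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_test(data: list[float], mu0: float, t_crit: float,
           alternative: str = "two-sided") -> tuple[float, bool]:
    """Level-alpha t test of H0: mu = mu0 with sigma unknown; returns (t, reject H0?).

    t_crit is the table value t^{(n-1)}_{1-alpha/2} for the two-sided test,
    or t^{(n-1)}_{1-alpha} for a one-sided test.
    """
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / sqrt(n))  # stdev uses the n - 1 divisor
    if alternative == "two-sided":
        reject = abs(t) > t_crit
    elif alternative == "greater":
        reject = t > t_crit
    elif alternative == "less":
        reject = t < -t_crit
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return t, reject


grains = [0.39, 0.68, 0.82, 1.35, 1.38, 1.62, 1.70, 1.71, 1.85, 2.14, 2.89, 3.69]
# Two-sided tests at level 5%, with t^{(11)}_{0.975} = 2.201 from the table.
print(t_test(grains, 1.0, 2.201))  # t about 2.57: reject H0
print(t_test(grains, 1.5, 2.201))  # t about 0.70: do not reject H0
```

Both decisions agree with the 95% interval $(1.10, 2.27)$ computed earlier, which excludes 1.0 and contains 1.5.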

Hypothesis testing for the proportion. We build a test based on the distance between $p_0$ (the value under the null hypothesis) and the observed sample proportion $\hat p_n$. The decision rule always has the same form, but it is based on the test statistic

$$Z = \frac{\hat p_n - p_0}{\sqrt{p_0(1-p_0)/n}}$$

Note that, unlike the confidence interval case, we normalize the distance $\hat p_n - p_0$ with the true variance under $H_0$, $p_0(1-p_0)/n$, and not with the approximate variance $\hat p_n(1-\hat p_n)/n$.

Test on the proportion. Let $X$ be a Bernoulli random variable with parameter $p$ and $X_1, X_2, \dots, X_n$ an i.i.d. random sample drawn from $X$. The level-$\alpha$ test of the hypothesis $H_0: p = p_0$ has the following form for the different alternative hypotheses:

if $H_1: p \neq p_0$, reject $H_0$ if $|z| > z_{1-\alpha/2}$;
if $H_1: p > p_0$, reject $H_0$ if $z > z_{1-\alpha}$;
if $H_1: p < p_0$, reject $H_0$ if $z < z_\alpha$;

where $z = \frac{\hat p_n - p_0}{\sqrt{p_0(1-p_0)/n}}$. Be careful: this is an asymptotic test, which should be used only for large samples ($n \gg 30$).
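As a sketch, the proportion test differs from the $z$ test on the mean only in the statistic, which is normalized with the null variance $p_0(1-p_0)/n$. The function name, the `alternative` labels and the data (60 successes in 100 trials, $p_0 = 0.5$) are hypothetical; as stated above, the normal approximation is only trustworthy for large $n$.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes: int, n: int, p0: float, alpha: float,
                      alternative: str = "two-sided") -> tuple[float, bool]:
    """Asymptotic level-alpha test of H0: p = p0; returns (z, reject H0?)."""
    p_hat = successes / n
    # Normalise with the variance under H0, p0(1 - p0)/n, not p_hat(1 - p_hat)/n.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    q = NormalDist().inv_cdf
    if alternative == "two-sided":    # H1: p != p0
        reject = abs(z) > q(1 - alpha / 2)
    elif alternative == "greater":    # H1: p > p0
        reject = z > q(1 - alpha)
    elif alternative == "less":       # H1: p < p0
        reject = z < q(alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, reject


# Hypothetical data: 60 successes in n = 100 trials, H0: p = 0.5.
print(proportion_z_test(60, 100, 0.5, 0.05))
```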