
MTMS.01.099 Mathematical Statistics
Lecture 12: Hypothesis testing. Power function. Normal approximation and its application to the Binomial distribution
Tõnu Kollo, Fall 2016

Hypothesis Testing

A statistical hypothesis, or simply a hypothesis, is an assumption about a population parameter or population distribution. Hypothesis testing is the procedure whereby we decide to reject, or fail to reject, a hypothesis.

Null hypothesis $H_0$: this is the hypothesis (assumption) under investigation, i.e. the statement being tested. The null hypothesis is a statement that there is no effect, no difference, or no change. The possible outcomes of testing a null hypothesis are to reject it or to fail to reject it.

Alternative hypothesis $H_1$: this is the statement we adopt if there is strong evidence (sample data) against the null hypothesis. A statistical test is designed to assess the strength of the evidence (data) against the null hypothesis.

Hypothesis Testing (2)

Fail to reject $H_0$: we never say we accept or prove $H_0$; we can only say that we fail to reject it, i.e. we stay with $H_0$. Failing to reject $H_0$ means there is NOT enough evidence in the data and in the test to justify rejecting $H_0$. So we retain $H_0$, knowing that we have not proven it true beyond all doubt.

Rejecting $H_0$: this means there IS significant evidence in the data and in the test to justify rejecting $H_0$. When $H_0$ is rejected, we adopt $H_1$. In adopting $H_1$ we know that we can occasionally be wrong; at the same time, we say that $H_1$ is established.

Hypothesis Testing (3)

A Type I error occurs when we reject a true null hypothesis $H_0$. We denote by $\alpha$ the probability of making a Type I error. A Type II error occurs when we fail to reject a false null hypothesis $H_0$. We denote by $\beta$ the probability of making a Type II error. For a given sample size, reducing the probability of a Type I error increases the probability of a Type II error, and vice versa.

The level of significance $\alpha$ is the probability we are willing to risk of rejecting $H_0$ when it is true. Typically $\alpha = 0.05$ is used.

Types of error

[Table on the original slide; its standard content follows from the definitions above:]

Decision              $H_0$ true           $H_0$ false
Reject $H_0$          Type I error         correct decision
Fail to reject $H_0$  correct decision     Type II error

Test statistic, Critical region, Test of significance

Definition. A test statistic is a numerical function of the data whose value determines the result of the test. The function itself is generally denoted $T = T(X)$, where $X$ is the theoretical sample, i.e. it represents the data. After being evaluated at the observed sample data $x$, the result is called the observed test statistic and is written in lowercase, $t = T(x)$.

Definition. The set of all test statistic values for which $H_0$ will be rejected is called the critical region, denoted by $C$.

Definition. A test of significance is the decision rule that leads to rejection of $H_0$ if the test statistic value falls in the critical region $C$:

$T(X) \in C \;\Rightarrow\; \text{reject } H_0.$

Example 1

Example 1. The drying time of a paint under specified test conditions is known to be a random variable $X \sim N(75, 9^2)$. Chemists have proposed a new additive designed to decrease average drying time. It is believed that drying times with this additive will remain normally distributed with standard deviation 9. Let $\mu$ denote the true average drying time when the additive is used. The appropriate hypotheses are:

$H_0: \mu \geq 75, \qquad H_1: \mu < 75.$

The experimental data consist of drying times of $n = 25$ test specimens. A reasonable rejection region has the form $\bar{x} < c$, where the cut-off value $c$ is suitably chosen. Consider the choice $c = 70.8$, so that the test procedure consists of the test statistic $\bar{X}$ and rejection region $\bar{x} < 70.8$. Based on data from $n = 25$ specimens, the following decision is made:

(test 1) if $\bar{x} < 70.8$, then reject $H_0$.

Find the probabilities of both the Type I and Type II errors of this test.

Example 1 (continued)

Calculation of $\alpha$ and $\beta$ involves a routine standardization of $\bar{X}$ followed by reference to the standard normal probabilities. Note that $\sigma/\sqrt{n} = 9/\sqrt{25} = 1.8$:

$\alpha = P(\text{Type I error}) = P(H_0 \text{ is rejected when it is true}) = P(\bar{X} < 70.8 \text{ when } \bar{X} \sim N(75, 1.8^2)) = \Phi\left(\frac{70.8 - 75}{1.8}\right) = \Phi(-2.33) = 0.01$

$\beta(72) = P(\text{Type II error when } \mu = 72) = P(H_0 \text{ is not rejected when it is false because } \mu = 72) = P(\bar{X} \geq 70.8 \text{ when } \bar{X} \sim N(72, 1.8^2)) = 1 - \Phi\left(\frac{70.8 - 72}{1.8}\right) = 1 - \Phi(-0.67) = 1 - 0.2514 = 0.75$

$\beta(70) = 1 - \Phi\left(\frac{70.8 - 70}{1.8}\right) = 0.33, \qquad \beta(67) = 0.0174$
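These values are easy to check numerically. Below is a minimal sketch (not part of the original slides; it assumes scipy is available), where norm.cdf plays the role of $\Phi$:

```python
from scipy.stats import norm

mu0, sigma, n, c = 75, 9, 25, 70.8
se = sigma / n ** 0.5                     # sd of X-bar: 9 / sqrt(25) = 1.8

alpha = norm.cdf((c - mu0) / se)          # P(X-bar < 70.8 | mu = 75), approx. 0.01
print(f"alpha = {alpha:.4f}")

for mu in (72, 70, 67):                   # Type II error at selected alternatives
    beta = 1 - norm.cdf((c - mu) / se)    # P(X-bar >= 70.8 | mu)
    print(f"beta({mu}) = {beta:.4f}")
```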

Example 1 (continued)

The use of the cut-off value $c = 70.8$ resulted in a very small value of $\alpha$ (0.01) but rather large $\beta$'s. Consider the same experiment and test statistic $\bar{X}$ with the new rejection region $\bar{x} < 72$:

(test 2) if $\bar{x} < 72$, then reject $H_0$.

What are the Type I and Type II error probabilities of this test? Which test should be preferred?

Many tests can be constructed for testing a particular pair of hypotheses. A test is good if its Type I error probability is small and $H_0$ is rejected with large probability when $H_1$ is true. It is thus also necessary to control the Type II error. For comparing tests, the power function is used.

The Power function

For the moment we suppose that the null hypothesis consists of one value, which we call $\theta_0$.

Definition. The power function $h(\theta)$ of a test is the probability of rejecting $H_0$ when the value $\theta$ is the correct value of the parameter in the parameter space:

$h(\theta) = P(H_0 \text{ is rejected if } \theta \text{ is the correct value of the parameter}),$

or just $h(\theta) = P(T(X) \in C \mid \theta)$. So, if $\beta(\theta) = P(\text{fail to reject } H_0 \mid \theta)$, then $h(\theta) = 1 - \beta(\theta)$.

A test is good if $h(\theta)$ is large for all $\theta \in H_1$ and $h(\theta)$ is small for $\theta \in H_0$.

The Power function

Properties of the power function:

$0 \leq h(\theta) \leq 1$;

the level of significance, i.e. the probability of making a mistake when rejecting $H_0$, is obtained from the power function: $\sup_{\theta \in H_0} h(\theta) = \alpha$;

if $H_0$ is a simple hypothesis, then $h(\theta_0) = \alpha$.

Example 1. Comparison of power functions

[Figure on the original slide: the power functions $h(\mu)$ of test 1 ($c = 70.8$) and test 2 ($c = 72$) plotted against $\mu$.]
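Since the rejection region is $\bar{x} < c$, the power function here is $h(\mu) = \Phi\big((c - \mu)/1.8\big)$. A short sketch (mine, not from the slides; scipy assumed) that tabulates both power functions:

```python
from scipy.stats import norm

def power(mu, c, sigma=9.0, n=25):
    """h(mu) = P(X-bar < c | mu) = Phi((c - mu) / (sigma / sqrt(n)))."""
    return norm.cdf((c - mu) / (sigma / n ** 0.5))

# Test 2 rejects more often for every mu < 75 (more power under H1),
# at the price of a larger alpha at mu = 75.
for mu in (67, 70, 72, 75):
    print(f"mu = {mu}: test 1 h = {power(mu, 70.8):.3f}, test 2 h = {power(mu, 72):.3f}")
```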

Relationship of α and β

Proposition. Suppose an experiment and a sample size are fixed, and a test statistic is chosen. Then decreasing the size of the rejection region to obtain a smaller value of $\alpha$ results in a larger value of $\beta$ for any particular parameter value consistent with $H_1$.

This proposition says that once the test statistic and $n$ are fixed, there is no rejection region that will simultaneously make both $\alpha$ and all $\beta$'s small. A region must be chosen to effect a compromise between $\alpha$ and $\beta$.

The Power function

If you still feel confused, check the online course page linked from the original slide, where you can find further examples of power functions (with some nice illustrations).

Hypotheses about the mean of a Normal distribution

In Example 1 we stated hypotheses about the mean of a Normal distribution. The general algorithm for testing hypotheses about the mean: we have a random sample $x_1, \ldots, x_n$. Consider the point estimate $\hat{\mu} = \bar{x}$. The corresponding test statistic is

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad X_i \sim N(\mu, \sigma^2), \text{ independent.}$

In case $H_0$ is true, $X_i \sim N(\mu_0, \sigma^2)$ and $\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right)$.

Hypotheses about the mean of a Normal distribution

Form the test statistics $U$ and $V$, which, in case $H_0$ is true, are distributed as follows:

$U = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$, if $\sigma$ is known;

$V = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t(f), \quad f = n - 1$, if $\sigma$ is unknown.

The decision rule (test) depends on the hypotheses being tested and on the available information about $\sigma$.
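A minimal sketch of this one-sample procedure (my illustration, not from the slides; it assumes a two-sided $H_1$ and that numpy/scipy are available):

```python
import numpy as np
from scipy.stats import norm, t

def one_sample_test(x, mu0, alpha=0.05, sigma=None):
    """Two-sided test of H0: mu = mu0; returns (statistic, critical value, reject?)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if sigma is not None:                       # sigma known: U ~ N(0, 1) under H0
        stat = (x.mean() - mu0) / (sigma / np.sqrt(n))
        crit = norm.ppf(1 - alpha / 2)          # lambda_{alpha/2}
    else:                                       # sigma unknown: V ~ t(n - 1) under H0
        stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
        crit = t.ppf(1 - alpha / 2, df=n - 1)   # t_{alpha/2}(n - 1)
    return stat, crit, abs(stat) > crit         # reject H0 if |stat| > crit
```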

Hypotheses about the difference of two population means (independent samples)

Let us have two independent samples $x_1, \ldots, x_{n_1}$ and $y_1, \ldots, y_{n_2}$, with corresponding distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, and the following hypotheses:

$H_0: \mu_1 = \mu_2 \iff \mu_1 - \mu_2 = 0, \qquad H_1: \mu_1 \neq \mu_2 \iff \mu_1 - \mu_2 \neq 0.$

Algorithm for testing the hypotheses: form an estimate for the difference of the parameters, $\hat{\mu}_1 - \hat{\mu}_2 = \bar{x} - \bar{y}$. The corresponding test statistic is

$T(\bar{X} - \bar{Y}) = \frac{\bar{X} - \bar{Y} - E(\bar{X} - \bar{Y})}{\sqrt{Var(\bar{X} - \bar{Y})}} \sim F,$

where $E(\bar{X} - \bar{Y}) = \mu_1 - \mu_2$.

Hypotheses about the difference of two population means (independent samples)

The distribution $F$ depends on the available information about the variances $\sigma_1^2$ and $\sigma_2^2$.

If the variances are known, then $Var(\bar{X} - \bar{Y}) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$ and

$T(\bar{X} - \bar{Y}) = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1).$

If the variances are unknown but equal, then $Var(\bar{X} - \bar{Y}) = s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$ and

$T(\bar{X} - \bar{Y}) = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2),$

where

$s = \sqrt{\frac{\sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{i=1}^{n_2}(y_i - \bar{y})^2}{n_1 + n_2 - 2}}.$

Hypotheses about the difference of two population means (independent samples)

If there is no information about $\sigma_1$ and $\sigma_2$, then $Var(\bar{X} - \bar{Y})$ is estimated by $\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$ and

$T(\bar{X} - \bar{Y}) = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t(f) \approx N(0, 1),$

where

$s_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}(x_i - \bar{x})^2, \qquad s_2^2 = \frac{1}{n_2 - 1}\sum_{i=1}^{n_2}(y_i - \bar{y})^2,$

and

$f = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}.$

Hypotheses about the difference of two population means (independent samples)

The final step of the algorithm is the test: if $|t(\bar{x}, \bar{y})| > q_{\alpha/2}$, then reject $H_0$. Here $q_{\alpha/2}$ is the upper $\alpha/2$ quantile of the $N(0, 1)$, $t(n_1 + n_2 - 2)$ or $t(f)$ distribution, as appropriate.
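The two unknown-variance cases can be sketched as follows (my illustration under the slide's assumptions, not the slides' own code; numpy/scipy assumed):

```python
import numpy as np
from scipy.stats import t

def two_sample_t(x, y, equal_var=True, alpha=0.05):
    """Two-sided test of H0: mu1 = mu2 for two independent normal samples."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    d = x.mean() - y.mean()
    v1, v2 = x.var(ddof=1), y.var(ddof=1)           # s1^2 and s2^2
    if equal_var:                                    # pooled variance, f = n1 + n2 - 2
        s2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        stat = d / np.sqrt(s2 * (1 / n1 + 1 / n2))
        f = n1 + n2 - 2
    else:                                            # Welch-Satterthwaite f
        se2 = v1 / n1 + v2 / n2
        stat = d / np.sqrt(se2)
        f = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    crit = t.ppf(1 - alpha / 2, df=f)                # q_{alpha/2}
    return stat, f, abs(stat) > crit                 # reject H0 if |t| > q_{alpha/2}
```

For reference, scipy.stats.ttest_ind(x, y, equal_var=...) implements these same two procedures.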

Using the Normal approximation

We showed earlier how the normal approximation can be used for interval estimation. Hypothesis testing can be performed in a similar way. Let us have a random sample $x_1, \ldots, x_n$ from $F(\theta)$, where $\theta$ is unknown and $F$ is not a Normal distribution. Consider the hypotheses

$H_0: \theta = \theta_0, \qquad H_1: \theta \neq \theta_0.$

Consider a consistent point estimate $\hat{\theta}$ of $\theta$. If $H_0$ is true, then as $n \to \infty$,

$U = \frac{\hat{\theta} - \theta_0}{\sqrt{Var(\hat{\theta})}} \to N(0, 1), \qquad V = \frac{\hat{\theta} - \theta_0}{\sqrt{\widehat{Var}(\hat{\theta})}} \to N(0, 1),$

where $U$ uses the true variance of $\hat{\theta}$ and $V$ its estimate.

Using the Normal approximation

For a large sample size $n$ we can thus use the quantiles of the Normal distribution for testing. For the hypotheses above, we test whether $|u| > \lambda_{\alpha/2}$ or $|v| > \lambda_{\alpha/2}$; if the inequality holds, then reject $H_0$. Remember that the level of significance of the test is now only approximately $\alpha$ (because the test statistic is only approximately normally distributed).

In practice, the $t$-quantile is often used for testing instead, because $t_\alpha(n-1) > \lambda_\alpha$ and, as $n \to \infty$, $t_\alpha(n-1) \to \lambda_\alpha$. Using the $t$-quantile decreases the probability of a Type I error, because it is then more difficult to reject $H_0$. On the other hand, when we fail to reject $H_0$, the probability of making a mistake is smaller when using the quantiles of the Normal distribution.
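The claimed relation between the quantiles is easy to verify numerically (a quick check of mine, not from the slides; scipy assumed):

```python
from scipy.stats import norm, t

alpha = 0.05
lam = norm.ppf(1 - alpha)                    # lambda_alpha, approx. 1.645
for n in (5, 10, 30, 100, 1000):
    # t_alpha(n - 1) exceeds lambda_alpha and converges to it as n grows
    print(n, round(float(t.ppf(1 - alpha, df=n - 1)), 3), round(float(lam), 3))
```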

Application to the Binomial distribution

Let $p$ denote the proportion of individuals or objects in a population that possess a specified property (e.g. cars with manual transmissions, or smokers who smoke filter cigarettes). Let us have a random sample $x_1, \ldots, x_n$. We are interested in testing the hypotheses

$H_0: p = p_0, \qquad H_1: p \neq p_0.$

Consider the point estimate $\hat{p}$ of the parameter $p$. The distribution of the corresponding estimator $\hat{p}$ is approximately Normal when $n$ is large ($np_0 \geq 10$ and $n(1 - p_0) \geq 10$). So, for a large sample,

$U = \frac{\hat{p} - E\hat{p}}{\sqrt{Var\,\hat{p}}} \stackrel{H_0 \text{ true}}{=} \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \sim N(0, 1) \text{ (approximately)}.$

Test: reject $H_0$ if $|u| > \lambda_{\alpha/2}$. The Type I error probability is approximately $\alpha$.

Application to the Binomial distribution. Example

Example 2. A food company plans to replace its standard production method by a new one. It is then important to know whether the flavour of the product is changed. A so-called triangle test is performed: each of 500 persons tastes, in random order, three packets of the product, two of which are manufactured according to the standard method and one according to the new method. Each person is asked to select the packet that tastes different from the two others. 200 persons select the packet produced with the new method.

If there is no difference between the tastes, the probability that the new method's packet is selected is 1/3; if there is a difference, we will have $p > 1/3$. Hence our hypotheses are

$H_0: p = 1/3, \qquad H_1: p > 1/3.$

The firm is satisfied with $\alpha = 0.05$, for which $\lambda_{0.05} = 1.64$. Compute

$u = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{\frac{200}{500} - \frac{1}{3}}{\sqrt{\frac{1}{3} \cdot \frac{2}{3} \cdot \frac{1}{500}}} = 3.2.$

Since $3.2 > 1.64$, we reject $H_0$ and conclude that there is a difference between the tastes.
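The same computation as a sketch (mine, not from the slides; note the one-sided $H_1$, so the critical value is $\lambda_\alpha$, not $\lambda_{\alpha/2}$):

```python
from scipy.stats import norm

n, k, p0, alpha = 500, 200, 1 / 3, 0.05
p_hat = k / n
u = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5   # observed statistic, approx. 3.16
crit = norm.ppf(1 - alpha)                      # lambda_0.05, approx. 1.64
print(u, crit, u > crit)                        # True -> reject H0
```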