Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.

Similar documents
1 Statistical inference for a population mean

CBA4 is live in practice mode this week exam mode from Saturday!

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

y ˆ i = ˆ " T u i ( i th fitted value or i th fit)

LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2

Design of Engineering Experiments Part 2 Basic Statistical Concepts Simple comparative experiments

Lecture 14 Simple Linear Regression

INTERVAL ESTIMATION AND HYPOTHESES TESTING

Scatter plot of data from the study. Linear Regression

Solution: First note that the power function of the test is given as follows,

Linear models and their mathematical foundations: Simple linear regression

Scatter plot of data from the study. Linear Regression

Lecture 15: Inference Based on Two Samples

Visual interpretation with normal approximation

M(t) = 1 t. (1 t), 6 M (0) = 20 P (95. X i 110) i=1

Ch 2: Simple Linear Regression

4.1 Hypothesis Testing

Estimating the accuracy of a hypothesis Setting. Assume a binary classification setting

Paired comparisons. We assume that

Simple and Multiple Linear Regression

MATH 644: Regression Analysis Methods

WISE International Masters

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Hypothesis testing. Anna Wegloop Niels Landwehr/Tobias Scheffer

(a) (3 points) Construct a 95% confidence interval for β 2 in Equation 1.

Summary: the confidence interval for the mean (σ 2 known) with gaussian assumption

Summary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1)

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow)

Epidemiology Principles of Biostatistics Chapter 10 - Inferences about two populations. John Koval

Simple Linear Regression

Chapter 7 Comparison of two independent samples

Lec 1: An Introduction to ANOVA

Topic 22 Analysis of Variance

The Components of a Statistical Hypothesis Testing Problem

Simple Linear Regression

Inference in Regression Analysis

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2

Statistical Methods in Natural Resources Management ESRM 304

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

Evaluation. Andrea Passerini Machine Learning. Evaluation

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit

STA 114: Statistics. Notes 21. Linear Regression

Math 3330: Solution to midterm Exam

Hypothesis Test. The opposite of the null hypothesis, called an alternative hypothesis, becomes

1 Hypothesis testing for a single mean

Properties of Estimators

EC2001 Econometrics 1 Dr. Jose Olmo Room D309

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015

Formal Statement of Simple Linear Regression Model

Evaluation requires to define performance measures to be optimized

16.3 One-Way ANOVA: The Procedure

CH.9 Tests of Hypotheses for a Single Sample

Analysis of Variance

Lecture 6 Multiple Linear Regression, cont.

Simple linear regression

Sampling distribution of t. 2. Sampling distribution of t. 3. Example: Gas mileage investigation. II. Inferential Statistics (8) t =

Lecture 15. Hypothesis testing in the linear model

CHAPTER 10 Comparing Two Populations or Groups

CHAPTER 10 Comparing Two Populations or Groups

Two Sample Hypothesis Tests

Chapter 14 Simple Linear Regression (A)

Problem 1 (20) Log-normal. f(x) Cauchy

Two-Sample Inferential Statistics

TUTORIAL 8 SOLUTIONS #

Introduction to Statistical Inference Lecture 8: Linear regression, Tests and confidence intervals

Review of probability and statistics 1 / 31

2.830 Homework #6. April 2, 2009

Correlation Analysis

Multiple comparisons - subsequent inferences for two-way ANOVA

T test for two Independent Samples. Raja, BSc.N, DCHN, RN Nursing Instructor Acknowledgement: Ms. Saima Hirani June 07, 2016

CHAPTER 8. Test Procedures is a rule, based on sample data, for deciding whether to reject H 0 and contains:

Study Ch. 9.3, #47 53 (45 51), 55 61, (55 59)

Stat 427/527: Advanced Data Analysis I

Lecture 5: ANOVA and Correlation

Table 1: Fish Biomass data set on 26 streams

CHAPTER 9, 10. Similar to a courtroom trial. In trying a person for a crime, the jury needs to decide between one of two possibilities:

Chapter 10: Inferences based on two samples

Sociology 6Z03 Review II

11 Hypothesis Testing

Test 3 Practice Test A. NOTE: Ignore Q10 (not covered)

Comparison of Two Population Means

Ch 3: Multiple Linear Regression

H 2 : otherwise. that is simply the proportion of the sample points below level x. For any fixed point x the law of large numbers gives that

Inference for Regression

Inference for the Regression Coefficient

Y i = η + ɛ i, i = 1,...,n.

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies

Basic Statistics and Probability Chapter 9: Inferences Based on Two Samples: Confidence Intervals and Tests of Hypotheses

ANOVA Analysis of Variance

Objectives Simple linear regression. Statistical model for linear regression. Estimating the regression parameters

Statistics: CI, Tolerance Intervals, Exceedance, and Hypothesis Testing. Confidence intervals on mean. CL = x ± t * CL1- = exp

Review 6. n 1 = 85 n 2 = 75 x 1 = x 2 = s 1 = 38.7 s 2 = 39.2

Simple Linear Regression

Introductory Econometrics

Stats Review Chapter 14. Mary Stangler Center for Academic Success Revised 8/16

Lecture 9 Two-Sample Test. Fall 2013 Prof. Yao Xie, H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech

Chapter 9 Inferences from Two Samples

STAT 135 Lab 7 Distributions derived from the normal distribution, and comparing independent samples.

Tests about a population mean

Political Science 236 Hypothesis Testing: Review and Bootstrapping

Remedial Measures, Brown-Forsythe test, F test

Transcription:

1 Review: Let X 1, X,..., X n denote n independent random variables sampled from some distribution might not be normal!) with mean µ) and standard deviation σ). Then X µ σ n In other words, X is approximately Normally distributed with mean µ and variance σ n The approximation gets better as n increases. Let X 1, X,..., X n denote n independent random variables sampled from some distribution might not be normal!) with mean µ) and unknown standard deviation estimated by s). Then X µ s n T t n 1 General Approach to Hypothesis Testing 1. Define the research question and formulate the appropriate null and alternative hypotheses.. Decide on your type I error rate α). 3. Assume that the null is true, construct a test statistic, and identify what is known about the distribution of the test statistic under this assumption. 4. Collect a sample and evaluate the test statistic. 5. Calculate the p-value. Note that calculation of this value depends on the form of the alternative hypothesis. 6. If the p-value is less than α, reject the null. Otherwise, do not reject the null. Note: We have not yet considered type II error rates or size of the sample. More on this later. Suppose we collect 8 pairs of twins. The first twin in the pair is healthy; the second is not. For each twin, we measure grey matter density. Processed data from the 8 pairs is shown below units not given). Pair Twi Twi 1 134 18 139 136 3 176 163 4 15 153 5 166 157 6 161 154 7 19 15 8 1 14 3 Consider the population differences, d, where d is the sample mean of the differences, δ is the population mean of the differences, and s d is the sample standard deviation of the differences. The quantity t = d δ s d / n, is approximately t distributed with n 1 degrees of freedom. 4 Is grey matter density in the populations significantly different?

5 6 Pair Twi Twi difference 1 134 18 6 139 136 3 3 176 163 13 4 15 153-1 5 166 157 9 6 161 154 7 7 19 15 4 8 1 14 - In this sample, d = 4.875 and s d = 4.998. Test H 0 : δ = 0 versus H A : δ 0, at significance level α = 0.05. Compute the test statistic d 0 s d / n = 4.875 4.998/ 8 =.76 For 7 degrees of freedom t 0.975 =.365, so the null hypothesis is rejected and we conclude that there is a non-zero difference in grey matter density. Paired Data Paired data is where two observations are taken from the same individual or from two individuals who are very similar. For example, one observation from each eye of an individual, a before and after observation from an individual, the response to two different treatments on the same individual, a response for both a case and a control from similar individuals, and responses from twins. In many cases, the difference in response between the two treatments or states can be computed for each pair such as twi twi ). Suppose that we will sample n pairs X,Y ). From each pair we compute the difference D = X Y. Let δ denote the true difference in population means δ = µ 1 µ ), where µ 1 is population mean of X and µ is population mean of Y. 7 A two-sided 100 1 α) % confidence interval for the true mean difference δ = µ 1 µ ) between two paired samples of size n from a Normal distribution when the standard deviation of the differences σ d is not known) is given by 8 With paired data, we are usually interested in testing the hypotheses: s d d tn 1),1 α/, d s d + t n 1),1 α/ n n Two-sided One-sided or H 0 : δ = 0 versus H A : δ 0 H 0 : δ 0 versus H A : δ > 0 H 0 : δ 0 versus H A : δ < 0 For the last example, the 95% confidence interval is: 0.696, 9.05)

HW: What if the data were such that d = 4.15 and s d = 5.718. Would you reject the null of no difference in grey matter between the populations? Construct a 95 % confidence interval for this case. Does it contain zero? What is the relationship between hypothesis testing and confidence intervals? 9 Comparison of Two Population Means Samples can be paired or independent. Paired Samples: As we just saw, the hypothesis test proceeds just as in the one sample case. The data, d 1, d,..., d n, are defined to be the difference between the values in the first sample and the corresponding value in the second sample d i = x i y i, i = 1,,..., n). Just as before, construction of the test statistic depends on if the variance is known or not known. 10 Independent Samples: Hypothesis test proceeds similarly to the one sample case. Construction of the test statistic depends not only on whether the population variance is known, but also on if the population variances are equal. 11 1 Review Notation Let X 1, X,..., X n1 denote random variables from a distribution with mean µ 1 and standard deviation σ 1 ; Y 1, Y,..., Y n denote random variables from a distribution with mean µ and standard deviation σ. Let x 1, x,..., x n1 denotes a sample from the distribution with mean µ 1 and standard deviation σ 1 ; y 1, y,..., y n denote a sample from the distribution with mean µ and standard deviation σ. Question of Interest: Given the sample data, test H 0 : µ 1 = µ against some alternative. Recall and Note Recall that if X 1, X,..., X n1 denote i.i.d. random variables with mean µ 1 and standard deviation σ 1, X µ 1 n1 σ 1 Note that if X 1, X,..., X n1 denote i.i.d. random variables with mean µ 1 and standard deviation σ 1, and Y 1, Y,..., Y n denote i.i.d. random variables with mean µ and standard deviation σ, X Ȳ ) N µ 1 µ, σ 1 + σ and so X Ȳ ) µ 1 µ ) σ 1 + σ

13 14 Let x 1, x,..., x n1 denotes a sample from a distribution with mean µ 1 and standard deviation σ 1 ; y 1, y,..., y n denote a sample from a distribution with mean µ and standard deviation σ. Question of Interest: Given the sample data, test H 0 : µ 1 = µ against some alternative. Hypothesis Testing for a Difference Between Two Population Means Two-sided : H 0 : µ 1 µ = µ 1 µ ) 0 versus H A : µ 1 µ µ 1 µ ) 0 One-sided: H 0 : µ 1 µ µ 1 µ ) 0 versus H A : µ 1 µ > µ 1 µ ) 0 or H 0 : µ 1 µ µ 1 µ ) 0 versus H A : µ 1 µ < µ 1 µ ) 0 15 16 Test Statistics Populations with Known Variances Let X 1, X,..., X n1 denote i.i.d. random variables with mean µ 1 and standard deviation σ 1, and Y 1, Y,..., Y n denote i.i.d. random variables with mean µ and standard deviation σ. Then, Populations with Unknown but Equal Variances Let X 1, X,..., X n1 denote i.i.d. random variables with mean µ 1 and standard deviation σ 1, and Y 1, Y,..., Y n denote i.i.d. random variables with mean µ and standard deviation σ. Assume that we know σ 1 = σ = σ, but we do not know the value of σ. To test To test X Ȳ ) µ 1 µ ) σ 1 + σ H 0 : µ 1 µ = 0 against some alternative, use the test statistic H 0 : µ 1 µ = 0 against some alternative, use the test statistic Z = Under the null hypothesis that µ 1 a standard normal distribution. X Ȳ ) 0) σ 1 + σ = µ, Z has approximately) Evaluate the test statistic and compute the p-value using the table for a standard normal distribution. T = Under the null hypothesis that µ 1 X Ȳ ) 0) sp + sp = µ, T has a t-distribution approximately) with + degrees of freedom. Evaluate the test statistic and compute the p-value using the table for a t-distribution with + degrees of freedom.

17 Estimation of the common variance. The pooled estimate of the variance, s p, combines information from both of the samples to produce a more reliable estimate of σ. where s p = 1)s 1 + 1)s + and s 1 = i=1x i x) 1 s = i=1y i ȳ) 1