STAT 135 Lab 7 Distributions derived from the normal distribution, and comparing independent samples.


Rebecca Barter, March 16, 2015

The χ² distribution

The χ² distribution

We have seen several instances of test statistics that follow a χ² distribution: for example, the generalized likelihood ratio statistic

−2 log(Λ) ∼ χ²_df

and the Pearson χ² goodness-of-fit test statistic

X² = Σ_{i=1}^{n} (O_i − E_i)²/E_i ∼ χ²_{n − 1 − dim(θ)}

It turns out that the χ² distribution is closely related to the N(0, 1) distribution.

The χ² distribution

The χ² distribution can be generated from sums of squared normals. If Z ∼ N(0, 1), then

Z² ∼ χ²_1

and more generally, if Z_1, ..., Z_n are independent N(0, 1), then

Z_1² + ... + Z_n² ∼ χ²_n

From this it is easy to conclude that if U ∼ χ²_n and V ∼ χ²_m are independent, then

U + V ∼ χ²_{n+m}

That is, to get the distribution of a sum of independent χ² random variables, just add the degrees of freedom.
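This characterization is easy to check by simulation. A minimal sketch (the degrees of freedom, number of replicates, and seed are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 200_000

# Sum of n squared independent N(0,1) draws, repeated many times
z = rng.standard_normal((reps, n))
chi2_samples = (z ** 2).sum(axis=1)

# A chi^2_n random variable has mean n and variance 2n
print(chi2_samples.mean())  # ≈ 5
print(chi2_samples.var())   # ≈ 10

# Compare empirical and theoretical 95th percentiles
print(np.quantile(chi2_samples, 0.95), stats.chi2(df=n).ppf(0.95))
```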

The χ² distribution

The χ²_n distribution is also related to the gamma distribution:

U ∼ χ²_n if and only if U ∼ Gamma(n/2, 1/2)

so we can write the density of the χ²_n distribution as

f(x) = 1/(2^{n/2} Γ(n/2)) · x^{n/2 − 1} e^{−x/2},  x ≥ 0
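The gamma connection can be checked numerically. One detail to watch: scipy parameterizes the gamma by shape and scale, so a rate of 1/2 corresponds to scale = 2.

```python
import numpy as np
from scipy import stats

n = 7
x = np.linspace(0.1, 20, 50)

# chi^2_n density vs Gamma(shape = n/2, rate = 1/2) density;
# scipy's gamma uses a scale parameter, and scale = 1/rate = 2
chi2_pdf = stats.chi2(df=n).pdf(x)
gamma_pdf = stats.gamma(a=n / 2, scale=2).pdf(x)

print(np.max(np.abs(chi2_pdf - gamma_pdf)))  # essentially 0
```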

The t distribution

The t distribution

If Z ∼ N(0, 1) is independent of U ∼ χ²_n, then

Z / √(U/n) ∼ t_n

The t_n distribution has density

f_n(x) = Γ((n+1)/2) / (√(nπ) Γ(n/2)) · (1 + x²/n)^{−(n+1)/2}

The t distribution

Note that if X_i ∼ N(µ, σ²), then

(X̄_n − µ) / (σ/√n) ∼ N(0, 1)

However, if we don't know σ, we can estimate it using

s² = 1/(n − 1) · Σ_{i=1}^{n} (X_i − X̄)²

and our distribution becomes

(X̄_n − µ) / (s/√n) ∼ t_{n−1}
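A simulation makes the effect of estimating σ concrete: the studentized mean has heavier tails than N(0, 1), matching t_{n−1}. The sample size, parameters, and seed below are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, mu, sigma, reps = 5, 10.0, 2.0, 200_000

# Many samples of size n from N(mu, sigma^2)
x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)  # ddof=1 gives the (n-1)-denominator estimate

t_stats = (xbar - mu) / (s / np.sqrt(n))

# The 97.5% quantile should match t_{n-1}, not N(0,1)
print(np.quantile(t_stats, 0.975))   # close to the t_4 quantile
print(stats.t(df=n - 1).ppf(0.975))  # 2.776...
print(stats.norm.ppf(0.975))         # 1.96, noticeably smaller
```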

The F-distribution

The F-distribution

If U ∼ χ²_m and V ∼ χ²_n are independent, then

F = (U/m) / (V/n) ∼ F_{m,n}

The F_{m,n} distribution has density

f(w) = Γ((m+n)/2) / (Γ(m/2) Γ(n/2)) · (m/n)^{m/2} · w^{m/2 − 1} · (1 + (m/n)w)^{−(m+n)/2},  w ≥ 0
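The ratio construction can also be checked by simulation; for n > 2 the F_{m,n} mean is n/(n − 2). The degrees of freedom and seed below are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
m, n, reps = 5, 10, 200_000

# Ratio of independent scaled chi-squared variables
u = rng.chisquare(m, size=reps)
v = rng.chisquare(n, size=reps)
f_samples = (u / m) / (v / n)

# For n > 2, the F_{m,n} mean is n/(n-2)
print(f_samples.mean())      # ≈ 1.25
print(stats.f(m, n).mean())  # 1.25

# Empirical vs theoretical 95th percentile
print(np.quantile(f_samples, 0.95), stats.f(m, n).ppf(0.95))
```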

Moment generating functions

Moment generating functions

The moment generating function (MGF) of a random variable X is given by

M(t) = E(e^{tX})

and the MGF of a random variable uniquely determines the corresponding distribution.

Moment generating functions

The MGF of X is defined by M(t) = E(e^{tX}). A useful formula: the kth derivative of the MGF of X, evaluated at zero, is equal to the kth moment of X:

E(X^k) = d^k M / dt^k evaluated at t = 0
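The moment formula is easy to check symbolically. A sketch using the known MGF of the standard normal, M(t) = e^{t²/2} (which you derive in the exercise below):

```python
import sympy as sp

t = sp.symbols("t")

# MGF of Z ~ N(0, 1): M(t) = exp(t^2 / 2)
M = sp.exp(t ** 2 / 2)

# k-th moment = k-th derivative of M evaluated at t = 0
moments = [sp.diff(M, t, k).subs(t, 0) for k in range(1, 5)]
print(moments)  # [0, 1, 0, 3]: E(Z)=0, E(Z^2)=1, E(Z^3)=0, E(Z^4)=3
```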

Moment generating functions

Some nice properties of the moment generating function (show them yourself!):

If X has MGF M_X(t), and Y = a + bX, then Y has MGF M_Y(t) = e^{at} M_X(bt)

If X and Y are independent with MGFs M_X and M_Y, and Z = X + Y, then M_Z(t) = M_X(t) M_Y(t)

Exercise

Exercise: moment generating functions

For each of the following distributions, calculate the moment generating function and the first two moments.

1. X ∼ Poisson(λ)
2. Z = X + Y, where X ∼ Poisson(λ) and Y ∼ Poisson(µ) are independent
3. Z ∼ N(0, 1)

Comparing independent samples from two normal populations with common but unknown variance: the t-test

Comparing independent samples

So far, we have been working towards making inferences about parameters from a single population. For example, we have looked at conducting hypothesis tests for the (unknown) population mean µ, such as

H_0: µ = 0  versus  H_1: µ > 0

Given a sample X_1, ..., X_n from our population of interest, we would look at the sample mean, X̄_n, to see whether we have enough evidence against H_0 in favor of H_1 (for example, by calculating a p-value).

Comparing independent samples

What if we were instead interested in comparing parameters from two different populations? Suppose that the first population has (unknown) mean µ_1, and the second population has (unknown) mean µ_2. Then we might test the hypothesis

H_0: µ_1 = µ_2

against one of

H_1: µ_1 > µ_2,  H_1: µ_1 < µ_2,  or  H_1: µ_1 ≠ µ_2

Comparing independent samples from two normal populations with common but unknown variance

Suppose that we have observed samples:

X_1 = x_1, ..., X_n = x_n from population 1 with unknown variance σ²_X
Y_1 = y_1, ..., Y_m = y_m from population 2 with unknown variance σ²_Y

Assume a common variance: σ²_X = σ²_Y. We might want to test

H_0: µ_1 = µ_2  against  H_1: µ_1 > µ_2

What test statistic might we look at to see if we have evidence against H_0 in favor of H_1?

Comparing independent samples from two normal populations with common but unknown variance

If H_0 were true, we might expect X̄_n − Ȳ_m ≈ 0; if H_1 were true, we might expect X̄_n − Ȳ_m > 0. Note that our p-value (the probability, assuming H_0 is true, of seeing something at least as extreme as what we have observed) would be given by

P_{H_0}(X̄ − Ȳ > x̄ − ȳ)

To calculate this probability, we need to identify the distribution of X̄ − Ȳ.

Comparing independent samples from two normal populations with common but unknown variance

It turns out that, under H_0,

t = (X̄_n − Ȳ_m) / (s √(1/n + 1/m)) ∼ t_{n+m−2}

where the denominator is an estimate of the standard deviation of X̄ − Ȳ (since the true variance is unknown), and

s² = ((n − 1) s²_X + (m − 1) s²_Y) / (n + m − 2)

is the pooled variance of X_1, ..., X_n and Y_1, ..., Y_m.

Comparing independent samples from two normal populations with common but unknown variance

To test H_0: µ_1 = µ_2 against H_1: µ_1 > µ_2, our p-value can be calculated by

P_{H_0}(X̄ − Ȳ > x̄ − ȳ) = P( (X̄ − Ȳ) / (s √(1/n + 1/m)) > (x̄ − ȳ) / (s √(1/n + 1/m)) )
                        = P( t_{n+m−2} > (x̄ − ȳ) / (s √(1/n + 1/m)) )

This test is called a two-sample t-test.
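The formulas above translate directly into code. As a sketch, the function below computes the pooled t statistic and one-sided p-value, and checks the statistic against scipy's ttest_ind with equal_var=True, which performs the same pooled test; the data are arbitrary illustrative draws:

```python
import numpy as np
from scipy import stats

def pooled_t_test(x, y):
    """Two-sample t-test assuming a common (pooled) variance.

    Returns the t statistic and the one-sided p-value for
    H0: mu_1 = mu_2 against H1: mu_1 > mu_2.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, m = len(x), len(y)
    # Pooled variance: weighted combination of the two sample variances
    s2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)
    t = (x.mean() - y.mean()) / np.sqrt(s2 * (1 / n + 1 / m))
    p_one_sided = stats.t(df=n + m - 2).sf(t)  # P(t_{n+m-2} > t)
    return t, p_one_sided

# Arbitrary illustrative data
rng = np.random.default_rng(3)
x = rng.normal(1.0, 2.0, size=8)
y = rng.normal(0.0, 2.0, size=6)

t, p = pooled_t_test(x, y)
t_scipy, p_two_sided = stats.ttest_ind(x, y, equal_var=True)
print(t, t_scipy)  # the two t statistics agree
```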

Exercise

Exercise: comparing independent samples from two normal populations with common but unknown variance

Suppose I am interested in comparing the average delivery times for two pizza companies. To test this, I ordered 7 pizzas from pizza company A and recorded the delivery times:

(20.4, 24.2, 15.4, 21.4, 20.2, 18.5, 21.5)

and 5 pizzas from pizza company B:

(20.2, 16.9, 18.5, 17.3, 20.5)

I know that the delivery times follow normal distributions, but I don't know the means or the variance. Do I have enough evidence to conclude that the average delivery times for the two companies are different?
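Once you have worked the test by hand, one way to check the arithmetic is scipy's pooled two-sample t-test; this sketch only reports the two-sided statistic and p-value, it is not a substitute for the by-hand derivation:

```python
from scipy import stats

# Delivery times from the exercise
a = [20.4, 24.2, 15.4, 21.4, 20.2, 18.5, 21.5]
b = [20.2, 16.9, 18.5, 17.3, 20.5]

# equal_var=True gives the pooled-variance two-sample t-test;
# the alternative here is the two-sided one, mu_A != mu_B
t, p = stats.ttest_ind(a, b, equal_var=True)
print(t, p)  # compare p against your chosen significance level
```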

A nonparametric test for comparing independent samples from two arbitrary populations: the Mann-Whitney test

Comparing independent samples from two arbitrary populations

Suppose that we are interested in comparing two populations, but we don't have any information about the distributions of the two populations. We observe data

X_1 = x_1, ..., X_n = x_n IID with unknown continuous distribution F
Y_1 = y_1, ..., Y_m = y_m IID with unknown continuous distribution G

Suppose that we want to test the null hypothesis H_0: F = G. We could test H_0 using a nonparametric test (a test that makes no distributional assumptions on the data) based on ranks.

Comparing independent samples from two arbitrary populations

Let R_i be the rank of X_i (when the ranks are taken over all of the X_i's and Y_i's combined). What is the expected value of Σ_{i=1}^{n} R_i under H_0? First note that since under H_0 all ranks are equally likely,

P(R_1 = k) = 1/(n + m)  for k = 1, ..., n + m

Thus

E(R_1) = 1·P(R_1 = 1) + 2·P(R_1 = 2) + ... + (n + m)·P(R_1 = n + m)
       = (1 + 2 + ... + (n + m)) / (n + m)
       = (1/(n + m)) · ((n + m)(n + m + 1)/2)
       = (n + m + 1)/2

Comparing independent samples from two arbitrary populations

We just showed that

E(R_1) = (n + m + 1)/2

so that if H_0: F = G is true, we have

E(Σ_{i=1}^{n} R_i) = n(n + m + 1)/2

Thus a value of Σ_{i=1}^{n} R_i that is significantly larger or smaller than n(n + m + 1)/2 would provide evidence against H_0: F = G.

Example: comparing independent samples from two arbitrary populations

Suppose, for example, that we had

X = (6.2, 3.7, 4.7, 1.3)
Y = (1.2, 0.8, 1.4, 2.5, 1.1)

Then our ranks are

rank(X) = (9, 7, 8, 4)
rank(Y) = (3, 1, 5, 6, 2)

and the sums of the ranks of the X_i's and Y_i's are

sum of rank(X) := Σ_{i=1}^{n} R_i = 28
sum of rank(Y) := Σ_{i=1}^{m} R′_i = 17

Comparing independent samples from two arbitrary populations

In our example, we saw that the sum of the ranks of the X_i's was larger than the sum of the ranks of the Y_i's. But how can we tell whether this difference is enough to conclude that H_0: F = G is false (that the two samples came from different distributions)?

Comparing independent samples from two arbitrary populations

Suppose that the rank of X_i is R_i and the rank of Y_i is R′_i. Note that under H_0, if n = m, we would expect that

Σ_{i=1}^{n} R_i ≈ Σ_{i=1}^{m} R′_i

Recall that

E(Σ_{i=1}^{n} R_i) = n(n + m + 1)/2

and similarly, the expected sum of the R′_i's is

E(Σ_{i=1}^{m} R′_i) = m(n + m + 1)/2

Comparing independent samples from two arbitrary populations

Thus if H_0 were true, we would expect the sum of the ranks of the smaller sample to be close to its expected value

n_1(n + m + 1)/2

where n_1 is the sample size of the smaller sample: n_1 = min(n, m). Based on this idea, we can use the Mann-Whitney test to test H_0.

Comparing independent samples from two arbitrary populations

The Mann-Whitney test can be conducted as follows:

1. Concatenate the X_i and Y_j into a single vector Z
2. Let n_1 be the sample size of the smaller sample
3. Compute R = sum of the ranks of the smaller sample in Z
4. Compute R′ = n_1(m + n + 1) − R
5. Compute R* = min(R, R′)
6. Compare the value of R* to the critical value in a table: if R* is less than or equal to the tabulated value, reject the null hypothesis that F = G
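The steps above can be sketched directly in code (assuming no ties in the data); running it on the example from the earlier slide reproduces the rank sums computed there:

```python
import numpy as np

def mann_whitney_R(x, y):
    """Rank-sum statistic from the Mann-Whitney steps above.

    R is the rank sum of the smaller sample in the combined data,
    R' = n1 * (n + m + 1) - R, and the statistic is min(R, R').
    Assumes no ties in the data.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1 = min(len(x), len(y))
    z = np.concatenate([x, y])
    # Ranks of the combined sample (1 = smallest); no ties assumed
    ranks = np.argsort(np.argsort(z)) + 1
    if len(x) <= len(y):
        R = ranks[:len(x)].sum()
    else:
        R = ranks[len(x):].sum()
    R_prime = n1 * (len(x) + len(y) + 1) - R
    return R, R_prime, min(R, R_prime)

# The example from the slides
x = [6.2, 3.7, 4.7, 1.3]
y = [1.2, 0.8, 1.4, 2.5, 1.1]
R, R_prime, R_star = mann_whitney_R(x, y)
print(R, R_prime, R_star)  # 28 12 12
```

Note that scipy.stats.mannwhitneyu reports the equivalent U statistic (a shifted version of this rank sum) rather than R*, so its tabulated values differ even though the test is the same.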

Example: comparing independent samples from two arbitrary populations

Back to our example, where

X = (6.2, 3.7, 4.7, 1.3),  rank(X) = (9, 7, 8, 4)
Y = (1.2, 0.8, 1.4, 2.5, 1.1),  rank(Y) = (3, 1, 5, 6, 2)

Here n_1 = 4. The smaller sample is X, so the sum of the ranks of the smaller sample is

R = sum of rank(X) = Σ_{i=1}^{n} R_i = 28

Next, we compute

R′ = n_1(m + n + 1) − R = 4 × (5 + 4 + 1) − 28 = 12

Example: comparing independent samples from two arbitrary populations

So

R = sum of rank(X) = 28
R′ = n_1(m + n + 1) − R = 4 × (5 + 4 + 1) − 28 = 12
R* = min(R, R′) = 12

The critical value in the table for a two-sided test with α = 0.05 (n_1 = 4, n_2 = 5) is 11. Our value of 12 is not less than or equal to the critical value, so we fail to reject H_0.

Exercise

Exercise: Rice Chapter 11, Exercise 21

A study was done to compare the performance of engine bearings made of different compounds. Ten bearings of each type were tested. The times until failure (in units of millions of cycles) for each type are given below.

Type I:  3.03  5.53  5.60  9.30  9.92  12.51  12.95  15.21  16.04  16.84
Type II: 3.19  4.26  4.47  4.53  4.67  4.69   5.78   6.79   9.37   12.75

1. Use normal theory to test the hypothesis that there is no difference between the two types of bearings.
2. Test the same hypothesis using a nonparametric method.
3. Which of the methods (that of part 1 or that of part 2) do you think is better in this case?