Normal (Gaussian) distribution The normal distribution is often relevant because of the Central Limit Theorem (CLT):

Size: px

Start display at page:

Download "Normal (Gaussian) distribution The normal distribution is often relevant because of the Central Limit Theorem (CLT):"

Thomas French
5 years ago
Views:

1 Lecture Three Normal theory null distributions Normal (Gaussian) distribution The normal distribution is often relevant because of the Central Limit Theorem (CLT): A random variable which is a sum of many independent random variables will have an (approximately) normal distribution Examples (1) Many natural responses may be modelled as the additive effect of many factors eg crop yield: where y 1 = a 1 x seed1 + a 2 x soil1 + a 3 x water1 + y 2 = a 1 x seed2 + a 2 x soil2 + a 3 x water2 + y n = a 1 x seedn + a 2 x soiln + a 3 x watern + x seed1,, x seedn are independent samples from a (not necessarily normal) distribution with mean µ seed and variance σ 2 seed ; x soil1,, x soiln are independent samples from a distribution with mean µ soil and variance σ 2 soil ; x water1,, x watern are independent samples from a distribution with mean µ water and variance σ water 2 ; it then follows that (y 1,, y n ) will be independent samples from an approximately normal joint distribution with µ Y = a 1 µ seed + a 2 µ soil + a 3 µ water + σ 2 Y = a 2 1σ 2 seed + a2 2σ 2 soil + a2 3σ 2 water + additive effects normally distributed data (2) The sampling distribution for Ȳ from independent samples from a population Recap: sampling distribution: 1

2 Population A sample (y (1) 1,, y(1) n ) ȳ (1) Population A sample (y (2) 1,, y(2) n ) ȳ (2) Population A sample (y (N) 1,, y n (N) ) ȳ (N) Distribution of (ȳ (1),, ȳ (N) ) is called the sampling distribution of the sample mean Ȳ For reasonable distributions (finite mean µ and variance σ 2 ) and non-tiny sample sizes (n > 30), Ȳ will have an approximately normal distribution, with mean µ, variance σ 2 /n Why do we care about the sampling distribution of Ȳ? Consider H 0 : E(Y A ) = E(Y B ) = µ (treatment has no effect) Then regardless of the distribution of the data, under H 0 we have hence Ȳ A N(µ, σ 2 /n A ) Ȳ B N(µ, σ 2 /n B ) Ȳ A ȲB N(0, (σ 2 /n A ) + (σ 2 /n B )) represents a distribution of hypothetical results of a sampling experiment that it could have occurred under H 0 (Review of) properties of the Normal Distribution (I) (1) If Y N(µ, σ 2 ), then ay +b N(aµ+b, a 2 σ 2 ); in particular, (Y µ)/σ N(0, 1) N(0, 1) is called the standard Normal distribution (2) If Y 1 N(µ 1, σ 2 1) and Y 2 N(µ 2, σ 2 2) and Y 1, Y 2 independent then Y 1 + Y 2 N(µ 1 + µ 2, σ σ 2 2) (3) If Y 1,, Y n are an iid sample from N(µ, σ 2 ) then independent of S 2 = 1 n (Y i n 1 Ȳ )2 Ȳ is statistically 2

3 Normal theory Null Distributions Consider the simple hypothesis test: H 0 : E(Y ) = µ 0 H 1 : E(Y ) µ 0 (two-sided) obtain data y 1,, y n, evaluate H 0, H 1 with test statistic: d(y) = d(y 1,, y n ) = ȳ µ 0 σ/ n 1 If data are from a normal population N(µ, σ 2 ) then d(y) = (Ȳ µ 0) (σ/ n) Ȳ µ 0 N(µ µ 0, σ 2 /n) and thus if H 0 is true then d(y) N(0, 1) N( µ µ 0 σ/ n, 1) Thus if the null hypothesis is true then we would expect d(y) to have a standard normal distribution, eg in 95% of samples taking a value between 196 and +196 Equivalently: the null distribution is standard normal 2 If data are not normal, but H 0 is true, the variance is σ 2, and n is fairly big (eg n > 30), then by the CLT we have: Hypothesis Test d(y) = (Ȳ µ 0) (σ/ n) N(0, 1) Large values of d(y) provide strong evidence against H 0 Reject H 0 for larger values of d(y) p-value = P r( d(y) d(y obs ) ) = P r(z d(y obs ) ) + P r(z d(y obs ) ) = 2P r(z d(y obs ) ) (here we use Z to indicate a standard normal RV) Problem: σ 2 is usually unknown = 2 (1 pnorm( d(y obs ), 0, 1)) Thus we cannot compute d(y) (in this sense it is not a genuine test-statistic) 3

4 Solution: approximate σ 2 with s 2, the sample variance, and use t(y) = ȳ µ 0 s/ n Q: What is the distribution of t(y) under H 0 : E(Y ) = µ 0? Well, s σ, hence we might hope that: ȳ µ 0 s/ n ȳ µ 0 σ/ N(0, 1) n For very large samples (> 100) this is a reasonable approximation, but for less large samples we need to take into account that S varies around σ Properties of the Normal distribution (2): χ 2 and t-distributions (4) If Z 1,, Z n iid N(0, 1) then n Zi 2 χ 2 n the χ 2 ( Chi-squared ) distribution with n df (degrees of freedom) In particular, if Y 1,, Y n iidn(µ, σ 2 ), then Y 1 µ, Y 2 µ, Y n µ σ σ σ iid N(0, 1) 1 n σ 2 (Y i µ) 2 χ 2 n n 1 σ 2 S 2 = n 1 1 σ 2 n 1 n (Y i Ȳ )2 χ 2 n 1 Note: the df is the number of independent normal RVs that are summed and squared, or equivalently, the dimension of the space in which these variables live: Clearly Yi µ σ is independent of Yj µ σ ; But Yj Ȳ σ is not independent of Y j Ȳ σ, for j j However, if we define W i = Y i Ȳ then 1 σ 2 n (Y i Ȳ )2 = 1 n σ 2 W 2 i = 1 n 1 σ 2 U 2 i 4

5 where for i = 1,, n 1: U i = i i + 1 W i i(i + 1) i j=1 W j and U j, U j are independent for j j Thus we see that n 1 is indeed the correct df for S 2 (5) If Z N(0, 1), X χ 2 k and Z and X are independent then Z X/k t k Student s t-distribution on k df ( Student was a pseudonym used by W Gosset) Note that as k t k N(0, 1) distribution The t-distribution has heavier tails than the standard normal, ie for large values of x: where Z N(0, 1) and T t k Back to the hypothesis test: P ( Z > x) < P ( T > x) We now return to the question: if E(Y ) = µ 0, so H 0 is true, and in addition, Y 1, Y n iid N(µ 0, σ 2 ), what is the distribution of t(y) = Ȳ µ 0 S/ n? We already showed that if Y 1, Y n iid N(µ 0, σ 2 ) (Ȳ µ 0) (σ/ n) Further, from (4) it follows that N(0, 1) Z n 1 σ 2 S 2 χ 2 n 1 X with k = n 1 Finally from (3) we have that Ȳ and S2 are independent Hence Z and X are independent Thus it follows that Ȳ µ 0 S/ n = Ȳ µ 0 σ/ n n 1 σ 2 S2 n 1 = Z X /(n 1) t n 1 Note that neither X nor Z are test statistics, since (unless we know σ 2 ) they cannot be computed from the sample 5

6 One sample t-test H 0 : E(Y ) = µ 0 H 1 : E(Y ) µ 0 (two-sided) Assumption: if H 0 is true then Y 1,, Y n iid N(µ 0, σ 2 ) Data y 1,, y n Test statistic: t obs = t(y) = (ȳ ) µ0 n s Null distribution: from the assumptions, if H 0 is true then t(y) t n 1, ie the population of hypothetical values of t(y) that might have been sampled is a t-distribution on n 1 df p-value: measuring evidence against H 0 : p = P r ( t(y) t obs H 0 ) = 2 (1 pt( tobs, n 1)) You can also try ttest(datavector, mu=µ 0 ) Basic Decision Theory Goal: Reject H 0 when false, accept H 0 when true Method: Judge evidence provided by data against H 0 Truth Decision H 0 true H 0 false Accept H 0 : Correct Type II error Reject H 0 : Type I error Correct Alternate terminology: H 0 : no treatment effect / you don t have the disease H 1 : treatment effect / you do have the disease then Type I errors correspond to False positives; Type II errors correspond to False negatives Note: A type I error can only occur when the null hypothesis is true Conversely, a type II error can only occur when the alternative is true 6

7 p-value Decision Procedure: (i) Select a level α (0 < α < 1) (ii) Compute the observed value of the statistic, and hence the p-value (iii) Reject H 0 if p-value α; accept H 0 if p-value> α Note: A large p-value does not imply that the alternative hypothesis is false, merely that there is little evidence against the null hypothesis For this reason some people speak of failing to reject H 0, rather than accepting H 0 ; the implication being that the null hypothesis has escaped rejection this time, but might not be so lucky next time Example: One sample t-test H 0 : E(Y ) = µ 0 H 1 : E(Y ) µ 0 (two-sided) Let t obs be the observed value of the t-statistic t(y) = n(ȳ µ 0 )/s Let T n 1 be a random variable with a Student t-distribution on n 1 df reject H 0 iff p-value α P r( T n 1 t obs ) α 2 P r(t n 1 t obs ) α P r(t n 1 t obs ) α/2 1 P r(t n 1 t obs ) α/2 P r(t n 1 t obs ) 1 α/2 Let t n 1,1 α/2 be the 1 α/2 quantile of the t-distribution with n 1 df ie P r(t n 1 < t n 1,1 α/2 ) = 1 α/2 Let x 1, x 2, x 3 be three numbers such that (picture) x 1 < t n 1,1 α/2 < x 2 < t n 1,1 α/2 < x 3 If t obs = x 1 then P r(t n 1 x 1 ) > 1 α/2, hence p-value < α, so we reject H 0 If t obs = x 2 then P r(t n 1 x 2 ) < 1 α/2, hence p-value > α, so we accept H 0 If t obs = x 3 then P r(t n 1 x 3 ) < 1 α/2, hence p-value < α, so we reject H 0 Thus we see that for a one sample t-test, the p-value decision procedure is equivalent to 7

8 t-statistic Decision Procedure: (i) Select a level α (0 < α < 1) (ii) Compute the observed t-statistic t obs = t(y) (iii) Reject H 0 if t obs t n 1,1 α/2 ; Accept H 0 if t obs < t n 1,1 α/2 Terminology The range of values of the test statistic where (for a given α) the null hypothesis is rejected is called the rejection region; conversely the range of values for which the null is not rejected is called the acceptance region It should now be obvious that for both decision procedures that P (type I error H 0 is true ) = P (reject H 0 H 0 is true ) = P ( t(y) t n 1,1 α/2 H 0 is true) = 2 α/2 = α Such a procedure is called a level-α test α controls the pre-experimental type-i error rate Note that we can construct a decision procedure to give any specified type I error rate Note: (Interpretation) If every experimenter used, for example, α = 005 then The null hypothesis would be falsely rejected for 5% of those experiments in which H 0 is true, and would correctly not be rejected for 95% of these experiments (where H 0 is true) What about those experiments where H 1 is true? For what proportion of these will we incorrectly accept H 0, and for what proportion will we correctly reject H 0? ie for such experiments what will our Type II error rate be? We will return to this question when we discuss power and the control of type II errors (It will depend on how H 1 is true, and our sample size) 8

Confidence intervals

Confidence intervals We now want to take what we ve learned about sampling distributions and standard errors and construct confidence intervals. What are confidence intervals? Simply an interval for which