Lecture 4: Statistical Hypothesis Testing

1 EAS31136/B9036: Statistics in Earth & Atmospheric Sciences Lecture 4: Statistical Hypothesis Testing Instructor: Prof. Johnny Luo

2 Course schedule
Dates | Topic | Reading (based on the 2nd edition of the Wilks book) | Other activity
Aug 31 | Introduction; Review of probability | Wilks, Chap 2 | Pre-test
Sep 7 | Matlab tutorial (optional) | |
Sep 14 | Review of probability; Probability Distribution 1 | Wilks, Chap 2, 3 |
Sep 21 | Probability Distribution 2 | Wilks, Chap 3, 4 |
Sep 28 | Hypothesis testing | Wilks, Chap 5 |
Oct 5 | Linear regression I | Wilks Chap 6; von Storch 8-9 |
Oct 12 | Linear regression II | Wilks Chap 6; von Storch 8-9 |
Oct 19 | Time series analysis I | Wilks 8; von Storch |
Oct 26 | Midterm; discussion of final project | | Project 1-page abstract due
Nov 2 | Time series analysis II | Wilks 8; von Storch |
Nov 9 | Principal Component Analysis & Empirical orthogonal functions I | Wilks 11; von Storch 13 |
Nov 16 | Principal Component Analysis & Empirical orthogonal functions II | Wilks 11; von Storch 13 | Project progress report due
Nov 30 | Cluster analysis | Wilks 14 |
Dec 7 | Final project presentation | |

3 Outline
1. How to test hypotheses in statistics?
2. Commonly-used test 1: t test
3. Commonly-used test 2: χ² test

4 Definition of Terms
- Statistic (singular): a single measure of some attribute of a sample (e.g., the mean value of 100 readings of January temperatures).
- Sampling distribution: the probability distribution of a statistic, reflecting how it varies from one batch of samples to another. For example, the sample mean of January temperature (in Ithaca) has a probability distribution. The true mean is just a single number (which we don't know and can only estimate through observations).
- Statistical hypothesis testing (aka significance testing): test a hypothesis by comparing the sampled statistic against a modeled distribution (e.g., a parametric distribution).

5 Example 1. Suppose that advertisements for a tourist resort in the sunny desert southwest claim that, on average, six days out of seven are cloudless during winter. To verify this claim, we would need to observe the sky conditions in the area on a number of winter days (the sampled statistic), and then compare the fraction observed to be cloudless with the claimed proportion of 6/7 = 0.857. Assume that we could arrange to take observations on 25 independent occasions. If cloudless skies are observed on 15 of those 25 days, is this observation consistent with the claim? How do we make a statistical statement about the extent to which the data support the resort's claim?

6 Procedure for hypothesis testing
- Identify a test statistic (e.g., the sample mean).
- Define a null hypothesis (denoted H0). Oftentimes the null hypothesis is something we hope to reject.
- Define an alternative hypothesis (denoted HA). Oftentimes HA is the complement of H0 (i.e., H0 is not true).
- Obtain the null distribution (i.e., the sampling distribution of the test statistic, given that H0 is true).
- Compare the observed test statistic to the null distribution. If it falls within a sufficiently improbable region of the null distribution, we reject H0.

8 Example 1. Suppose that advertisements for a tourist resort in the sunny desert southwest claim that, on average, six days out of seven are cloudless during winter (i.e., the claimed cloudless proportion: 6/7 = 0.857). Observations on 25 independent days show cloudless skies on 15 of those 25 days. This fits a binomial distribution (it's like flipping a tricky, biased coin 25 times and getting 15 heads).
Test statistic: X = number of cloudless days.
H0: the resort's ads are correct that p = 0.857 (we want to shoot it down).
HA: p < 0.857 (we hope to prove this).
Null distribution: a binomial with N = 25 and p = 0.857 (as claimed by the ads).
Test the probability of the observed statistic (X = 15), or of anything even less favorable to H0:
$$\Pr\{X \le 15\} = \sum_{x=0}^{15} \binom{25}{x}\, 0.857^{x}\, (1 - 0.857)^{25 - x} = 0.0015$$

9 plot(0:25,binopdf(0:25,25,0.857),'o-')
[Figure: binomial null distribution for N = 25, p = 0.857; x-axis: # of cloudless days]

10 Test Level and p Value
- Test level: in the null distribution (i.e., assuming the null hypothesis is true), we define a rejection region whose total probability is sufficiently small. For example, the 5% level means that, if H0 is true, the test statistic falls in the rejection region only 5% of the time.
- The p value: the probability, according to the null distribution, that the test statistic takes the observed value or one even less favorable to H0.
- The null hypothesis is rejected if p ≤ test level.

11 Example 1. Suppose that advertisements for a tourist resort in the sunny desert southwest claim that, on average, six days out of seven are cloudless during winter (i.e., the claimed cloudless proportion: 6/7 = 0.857). Observations on 25 independent days show cloudless skies on 15 of those 25 days.
H0: the resort's ads are correct that the cloudless probability p = 0.857.
HA: p < 0.857 (we hope to prove this).
Null distribution: a binomial with N = 25 and p = 0.857.
Test the probability of the observed statistic:
$$\Pr\{X \le 15\} = \sum_{x=0}^{15} \binom{25}{x}\, 0.857^{x}\, (1 - 0.857)^{25 - x} = 0.0015$$
This is our p-value. It is far below a 5% test level, so H0 is rejected.
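The p-value in Example 1 can be checked numerically. The following is a minimal sketch in Python (used here as a counterpart to the MATLAB binopdf call in the lecture), relying only on the standard library. The helper names binom_pmf and binom_left_tail are illustrative, not part of the lecture.

```python
from math import comb


def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_left_tail(x_obs, n, p):
    """Pr{X <= x_obs} under a binomial(n, p) null distribution."""
    return sum(binom_pmf(x, n, p) for x in range(x_obs + 1))


# Example 1: 15 cloudless days out of 25, claimed p = 0.857
p_value = binom_left_tail(15, 25, 0.857)
test_level = 0.05  # assumed test level for illustration

print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value <= test_level else "do not reject H0")
```

The left tail is summed because HA is p < 0.857, so small counts of cloudless days are the outcomes unfavorable to H0.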

12 Some commonly-used parametric tests
- One sample t test
- Two sample t test: test for differences between two means
- χ² test

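13 One sample t test
H0: the observed sample mean centers at some specified or assumed value, μ0.
Test statistic:
$$t = \frac{\bar{x} - \mu_0}{\left[\widehat{\mathrm{Var}}(\bar{x})\right]^{1/2}}, \qquad \widehat{\mathrm{Var}}(\bar{x}) = \frac{s^2}{n},$$
where s² is the sample variance and n is the sample size.
For large n, the probability distribution of t is practically the same as the standardized Gaussian distribution (so we can use Table B.1). Note that the variance in the denominator is that of the sample mean, which is n times smaller than the sample variance.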

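14 One sample t test
H0: the observed sample mean centers at some specified or assumed value, μ0.
Test statistic:
$$t = \frac{\bar{x} - \mu_0}{\left[\widehat{\mathrm{Var}}(\bar{x})\right]^{1/2}}, \qquad \widehat{\mathrm{Var}}(\bar{x}) = \frac{s^2}{n},$$
where s² is the sample variance.
T-P-S: Why would the sample mean have a smaller variance than the raw data?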

15 $\widehat{\mathrm{Var}}(\bar{x}) = s^2 / n$, where s² is the sample variance.
[Figure: variance of the sample mean vs. variance of the raw sample data]
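As a rough illustration of the one sample test, the sketch below computes t from a list of readings, using s²/n as the variance of the sample mean. The data values and the function name one_sample_t are hypothetical, and the Gaussian approximation for the p-value is only reasonable for large n.

```python
import math
from statistics import NormalDist, mean, variance


def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / sqrt(s^2 / n)."""
    n = len(sample)
    var_of_mean = variance(sample) / n  # s^2 / n, variance of the sample mean
    return (mean(sample) - mu0) / math.sqrt(var_of_mean)


# Hypothetical temperature readings, not taken from the lecture
sample = [31.0, 28.5, 30.2, 29.1, 32.4, 27.8, 30.9, 29.6]
t = one_sample_t(sample, mu0=30.0)

# Two-sided p-value from the standard Gaussian (large-n approximation)
p_two_sided = 2 * NormalDist().cdf(-abs(t))
print(f"t = {t:.3f}, approximate two-sided p-value = {p_two_sided:.3f}")
```

Here statistics.variance uses the n - 1 denominator, which matches the usual definition of the sample variance s².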

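16 Two sample t test: test for differences between two means (assuming independence)
H0: the difference between the two means is zero.
Test statistic:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - E[\bar{x}_1 - \bar{x}_2]}{\left(\widehat{\mathrm{Var}}[\bar{x}_1 - \bar{x}_2]\right)^{1/2}}$$
where, for independent samples,
$$\widehat{\mathrm{Var}}[\bar{x}_1 - \bar{x}_2] = \widehat{\mathrm{Var}}[\bar{x}_1] + \widehat{\mathrm{Var}}[\bar{x}_2] = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}.$$
So, under H0 (where the expected difference is zero),
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\left[\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right]^{1/2}}$$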

17 Two-sided tests
For the Gaussian distribution, we usually adopt two-sided tests. For example, a test level of 5% means we check whether the test statistic is smaller than the 2.5% quantile (left tail) or greater than the 97.5% quantile (= 1 − 2.5%, right tail) of the null distribution.
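The two-sided rejection region can be sketched as follows, assuming a standard Gaussian null distribution. The function names are illustrative; NormalDist.inv_cdf from the Python standard library supplies the quantiles.

```python
from statistics import NormalDist


def two_sided_critical_values(test_level):
    """Lower and upper quantiles that leave test_level/2 in each tail."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    lower = std_normal.inv_cdf(test_level / 2)
    upper = std_normal.inv_cdf(1 - test_level / 2)
    return lower, upper


def reject_two_sided(z, test_level=0.05):
    """True if z lies in either tail of the rejection region."""
    lower, upper = two_sided_critical_values(test_level)
    return z < lower or z > upper


lower, upper = two_sided_critical_values(0.05)
print(f"5% two-sided critical values: {lower:.2f}, {upper:.2f}")
```

For the 5% level the two quantiles come out near -1.96 and +1.96, the familiar Gaussian thresholds.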

19 Example 2. Test whether the average max temperatures at Ithaca and Canandaigua are significantly different.
H0: μ_Ith − μ_Can = 0
Var[x̄_Ith − x̄_Can] = 7.71^2/31 + 7.86^2/31 = 3.91
Sqrt(3.91) = 1.98
z = (29.87 − 31.77)/1.98 = −0.96

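20 This is the tail probability read from the Gaussian table: 16.8%. For a two-sided test the p-value is twice that, about 34%. Either way it exceeds a 5% test level, so the test statistic does not fall in the rejection region. In other words, H0 is not rejected: the difference between the mean max temperatures at the two locations is NOT significantly different from zero.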
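The arithmetic of Example 2 can be reproduced with a short sketch. The standard deviation 7.71 and the sample size 31 appear on the slides; the second standard deviation (7.86) and the two sample means (29.87 and 31.77) are assumptions chosen because they reproduce the slide's 3.91, 1.98 and 16.8%. The function name two_sample_z is illustrative.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, s1, n1, mean2, s2, n2):
    """z statistic for H0: mu1 - mu2 = 0 with independent samples."""
    var_diff = s1**2 / n1 + s2**2 / n2  # variance of the difference of means
    return (mean1 - mean2) / math.sqrt(var_diff)


# 7.71 and n = 31 are from the slide; the other inputs are assumed (see above)
z = two_sample_z(29.87, 7.71, 31, 31.77, 7.86, 31)

one_tail = NormalDist().cdf(-abs(z))  # probability in a single tail
p_two_sided = 2 * one_tail            # p-value for a two-sided test

print(f"z = {z:.2f}")
print(f"one-tail probability = {one_tail:.3f}, two-sided p-value = {p_two_sided:.3f}")
```

Under these assumed inputs the one-tail probability lands near the 16.8% quoted on the slide, and doubling it for the two-sided test still leaves the p-value well above 5%.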

21 Two sample t test: test for differences between two means for paired samples (assuming some dependence)
For paired samples with some dependence (e.g., Ithaca temperature is probably positively correlated with the temperature at Canandaigua), the variance of the difference between the two means is smaller than when the two samples are independent. The same difference in means therefore gives a larger test statistic. So, the p-value falls below the 5% or even the 1% test level.
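To see why pairing matters, the sketch below computes the test statistic two ways on the same made-up data: once from the paired differences, and once treating the two samples as independent. The data values and function names are hypothetical; this is one common way to handle paired data, not necessarily the exact formulation used in the lecture.

```python
import math
from statistics import mean, variance


def paired_z(x1, x2):
    """Test statistic based on the variance of the paired differences."""
    diffs = [a - b for a, b in zip(x1, x2)]
    n = len(diffs)
    return mean(diffs) / math.sqrt(variance(diffs) / n)


def independent_z(x1, x2):
    """Test statistic that ignores the pairing (assumes independence)."""
    var_diff = variance(x1) / len(x1) + variance(x2) / len(x2)
    return (mean(x1) - mean(x2)) / math.sqrt(var_diff)


# Hypothetical, strongly correlated readings at two nearby sites
site_a = [20.0, 25.0, 30.0, 35.0, 40.0]
site_b = [22.0, 26.5, 32.0, 36.5, 42.0]

print(f"paired:      z = {paired_z(site_a, site_b):.2f}")
print(f"independent: z = {independent_z(site_a, site_b):.2f}")
```

Because the two series rise and fall together, the paired differences vary far less than either series does, so the paired statistic is much larger in magnitude for the same mean difference.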

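23 χ² test: test goodness-of-fit in parametric distribution
H0: the observed data fit well with the hypothesized distribution.
Test statistic:
$$\chi^2 = \sum_{\text{classes}} \frac{(\#\text{Observed} - \#\text{Expected})^2}{\#\text{Expected}}$$
"#" here is the occurrence frequency (count) in each class. Basically, we chop the data into discrete chunks that are MECE (mutually exclusive and collectively exhaustive).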

24 χ² test: test goodness-of-fit in parametric distribution
H0: the observed data fit well with the hypothesized distribution.
If χ² is large, the difference between the observed data and the hypothesized distribution is large: not a good fit!
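A minimal sketch of the χ² statistic, assuming it is the usual sum over classes of (observed − expected)² / expected. The counts below are hypothetical and the function name chi_square_statistic is illustrative.

```python
def chi_square_statistic(observed, expected):
    """Sum over classes of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per class")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts in six classes (both lists sum to 50)
observed = [5, 12, 18, 9, 4, 2]
expected = [6.0, 11.0, 16.0, 10.0, 5.0, 2.0]

chi2 = chi_square_statistic(observed, expected)
print(f"chi-square = {chi2:.3f}")
```

A perfect match between observed and expected counts gives exactly zero, and each mismatched class adds a positive contribution.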

25 Under H0, the statistic χ² follows the χ² distribution, which is the distribution of the sum of the squares of k independent standard normal random variables (k is the number of degrees of freedom).
If the fit is good, #Observed should be very similar to #Expected, so that χ² is small.
[Figure: PDF of the χ² distribution]

26 [Figures: PDF and CDF of the χ² distribution]

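27 χ² test: test goodness-of-fit in parametric distribution
H0: the observed data fit well with the hypothesized distribution.
The goodness-of-fit test using the χ² distribution differs from the t test in that here we hope H0 is not rejected (i.e., the fit is good). Strictly speaking, failing to reject H0 shows that the data are consistent with the hypothesized distribution rather than proving it.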

28 Procedure for χ² test
- Identify a test statistic (χ²).
- Define a null hypothesis (denoted H0): a good fit, i.e., small χ². Here, we hope not to reject H0.
- Define an alternative hypothesis (denoted HA): not a good fit, i.e., χ² is large.
- Obtain the null distribution: the standard χ² distribution.
- Compare the observed test statistic (the calculated χ² value) to the null distribution (the standard χ² distribution). A value far out in the right tail indicates a poor fit.

31 χ² Distribution Table
Degrees of freedom: ν = 6 − 2 − 1 = 3 for the example here. 6 refers to the six classes; 2 is the number of parameters fitted for each distribution.
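The table lookup can be sketched as below, assuming the usual rule ν = (number of classes) − (number of fitted parameters) − 1 and a 5% test level. The dictionary holds standard 5% right-tail critical values of the χ² distribution for small ν; the function names are illustrative.

```python
# Right-tail 5% critical values of the chi-square distribution, keyed by nu
CHI2_CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def degrees_of_freedom(n_classes, n_fitted_params):
    """nu = number of classes - number of fitted parameters - 1 (assumed rule)."""
    return n_classes - n_fitted_params - 1


def fit_rejected(chi2_value, n_classes, n_fitted_params):
    """True if chi2_value exceeds the 5% critical value for the given nu."""
    nu = degrees_of_freedom(n_classes, n_fitted_params)
    return chi2_value > CHI2_CRITICAL_5PCT[nu]


nu = degrees_of_freedom(6, 2)  # six classes, two fitted parameters
print(f"nu = {nu}, 5% critical value = {CHI2_CRITICAL_5PCT[nu]}")
```

A calculated χ² below the critical value means H0 (a good fit) is not rejected at the 5% level.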
