Introduction to hypothesis testing


Mark Cobb

Review: logic of hypothesis tests
Usually, we test (attempt to falsify) a null hypothesis (H0), which includes all possibilities except the prediction in the alternative hypothesis (HA). If HA is that an experimental treatment has an effect, the null hypothesis is that there is no effect. Rejecting H0 provides evidence that the actual hypothesis (HA) is true.

Decision criterion
How low a probability should make us reject H0? If the probability is less than the significance level (the critical p-value, α), reject H0; otherwise, do not reject it. Convention sets the significance level at α = 0.05 (5%). This is arbitrary: other significance levels can be valid, depending on the context.

Three special types of hypothesis tests based on the t distribution
1. The mean of a distribution differs from a constant (one-sample t test).
2. The mean difference in pairs of observations differs from a constant (paired t test).
3. Two distributions differ, i.e. the means from two sets of observations do not come from the same distribution of means (two-sample t test).
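The decision rule at the start of this slide can be written as a small Python function. This is a minimal sketch that assumes the p-value has already been computed; the function name `decide` is hypothetical, and α defaults to the conventional 0.05.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Hypothetical helper: reject H0 when p < alpha, otherwise do not reject."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    # A p-value below the significance level counts as evidence against H0.
    if p_value < alpha:
        return "reject H0"
    return "do not reject H0"


print(decide(0.03))              # reject H0
print(decide(0.08))              # do not reject H0
print(decide(0.08, alpha=0.10))  # reject H0 under a different, context-specific alpha
```

The last call shows why the choice of α matters: the same p-value leads to different decisions under different significance levels.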

t statistic
General form of the t statistic:
$$t = \frac{S_t - \theta}{SE}$$
where $S_t$ is the sample statistic, $\theta$ is the parameter value specified in H0, and SE is the standard error of the sample statistic.
Specific form for a population mean:
$$t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$$
where $\mu_0$ is the value of the mean specified in H0.

Test statistics
When H0 is true, there is a sampling distribution of t for each sample size, indexed by the degrees of freedom (df = n - 1). The area under each sampling (probability) distribution equals one, so each distribution gives the probabilities of obtaining particular ranges of t when H0 is true.
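A minimal Python sketch of both forms of the t statistic, using only the standard library. The function names and the five data values are hypothetical, chosen only to keep the arithmetic easy to follow; `statistics.stdev` uses the n - 1 denominator, matching $s$ above.

```python
import math
from statistics import mean, stdev


def t_statistic(estimate: float, null_value: float, se: float) -> float:
    """General form: t = (S_t - theta) / SE (hypothetical helper name)."""
    return (estimate - null_value) / se


def one_sample_t(data: list[float], mu0: float) -> tuple[float, int]:
    """Specific form for a mean: t = (ybar - mu0) / (s / sqrt(n)), df = n - 1."""
    n = len(data)
    se = stdev(data) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return t_statistic(mean(data), mu0, se), n - 1


# Hypothetical data, for illustration only.
t, df = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0], mu0=2.0)
print(round(t, 4), df)  # 1.4142 4
```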

Simple null hypothesis
A test of the hypothesis that a population mean equals a particular value ($H_0: \mu = \mu_0$). The value $\mu_0$ may come from the literature, other research, or legislation.

One-sample t test
[Figure: Mean(B_To_D), the mean ratio of births to deaths, by Group: Europe, Islamic, NewWorld; data: Ourworld]
Populations are fairly stable if the ratio of births to deaths is close to …
H0: B/D ratio = 1.25
HA: B/D ratio ≠ 1.25
1) Are the B/D ratios for any of these groups = 1.25?
2) Test using a one-sample t test.

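One-sample t tests
Single population: $H_0: \mu = 0$ (or any other pre-specified value; here 1.25)
$$t = \frac{\bar{y} - 1.25}{s_{\bar{y}}} = \frac{\bar{y} - 1.25}{s/\sqrt{n}}, \qquad df = n - 1$$

Results: Europe
[Figure: JMP distribution output for Europe: 1. box plot, 2. normal approximation, 3. histogram (probability scale)]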

More results
Islamic (JMP Test Mean output)
  Hypothesized value  1.25
  Actual estimate     3.47825
  DF                  15
  Std dev             1.…
  t test statistic    7.5570
  Prob > |t|          <.0001*
  Prob > t            <.0001*
  Prob < t            1.0000

New World (JMP Test Mean output)
  Hypothesized value  1.25
  DF                  20
  Prob > |t|          <.0001*

A way to present the results
[Figure: births/deaths (95% CI) by group, with the H0 value of 1.25 marked]
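The probabilities in this output can be approximated without a statistics package. The sketch below is a stand-in for JMP or a t table, under stated assumptions: it integrates the Student t density numerically with Simpson's rule (standard library only), and the function names are hypothetical. It then computes the three one-sample probabilities for the Islamic result (t = 7.5570, 15 df).

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x) = 0.5 + integral of the density from 0 to x (Simpson's rule)."""
    h = x / steps  # negative when x < 0, so the integral is negative
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def one_sample_p_values(t: float, df: float) -> dict[str, float]:
    """Hypothetical helper: the three probabilities reported in the JMP output."""
    return {
        "Prob > |t|": 2 * (1 - t_cdf(abs(t), df)),  # two-tailed, HA: mu != mu0
        "Prob > t": 1 - t_cdf(t, df),                # one-tailed, HA: mu > mu0
        "Prob < t": t_cdf(t, df),                    # one-tailed, HA: mu < mu0
    }


# Islamic group: t = 7.5570 with 15 df, from the output above.
for label, p in one_sample_p_values(7.5570, 15).items():
    print(f"{label}: {p:.6f}")
```

Both "Prob > |t|" and "Prob > t" come out far below 0.0001, and "Prob < t" is essentially 1, matching the JMP table.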

Two-sample t test
Used to compare two populations, each of which has been sampled; the simplest form of test among multiple populations.
Example: does average annual income differ between males and females? H0: income (males) = income (females)
[Figure: annual income by SEX (Female, Male); data: Survey2]
Calculation: $H_0: \mu_1 = \mu_2$, i.e. $\mu_1 - \mu_2 = 0$, with independent observations.
$$t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{s_{\bar{y}_1 - \bar{y}_2}}, \qquad s_{\bar{y}_1 - \bar{y}_2} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where $s_p$ is the pooled standard deviation (more later), and df = (n1 - 1) + (n2 - 1) = n1 + n2 - 2.
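A sketch of the pooled two-sample calculation in Python (standard library only). For comparison it also computes the unequal-variance (Welch) statistic with Satterthwaite degrees of freedom, $df = (a + b)^2 / [a^2/(n_1 - 1) + b^2/(n_2 - 1)]$ with $a = s_1^2/n_1$ and $b = s_2^2/n_2$, which is the adjustment these slides mention alongside the example results. The data are the Inside/Outside counts from the resampling example later in these slides; the function name is hypothetical.

```python
import math
from statistics import mean, variance


def two_sample_t(y1: list[float], y2: list[float]) -> dict[str, float]:
    """Hypothetical helper: pooled-variance t, plus the Welch (Satterthwaite) version."""
    n1, n2 = len(y1), len(y2)
    v1, v2 = variance(y1), variance(y2)  # sample variances, n - 1 denominators
    diff = mean(y1) - mean(y2)           # H0: mu1 - mu2 = 0

    # Pooled variance: s_p^2 = (SS1 + SS2) / (n1 + n2 - 2), since SS = s^2 (n - 1).
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)

    # Welch: separate variances and Satterthwaite-adjusted degrees of freedom.
    a, b = v1 / n1, v2 / n2
    se_welch = math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

    return {
        "t_pooled": diff / se_pooled,
        "df_pooled": n1 + n2 - 2,
        "t_welch": diff / se_welch,
        "df_welch": df_welch,
    }


# Inside/Outside counts from the resampling example in these slides.
inside = [3, 5, 2, 8, 7]
outside = [10, 7, 9, 12, 8]
for key, value in two_sample_t(inside, outside).items():
    print(key, round(value, 3))
```

For these counts the pooled test gives t ≈ -2.94 with 8 df. Because the two samples are the same size, the Welch t is identical; only the degrees of freedom change (about 7.44).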

Logic of the two-sample t test
Assume $H_0: \mu_1 = \mu_2$ and $H_A: \mu_1 > \mu_2$.
1) If H0 is true, the null distribution is known (for a given df).
2) If HA is true, we don't know the distribution, but we do know that it is not the null distribution.
[Figure: probability of $t = (\bar{y}_1 - \bar{y}_2)/(s_p\sqrt{1/n_1 + 1/n_2})$: a central t distribution when H0 is true and a non-central t distribution when HA is true]

Assume $H_0: \mu_1 = \mu_2$ with 4 df, and that H0 is true. The one-tailed critical value is $t_{0.05,4} = 2.13$. Any t > 2.13 will lead to incorrect rejection of H0 (a Type I error):
1. This means that the difference between $\bar{y}_1$ and $\bar{y}_2$ is more than 2.13 (pooled) standard errors.
2. This will happen 5% of the time.

Assume $H_A: \mu_1 > \mu_2$ with 4 df, and that HA is true. Any t < 2.13 will lead to incorrectly retaining H0 (a Type II error):
1. This means that the difference between $\bar{y}_1$ and $\bar{y}_2$ is less than 2.13 (pooled) standard errors.
2. The probability that this will happen depends on n and on the true difference between $\mu_1$ and $\mu_2$.

Results of example
What is the conclusion?
[Figure: JMP "Difference in Means" outputs for the income example]
The unequal-variance t test is based on the Satterthwaite adjustment (of degrees of freedom). It is not recommended unless the variances are very different and the sample sizes (n) are very different.
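The error rates described on these two slides can be checked by simulation. This sketch (standard library only; the names, seed, and settings are illustrative assumptions) draws normal samples of 3 per group, so df = 4, and uses the tabled one-tailed critical value for α = 0.05 with 4 df, about 2.132. When H0 is true, the proportion of t values above the critical value estimates the Type I error rate; when HA is true, the proportion below it estimates the Type II error rate for a given true difference.

```python
import math
import random

T_CRIT = 2.132  # one-tailed critical t for alpha = 0.05 with 4 df (t table)


def pooled_t(y1: list[float], y2: list[float]) -> float:
    """Pooled-variance two-sample t statistic, (ybar1 - ybar2) / SE."""
    n1, n2 = len(y1), len(y2)
    m1, m2 = sum(y1) / n1, sum(y2) / n2
    ss1 = sum((v - m1) ** 2 for v in y1)
    ss2 = sum((v - m2) ** 2 for v in y2)
    sp = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))


def rejection_rate(true_diff: float, n: int = 3, reps: int = 20000, seed: int = 1) -> float:
    """Hypothetical helper: fraction of simulated experiments with t > T_CRIT.

    n = 3 per group gives df = n1 + n2 - 2 = 4. Data are normal with SD 1;
    group 1 has mean true_diff and group 2 has mean 0.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        y1 = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        y2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if pooled_t(y1, y2) > T_CRIT:
            rejections += 1
    return rejections / reps


type_1 = rejection_rate(true_diff=0.0)          # H0 true: close to 0.05
miss_small = 1 - rejection_rate(true_diff=1.0)  # HA true: P(t < T_CRIT), Type II rate
miss_large = 1 - rejection_rate(true_diff=2.0)  # larger true difference, fewer misses
print(round(type_1, 3), round(miss_small, 3), round(miss_large, 3))
```

The first number should be close to 0.05, while the Type II rate shrinks as the true difference grows, which is the dependence on the true difference described on the slide.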

[Figure: annual income (mean ± SE) by SEX, Female and Male]

Paired t tests: the logic
1. Often there is interest in comparing observations that can be considered paired within a subject or replicate. For example:
   i. a comparison of activity level before and after eating in the same individual;
   ii. a comparison of longevity of males vs females, where county is the replicate.
2. In such cases there is often a benefit in accounting for variance that could be caused by differences among subjects (or replicates).


Paired observations: paired t test
$H_0: \mu_d = 0$, where d is the difference between paired observations.
$$t = \frac{\bar{d}}{s_{\bar{d}}} = \frac{\bar{d}}{s_d/\sqrt{n_d}}$$
where $s_d$ is the standard deviation of the sample of differences, and df = $n_d - 1$, where $n_d$ is the number of pairs.

Paired t-test example II
Pisaster comes in two colors along the west coast: purple and orange.
H0: density of purple per site = density of orange per site
Individual reefs are the replicates of interest. Looks like a no-brainer.
[Figure: sea star density by COLOR (Orange, Purple), all sites, analyzed as two samples]
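A minimal paired t sketch in Python (standard library only). The function name and the three before/after pairs are hypothetical; the last line simply divides the mean difference by its standard error, using the summary values reported for the sea star data in the calculation slide.

```python
import math
from statistics import mean, stdev


def paired_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Hypothetical helper: paired t test of H0: mean difference = 0.

    t = dbar / (s_d / sqrt(n_d)), with df = n_d - 1 (n_d = number of pairs).
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(x, y)]  # one difference per pair
    n_d = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n_d)), n_d - 1


# Hypothetical after/before measurements on three subjects.
t, df = paired_t([2.0, 4.0, 5.0], [1.0, 2.0, 3.0])
print(round(t, 3), df)  # 5.0 2

# From summary values alone, t is the mean difference over its standard error:
print(round(312.5714 / 97.25882, 2))  # 3.21 (df = 7 sites - 1 = 6)
```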

Results of a two-sample test
[Table: two-sample t-test output: N, mean and standard deviation for each GROUP (Orange, Purple); pooled variance; difference in means; confidence interval; t; df; p-value]
Marginally significant. Why?
[Figure: density (95% CI) and count histograms by COLOR (Orange, Purple) for the color of sea stars]
Consider the variability added at the level of the replicate (site). Given that observations are paired at the level of site, can this be accounted for?
[Figure: density by COLOR (Orange, Purple) at each SITE: Govpt, Boat, Stair, Shell Beach, Hazards, Cayucos, PSN]
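To see why pairing can matter, this sketch computes both tests on the same hypothetical densities (not the Pisaster data): the sites differ strongly from one another, but purple exceeds orange by a nearly constant amount at each site. Treating the colors as independent samples leaves the site-to-site variance in the standard error; the paired test removes it. Function names are illustrative.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical densities at four sites: sites differ a lot from each other,
# but purple exceeds orange by a nearly constant amount at every site.
purple = [110.0, 60.0, 30.0, 210.0]
orange = [100.0, 50.0, 22.0, 200.0]


def two_sample_t(y1: list[float], y2: list[float]) -> float:
    """Pooled-variance t, treating the two colors as independent samples."""
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return (mean(y1) - mean(y2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def paired_t(x: list[float], y: list[float]) -> float:
    """Paired t on the site-by-site differences."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


print(round(two_sample_t(purple, orange), 3))  # small: site variance swamps the difference
print(round(paired_t(purple, orange), 3))      # large: pairing removes site variance
```

Here the two-sample t is about 0.17, while the paired t is 19, even though both use exactly the same numbers.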

Paired test: details of calculation
Site         Purple  Orange  difference
Govpt        …       …       …
Boat         585     155     430
Stair        476     143     333
PSN          233     142     91
Cayucos      …       …       …
Hazards      …       …       …
Shell Beach  49      14      35
Mean difference = 312.5714; Sediff (standard error of the mean difference) = 97.25882; t = 312.5714 / 97.25882 ≈ 3.21 (df = 7 - 1 = 6)
[Figure: ORANGE and PURPLE densities plotted against index of case (site)]
Note the slopes: are they the same? Perhaps rates are a better comparison: 1) convert to rates, or 2) log-transform.

Paired test: details of calculation using log-transformed data
Site         Purple(log)  Orange(log)  difference
Govpt        …            …            …
Boat         …            2.1903317    0.576824
Stair        2.677607     2.155336     …
PSN          …            …            …
Cayucos      …            1.4913617    0.538022
Hazards      2.8621314    …            …
Shell Beach  …            …            …
[Figure: LORANGE and LPURPLE plotted against index of case]
Note that the slopes are much more similar. This indicates that purples are more common, by a constant ratio rather than by a constant amount.
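A short sketch of why the log transform turns a constant ratio into a constant difference: the difference of log10 values is the log10 of the ratio purple/orange, so back-transforming gives that ratio. It uses the Boat and Stair counts from the calculation slide; the site selection is only illustrative, and this is not a full paired test.

```python
import math

# Purple and orange densities at two sites, from the raw-data table.
sites = {"Boat": (585, 155), "Stair": (476, 143)}

log_diffs = {}
for site, (purple, orange) in sites.items():
    # Difference on the log10 scale equals log10(purple / orange).
    log_diffs[site] = math.log10(purple) - math.log10(orange)
    ratio = 10 ** log_diffs[site]  # back-transformed: purple / orange
    print(f"{site}: raw difference {purple - orange}, "
          f"log10 difference {log_diffs[site]:.6f}, ratio {ratio:.2f}")
```

Boat and Stair have quite different raw differences (430 vs 333) but similar log10 differences (about 0.58 and 0.52), i.e. purple/orange ratios of roughly 3.8 and 3.3.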

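Review: calculations of t
One-sample test: $t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$
Two-sample test: $t = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$
Paired test: $t = \frac{\bar{d}}{s_d/\sqrt{n_d}}$

Calculations of standard error
1) One-sample t test: $SE = s/\sqrt{n}$, where $s^2 = \frac{SS}{n - 1}$
2) Paired t test: $SE = s_d/\sqrt{n_d}$, where $s_d^2 = \frac{SS_d}{n_d - 1}$
3) Two-sample t test (calculation based on the pooled variance term): $SE = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $s_p^2 = \frac{SS_1 + SS_2}{(n_1 - 1) + (n_2 - 1)} = \frac{SS_1 + SS_2}{n_1 + n_2 - 2}$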

Testing statistical null hypotheses
Hypothesis construction

General hypothesis
A hypothesis that addresses the general question of interest.
H0: There will be no difference in the density of urchins on vertical vs horizontal surfaces.
HA: There will be a difference in the density of urchins on vertical vs horizontal surfaces.

Specific hypotheses
A hypothesis that represents the specific question addressed in your study. The specifics include the location of the study, the time period, the replication, and a simple description of the design.

Specific hypothesis
H0: There will be no difference in the density of (species name) on vertical vs horizontal surfaces, based on 10 replicate quadrats for each treatment randomly placed within site A, sampled on date B.
HA: There will be a difference in the density of (species name) on vertical vs horizontal surfaces, based on 10 replicate quadrats for each treatment randomly placed within site A, sampled on date B.
Note that much of this can be placed in the methods section, which removes the need to state these details in the hypotheses. However, also note that the hypotheses above are what is actually being tested.

Depiction of hypotheses
[Figure: the specific H0 above on an axis of horizontal density - vertical density of urchins, with the likelihood that H0 is incorrect increasing in both directions away from zero]

Depiction of hypotheses: what should the units be?
[Figure: H0 on an axis of horizontal density - vertical density of urchins, with the likelihood that H0 is incorrect increasing in both directions]
Goals: to use the same units for all assessments, irrespective of species or system, and to have the same set of probabilities based on those units. Hence, the units should be linked to an estimate of confidence. The most common form is the t value, which expresses the difference in mean values calibrated by an estimate of the error in the assessment of those means.

T statistic
$$T = \frac{\bar{X}_1 - \bar{X}_2}{SE}$$
SE and SD:
$$SD = \sqrt{\frac{\sum_{i=1}^{N}(X_i - \bar{X})^2}{N - 1}}, \qquad SE = \frac{SD}{\sqrt{N}}$$
where SD is the standard deviation, SE is the standard error, and N is the number of replicates.

Depiction of hypotheses: what should the units be?
[Figure: H0 on the axis T = (horizontal density - vertical density of urchins)/SE, with the likelihood that H0 is incorrect increasing in both directions]

The t distribution (central t) is a null probability distribution: it depicts the probability of obtaining each value of T if the null hypothesis is correct. One use is to estimate confidence levels.


H0: There will be no difference in the density of urchins on vertical vs horizontal surfaces.
Including error yields a confidence interval, e.g. we are 95% confident that the true t value lies between the interval limits.
[Figure: 95% CI on the axis T = (horizontal density - vertical density of urchins)/SE]
HA: There will be a difference in the density of urchins on vertical vs horizontal surfaces.
[Figure: two-tailed HA on the T axis: the central 95% of the distribution, with 2.5% in each tail]
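A sketch of the same interval in original units: the limits are the observed difference ± $t_{0.025,df}$ × SE. It assumes the pooled two-sample SE and uses the Inside/Outside counts from the resampling example in these slides; the critical value 2.306 (two-tailed α = 0.05, 8 df) is taken from a t table.

```python
import math
from statistics import mean, variance

# Inside/Outside counts from the resampling example (two independent groups).
inside = [3, 5, 2, 8, 7]
outside = [10, 7, 9, 12, 8]
n1, n2 = len(inside), len(outside)
df = n1 + n2 - 2  # 8

# Pooled standard error of the difference between means.
sp2 = ((n1 - 1) * variance(inside) + (n2 - 1) * variance(outside)) / df
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
diff = mean(inside) - mean(outside)

T_CRIT = 2.306  # t table: 2.5% in each tail, 8 df
lower, upper = diff - T_CRIT * se, diff + T_CRIT * se
print(f"difference = {diff:.1f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The interval runs from about -7.49 to -0.91. Because it excludes zero, a two-tailed test at α = 0.05 would reject H0 for these data.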

The importance of directionality of the alternative hypothesis (HA)
Consider:
H0: There will be no difference in the density of urchins on vertical vs horizontal surfaces.
HA: There will be a difference in the density of urchins on vertical vs horizontal surfaces.
vs
H01: Urchin density on horizontal surfaces will be greater than or equal to that on vertical surfaces.
HA1: Urchins will be more dense on vertical than on horizontal surfaces.
[Figure: H01 on the T axis: 95% of the null distribution, with 5% in one tail]

HA1: Urchins will be more dense on vertical than on horizontal surfaces.
[Figure: one-tailed HA1 on the T axis, with the 5% rejection region in one tail]

One- vs two-tailed hypotheses
1. Which is more interesting?
2. Which is more informed?
[Figure: one-tailed HA1 (5% in one tail) beside two-tailed HA (2.5% in each tail) on the T axis]

One- vs two-tailed hypotheses
1. Which is more powerful?

Example
Replication on horizontal and vertical surfaces = 50 per surface (100 total)
Mean on horizontal surfaces = …
Mean on vertical surfaces = …
Pooled standard deviation = …
$$T = \frac{\bar{X}_h - \bar{X}_v}{SE}$$

One- vs two-tailed hypotheses: which is more powerful?
[Figure: the same result, T = -1.79, gives p = 0.04 under the one-tailed HA1 and p = 0.08 under the two-tailed HA]

One- vs two-tailed hypotheses: conversion to original units
[Figure: the same comparison in original units (horizontal density - vertical density of urchins): difference = …, p = 0.04 (one-tailed HA1) vs p = 0.08 (two-tailed HA)]
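A sketch of the one- vs two-tailed comparison, assuming the example's 50 + 50 replicates (so df = 98). As a stand-in for a t table it uses a numerically integrated Student t distribution (Simpson's rule, standard library only); the function names are illustrative.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x) = 0.5 + integral of the density from 0 to x (Simpson's rule)."""
    h = x / steps  # negative when x < 0, so the integral is negative
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


# Urchin example: 50 replicates per surface (100 total), so df = 100 - 2 = 98.
T, DF = -1.79, 98

# One-tailed HA1 (vertical > horizontal) predicts negative T = (horiz - vert) / SE.
p_one_tailed = t_cdf(T, DF)
# Two-tailed HA (any difference): both tails count.
p_two_tailed = 2 * (1 - t_cdf(abs(T), DF))
print(f"one-tailed p = {p_one_tailed:.3f}, two-tailed p = {p_two_tailed:.3f}")
```

These round to the slide's p = 0.04 and p = 0.08: the same T crosses the 0.05 threshold only under the directional hypothesis.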

This is the difference between one- and two-tailed hypotheses; make sure you know which you are dealing with. Always strive for one-tailed hypotheses.
Is there a directional prediction (e.g. > or, separately, <)? If so, use a one-tailed test; if not, use a two-tailed test.

Assumptions of the t test
The t test is a parametric test. The t statistic only follows a t distribution if:
1. the variable has a normal distribution (normality assumption);
2. the two groups have equal population variances (homogeneity of variance assumption);
3. observations are independent or specifically paired (independence assumption).

Normality assumption
Data in each group are normally distributed.
Checks: frequency distributions (be careful), boxplots, probability plots, formal tests for normality.
Solutions: transformations; don't worry, run it anyway (just kidding, but not entirely).

Homogeneity of variance
Population variances are equal in the two groups.
Checks: subjective comparison of sample variances, boxplots, F-ratio test of $H_0: \sigma_1^2 = \sigma_2^2$.
Solutions: transformations; don't worry, run it anyway (just kidding again, but again not entirely).

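F test on variances
$H_0: \sigma_1^2 = \sigma_2^2$
The F statistic (F-ratio) is the ratio of the two sample variances: $F = s_1^2 / s_2^2$.
Reject H0 if F is much less than or much greater than 1. If H0 is true, the F-ratio follows an F distribution (with $n_1 - 1$ and $n_2 - 1$ df), and the usual logic of a statistical test applies.

Boxplot
[Figure: annotated boxplot of LENGTH showing the median, the smallest and largest values, and the 25% of values covered by each whisker]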
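A sketch of the F test on variances for the Inside/Outside counts from the resampling example. Instead of an F table it uses a Monte Carlo approximation, which is an assumption of this sketch rather than the slides' method: it simulates many pairs of normal samples of the same sizes, whose variance ratios follow the F distribution with $(n_1 - 1, n_2 - 1)$ df when H0 is true, and doubles the smaller tail proportion for a two-sided P. Names, seed, and the number of repetitions are illustrative.

```python
import random

inside = [3, 5, 2, 8, 7]
outside = [10, 7, 9, 12, 8]


def sample_variance(values: list[float]) -> float:
    """s^2 = SS / (n - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def f_test_monte_carlo(y1, y2, reps: int = 20000, seed: int = 1):
    """Hypothetical helper: F = s1^2 / s2^2 and a simulated two-sided P.

    Assumes normal data: variance ratios of normal samples with equal
    population variances follow the F distribution with (n1 - 1, n2 - 1) df.
    """
    rng = random.Random(seed)
    observed = sample_variance(y1) / sample_variance(y2)
    n1, n2 = len(y1), len(y2)
    upper = lower = 0
    for _ in range(reps):
        f = (sample_variance([rng.gauss(0.0, 1.0) for _ in range(n1)])
             / sample_variance([rng.gauss(0.0, 1.0) for _ in range(n2)]))
        upper += f >= observed
        lower += f <= observed
    # Reject H0 when F is much greater or much less than 1: double the smaller tail.
    return observed, min(1.0, 2 * min(upper, lower) / reps)


F, p = f_test_monte_carlo(inside, outside)
print(f"F = {F:.3f}, simulated two-sided P = {p:.2f}")
```

For these data F = 6.5/3.7 ≈ 1.76 with (4, 4) df, and the simulated two-sided P should be close to 0.6, so there is no evidence that the variances differ.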

[Figure: counts of limpet numbers per quadrat in four panels: 1. ideal, 2. skewed, 3. outliers, 4. unequal variances]

Use of transformations to control departures from the normality and homogeneity of variance assumptions
[Figure and table: POP_1990 and log-transformed LPOP1990 by GROUP (Europe, Islamic, NewWorld), with the variance in each group and the greatest variance ratio; data: Ourworld]

Nonparametric tests
Usually based on ranks of the data.
H0: the samples come from populations with identical distributions (equal means or medians).
They don't assume a particular underlying distribution of the data: normal distributions are not necessary. Equal variances and independence are still required.
Typically less powerful than parametric tests when the parametric assumptions are met.
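A sketch of a rank-based test: the two-sample rank-sum (Mann-Whitney-Wilcoxon) statistic with an exact permutation P value, applied to the Inside/Outside counts from the resampling example. Tied values receive average ranks, and the null distribution of the rank sum is built by enumerating every way to assign the ranks to the first sample, which is practical only for small samples. Function names are hypothetical.

```python
from itertools import combinations


def average_ranks(values: list[float]) -> list[float]:
    """Ranks 1..N, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, as 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(y1: list[float], y2: list[float]) -> tuple[float, float]:
    """Rank sum of sample 1 and its exact two-sided permutation P value."""
    ranks = average_ranks(list(y1) + list(y2))
    n1, n = len(y1), len(y1) + len(y2)
    observed = sum(ranks[:n1])
    expected = n1 * (n + 1) / 2  # mean rank sum when H0 is true
    extreme = abs(observed - expected)
    splits = list(combinations(range(n), n1))  # every possible sample 1
    count = sum(
        abs(sum(ranks[i] for i in idx) - expected) >= extreme - 1e-9
        for idx in splits
    )
    return observed, count / len(splits)


inside = [3, 5, 2, 8, 7]
outside = [10, 7, 9, 12, 8]
r_sum, p = rank_sum_test(inside, outside)
print(r_sum, round(p, 4))  # 17.0 0.0397
```

The Inside rank sum is 17, against 27.5 expected under H0; 10 of the 252 possible splits are at least that extreme, giving P ≈ 0.040.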

Mann-Whitney-Wilcoxon test
Calculates the sum of ranks in each of the two samples; these should be similar if H0 is true.
Compares the rank sum to the sampling distribution of rank sums (the distribution of rank sums when H0 is true).
Approximately equivalent to a t test on data transformed to ranks.

Additional slides

A brief digression to resampling theory
[Figure: mean number inside vs number outside]
Traditional evaluation would probably involve a t test; another approach is resampling.

Resampling
Treatment  Number
Inside     3, 5, 2, 8, 7
Outside    10, 7, 9, 12, 8
1) Assume both treatments come from the same distribution.
2) Resample groups of 5 observations, with replacement, but irrespective of treatment.

3) Calculate the mean for each group (e.g. 7.6).

4) Repeat many times.
5) Calculate the differences between pairs of means (remember that the null hypothesis is that there is no effect of treatment). This generates a distribution of differences.
[Table and figure: Mean 1, Mean 2 and Difference for each resample; histogram of the distribution of differences in means (proportion per bar)]
OK, now what?

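Compare the distribution of differences to the real difference
[Figure: mean number inside vs number outside]
Real difference = 4.2.
Estimate the likelihood that the real difference comes from two similar distributions: the proportion of resampled differences at least as extreme as the real difference, counted through all 1000 differences.
[Table: Mean 1, Mean 2 and Difference for each resample, continuing through 1000 differences]
This proportion estimates how likely a difference this large is if the distributions are the same.
What are the constraints of this sort of approach?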
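A sketch of the resampling steps in Python (standard library only). It assumes a two-sided comparison, counting resampled differences at least as large in absolute value as the real difference of 4.2, and uses 10,000 resamples with a fixed seed; these settings and the function name are illustrative, not the slides' exact procedure.

```python
import random

inside = [3, 5, 2, 8, 7]
outside = [10, 7, 9, 12, 8]


def mean(values) -> float:
    return sum(values) / len(values)


def resample_p_value(y1, y2, reps: int = 10000, seed: int = 1):
    """Hypothetical helper: proportion of resampled mean differences >= the real one."""
    rng = random.Random(seed)
    pooled = list(y1) + list(y2)     # 1) assume one common distribution
    real = abs(mean(y1) - mean(y2))  # real difference (4.2 here)
    hits = 0
    for _ in range(reps):            # 4) repeat many times
        # 2) resample groups of the original sizes, with replacement,
        #    irrespective of treatment; 3) calculate each group mean
        m1 = mean(rng.choices(pooled, k=len(y1)))
        m2 = mean(rng.choices(pooled, k=len(y2)))
        # 5) compare the difference between the pair of means to the real one
        if abs(m1 - m2) >= real - 1e-9:
            hits += 1
    return real, hits / reps


real, p = resample_p_value(inside, outside)
print(f"real difference = {real:.1f}, resampling P = {p:.3f}")
```

Changing the seed or the number of resamples will move the estimate slightly; that sampling noise is one of the constraints of this approach.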

T test vs resampling
[Table: P-value from the resampling test and from the t test]
Why the difference?

Additional examples


Worked example
Fecundity of predatory gastropods: samples of 37 and 42 egg capsules of Lepsiella from the littorinid zone and the mussel zone, respectively. The number of eggs per capsule was counted.
Null hypothesis: no difference between zones in the mean number of eggs per capsule.
Ward & Quinn (1988); Quinn & Keough (2002), Box 3.1

Specify H0 and choose a test statistic: $H_0: \mu_M = \mu_L$, i.e. the population mean numbers of eggs per capsule from both zones are equal. The t statistic is the appropriate test statistic for comparing two population means.

Specify an a priori significance (probability) level (α): by convention, use α = 0.05 (5%).
Collect data, check assumptions, and calculate the test statistic from the sample data:
Zone        Mean  SD  n
Littorinid  …     …   37
Mussel      …     …   42
t = -5.39, df = 77

Compare the value of the t statistic to its sampling distribution, i.e. the probability distribution of the statistic (for a specific df) when H0 is true. What is the probability of obtaining a t value of magnitude 5.39 or greater from a t distribution with 77 df? Equivalently, what is the probability of taking samples with the observed or a greater mean difference from two populations with the same means?
Probability (from JMP): P = …
Looked up in a t table: P < 0.05
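With 77 df the t distribution is close to the standard normal, so a quick check of the P value can use Python's `statistics.NormalDist`. This is a normal approximation, an assumption of the sketch rather than the exact t probability: it understates P because the t distribution has heavier tails, but both values are far below 0.001.

```python
from statistics import NormalDist

n_littorinid, n_mussel = 37, 42
df = n_littorinid + n_mussel - 2  # 77
t_obs = -5.39

# Normal approximation to the two-tailed P value (assumes df is large).
p_approx = 2 * NormalDist().cdf(-abs(t_obs))
print(f"df = {df}, approximate two-tailed P = {p_approx:.1e}")
print("reject H0" if p_approx < 0.05 else "do not reject H0")
```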

If the probability of obtaining this value or larger is less than α, conclude that H0 is unlikely to be true and reject it: a statistically significant result. Our probability (< 0.001) is less than 0.05, so we reject H0: a statistically significant result.
If the probability of obtaining this value or larger is greater than α, conclude that the data are consistent with H0 and do not reject it: a statistically non-significant result.

Presenting results of a t test
Methods: An independent t test was used to compare the mean number of eggs per capsule from the two zones. Assumptions were checked with …
Results: The mean number of eggs per capsule from the mussel zone was significantly greater than that from the littorinid zone (t = 5.39, df = 77, P < 0.001; see Fig. 2).

Analysis of variance (ANOVA)
Comparing the means of more than two groups

Example: cost of mating in male fruit flies (Drosophila)
Treatments: males placed with and without unmated (virgin) females; five treatments