We need to define some concepts that are used in experiments.

Kerry Nash
5 years ago
Chapter 0 Analysis of Variance (a.k.a. Designing and Analysing Experiments)

Section 0. Introduction

In an earlier chapter we mentioned some different ways in which we could get data: Surveys, Observational Studies, Experiments and Published Data. Up to now we have analysed datasets in many ways, but we haven't studied how to collect the data. In this chapter we will look at the design of experiments and then we will see some new techniques for examining data collected in experiments.

In an experiment the experimenter is usually interested in measuring the response of an individual or object to something applied to that individual or object. For example, a pharmaceutical company may want to see the effect a drug has on patients with a particular disease. We need to define some concepts that are used in experiments.

Section 0. Definitions

The Response Variable is the variable being measured in the experiment.

Factors are variables applied to the object or individual in the experiment. We are interested in measuring the effect of the Factors on the Response Variable.

Quantitative Factors are measured on a numerical scale, whereas Qualitative Factors are not measured on a numerical scale.

Factor-Levels are the values of the factor used in the experiment.

Treatments are the Factor-Level combinations used.

An Experimental Unit is the object on which the response and factors are measured.

A Designed Experiment is one for which the researcher controls the specifications of the treatments and the method of assigning the experimental units to each treatment.

An Observational Experiment (Study) is one in which the researcher simply observes the treatments and the response on a sample of experimental units.

Section 0. Single Factor Anovas

0.. Introduction

Suppose we have designed and conducted our experiment and it involved applying p different treatments to p different samples chosen in some manner. We are now interested in determining whether the responses of the experimental units differ according to the treatment they received. As is usual in statistics, we are interested in making an inference about a population using sample data. Here we have p populations; each one represents the entire population of units that have received a given treatment in the past or will receive it in the future. We use the following notation:

p = Number of treatments being compared

Population or Treatment            1              2              3              ...   p
Population or Treatment Mean       $\mu_1$        $\mu_2$        $\mu_3$        ...   $\mu_p$
Population or Treatment Variance   $\sigma_1^2$   $\sigma_2^2$   $\sigma_3^2$   ...   $\sigma_p^2$
Sample Size                        $n_1$          $n_2$          $n_3$          ...   $n_p$
Sample Mean                        $\bar{x}_1$    $\bar{x}_2$    $\bar{x}_3$    ...   $\bar{x}_p$
Sample Variance                    $s_1^2$        $s_2^2$        $s_3^2$        ...   $s_p^2$

Also, $\bar{x}$ is the mean of all the measurements.

0.. So what are we at here?

What are we trying to test here anyway? Well, we're interested in seeing if there is any difference in the effect of each treatment on the experimental units. That means we are interested in testing if there is a difference between $\mu_1$, $\mu_2$, $\mu_3$ etc. So in the single factor ANOVA we test the following hypotheses:

$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_p$
$H_A$: At least two of the $\mu$'s are different.

Before we can do anything we must make an assumption to simplify life.

ASSUMPTION: We assume that $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots = \sigma_p^2$, and since they're all the same we just use the symbol $\sigma^2$. We also assume that we are dealing with a Completely Randomised Design.

Definition: The Completely Randomised Design is an experimental design in which independent random samples of experimental units are selected for each treatment.

0..5 Now how will this test work?

We will calculate sample means corresponding to each of the population means. We will check whether the difference between these sample means is small enough that it could be explained by natural variability in the data. If the difference is instead larger than could be explained by natural variability, then we will conclude that the population means actually are different. We could measure how far apart the individual sample means are by calculating SST, the Sum of Squares for Treatments.

Definition:

$$SST = n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \dots + n_p(\bar{x}_p - \bar{x})^2$$

A better measure, however, is MST, the Mean Square for Treatments.

Definition:

$$MST = \frac{SST}{p-1}$$

If all the individual $\bar{x}_i$'s were the same, and so also equal to the overall $\bar{x}$, then MST would be zero. The further the individual $\bar{x}_i$'s are from each other, the bigger MST will be. If it is really big we will reject the null hypothesis and so conclude that at least two of the population means are different.
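As a minimal sketch of the two definitions above, the following Python computes SST and MST. The three treatments (`T1`, `T2`, `T3`) and their observations are hypothetical values chosen for illustration; they are not data from the text.

```python
# Hypothetical data (an assumption, not from the text):
# three treatments with three observations each.
samples = {
    "T1": [1, 2, 3],
    "T2": [3, 4, 5],
    "T3": [2, 3, 4],
}

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

p = len(samples)  # number of treatments
all_values = [x for group in samples.values() for x in group]
grand_mean = mean(all_values)  # x-bar, the mean of all measurements

# SST = n_1 (xbar_1 - xbar)^2 + ... + n_p (xbar_p - xbar)^2
sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples.values())

# MST = SST / (p - 1)
mst = sst / (p - 1)

print(f"SST = {sst}, MST = {mst}")
```

If every list in `samples` had the same mean, each squared difference would be zero and so would `mst`, which matches the remark above.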

But we are left with the usual question: how big is big enough to reject? To begin to answer that question we must first notice that when $H_0$ is true the mean value of MST is $\sigma^2$. But when $H_0$ is false the mean of MST is not $\sigma^2$ but instead is larger than $\sigma^2$. There is another statistic which is an unbiased estimate of $\sigma^2$ whether or not $H_0$ is true; it is called the Mean Square for Error, MSE.

Definition:

$$MSE = \frac{SSE}{N-p}$$

where

$$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2 + \dots + (n_p - 1)s_p^2$$

and N is the total number of observations in the dataset.
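The definition of MSE can be sketched in the same way. The data below are again hypothetical, and `statistics.variance` is used because it returns the sample variance (dividing by n - 1), which is what $s_i^2$ denotes here.

```python
from statistics import variance  # sample variance, divides by n - 1

# Hypothetical data (an assumption, not from the text).
samples = {
    "T1": [1, 2, 3],
    "T2": [3, 4, 5],
    "T3": [2, 3, 4],
}

p = len(samples)                            # number of treatments
N = sum(len(g) for g in samples.values())   # total number of observations

# SSE = (n_1 - 1) s_1^2 + ... + (n_p - 1) s_p^2
sse = sum((len(g) - 1) * variance(g) for g in samples.values())

# MSE = SSE / (N - p)
mse = sse / (N - p)

print(f"SSE = {sse}, MSE = {mse}")
```

Each group contributes its own within-group spread, so MSE does not depend on how far apart the group means are.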

0..6 The Test

So MST should in general be bigger than MSE if $H_0$ is false. The technique we will use, therefore, is to compute the ratio MST/MSE, and if it is big enough we will reject $H_0$. This ratio will be our test statistic. So far, when we have encountered test statistics they have been Z's or T's. This ratio is neither; it comes from a new distribution, the F.

Definition: The test statistic for a single-factor ANOVA is

$$F = \frac{MST}{MSE}$$

with p-1 numerator degrees of freedom and N-p denominator degrees of freedom.

These tests are all upper tailed: we want to reject when the ratio is really big. So we look up F tables and reject $H_0$ if $F_{calc}$ is bigger than some critical value of F. All of what we have seen in this chapter so far can be summarised in the following table:
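The decision rule can be sketched as follows. The helper name `f_statistic` and the data are hypothetical. The critical value 5.14 is the upper 5% point of the F distribution with (2, 6) degrees of freedom as found in standard F tables; the 5% level is an assumption made for this illustration only.

```python
from statistics import mean, variance

def f_statistic(samples):
    """Return (F, numerator df, denominator df) for a single-factor ANOVA."""
    groups = list(samples.values())
    p = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    mst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (p - 1)
    mse = sum((len(g) - 1) * variance(g) for g in groups) / (N - p)
    return mst / mse, p - 1, N - p

# Hypothetical data (an assumption, not from the text).
samples = {
    "T1": [1, 2, 3],
    "T2": [3, 4, 5],
    "T3": [2, 3, 4],
}

f_calc, df_num, df_den = f_statistic(samples)

# Upper-tail critical value for (2, 6) df at the assumed 5% level.
F_CRITICAL = 5.14

# Upper-tailed test: reject H0 only when the ratio is large.
reject_h0 = f_calc > F_CRITICAL
print(f"F = {f_calc} on ({df_num}, {df_den}) df, reject H0: {reject_h0}")
```

With these made-up numbers the ratio falls below the critical value, so the null hypothesis would not be rejected.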

Single Factor Anova Table - Completely Randomised Design

Source       df     SS        MS                  F
Treatments   p-1    SST       MST = SST/(p-1)     MST/MSE
Error        N-p    SSE       MSE = SSE/(N-p)
Total        N-1    SSTotal

0..7 Anova Computations

We have seen formulae for SSE and SST; however, in practice these formulae are very cumbersome for performing calculations. The following formulae are easier to use:

$$SST = \left[ n_1\bar{x}_1^2 + n_2\bar{x}_2^2 + \dots + n_p\bar{x}_p^2 \right] - N\bar{x}^2$$

$$SSTotal = \sum x^2 - N\bar{x}^2$$

where the sum runs over all N observations, and

$$SSE = SSTotal - SST$$
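As a quick check that the computational formulae agree with the earlier definitions, here is a small sketch on hypothetical data (the values are assumptions for illustration, not data from the text).

```python
from statistics import variance

# Hypothetical data (an assumption, not from the text).
samples = {
    "T1": [1, 2, 3],
    "T2": [3, 4, 5],
    "T3": [2, 3, 4],
}

def mean(values):
    return sum(values) / len(values)

groups = list(samples.values())
all_values = [x for g in groups for x in g]
N = len(all_values)
grand_mean = mean(all_values)

# Computational formulae from this section.
sst_shortcut = sum(len(g) * mean(g) ** 2 for g in groups) - N * grand_mean ** 2
ss_total = sum(x ** 2 for x in all_values) - N * grand_mean ** 2
sse_shortcut = ss_total - sst_shortcut

# Definitional formulae from the earlier sections.
sst_definition = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
sse_definition = sum((len(g) - 1) * variance(g) for g in groups)

print(sst_shortcut, sst_definition)
print(sse_shortcut, sse_definition)
```

Both routes give the same sums of squares, so either can be used to fill in the table.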

0..8 Example

Detergent makers always claim that their product washes whiter. We are going to test whether there is any difference between three detergents: BOLD, DAZ and PERSIL. 9 students are asked to wear white shirts and to go out for a night's drinking and whatever. The next morning we randomly assign 3 of the 9 white shirts to be washed using BOLD, 3 using DAZ and 3 using PERSIL. After the wash is over we bring the 9 shirts over to a Biology lab and examine them under a microscope for stains. The surface area that remains dirty is determined and the following results are obtained:

BOLD:,, 3
DAZ: 3, 4, 4
PERSIL:, 3,

Test whether there is a difference between the population means use significance level

ANSWER: Analysis of Variance

Source DF SS MS F P
Treatment
Error
Total

Section 0.3 Randomised Block Design

Sometimes randomisation can be improved upon. In the other half of this course you have seen paired T-tests. This idea of pairing can be extended to ANOVAs. Suppose we are conducting an experiment to measure the performance of a certain drug. The effect of this drug may be different on people depending on their age, sex, blood pressure, weight etc. It is possible that the randomisation procedure does not spread these characteristics evenly among the different treatments. Sometimes it is better to force the people receiving each treatment to be similar. For example, in this drug trial we can pick groups of people with similar characteristics to receive the different treatments. That way any difference we observe will be because of the treatments, not because of different characteristics of the people.

Definition: A Randomised Block Design is an experimental procedure consisting of two steps:
1. The experimental units are divided into b blocks; the units chosen for a Block should be as similar as possible. There are p units in each Block, where p is the number of Treatments being compared.
2. One experimental unit from each Block is randomly assigned to each treatment, so in total there will be N = bp responses.

0.3. Assumptions

1. The observations corresponding to all block-treatment combinations are Normally distributed.
2. The variances of all the Normal distributions are the same, $\sigma^2$. But the means of the Normal distributions may depend on the treatment applied and also on the block.

Notation and Formulae

p = Number of Treatments
b = Number of Blocks
The average of all observations on the i th treatment is $\bar{x}_{T_i}$.
The average of all observations on the i th block is $\bar{x}_{B_i}$.

$$SST = b\sum_{i=1}^{p} (\bar{x}_{T_i} - \bar{x})^2 = b\sum_{i=1}^{p} \bar{x}_{T_i}^2 - N\bar{x}^2$$

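$$SSB = p\sum_{i=1}^{b} (\bar{x}_{B_i} - \bar{x})^2 = p\sum_{i=1}^{b} \bar{x}_{B_i}^2 - N\bar{x}^2$$

$$SSTotal = \sum (x - \bar{x})^2 = \sum x^2 - N\bar{x}^2$$

where the sums in SSTotal run over all N observations, and

$$SSE = SSTotal - SST - SSB$$

The Test

The purpose of the Randomised Block Design is the same as that of the Completely Randomised Design, i.e. to test:

$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_p$
$H_A$: At least two of the $\mu$'s are different.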

The procedure for the test is to complete an ANOVA table in the following format:

Source       df        SS        MS                      F
Treatments   p-1       SST       MST = SST/(p-1)         MST/MSE
Blocks       b-1       SSB       MSB = SSB/(b-1)         MSB/MSE
Error        N-p-b+1   SSE       MSE = SSE/(N-p-b+1)
Total        N-1       SSTotal

The last column in this table includes two F statistics. The first one, F = MST/MSE, is the statistic used to test $H_0$: the treatment means are all the same vs $H_A$: at least two of the treatment means differ. The second test statistic, F = MSB/MSE, tests whether there is a difference in the Block means. Rejecting the null hypothesis in this second test indicates that the Block means differ and that using a Randomised Block Design instead of a Completely Randomised Design was a good choice.
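The table can be filled in with a short sketch like the one below. The 3 by 3 layout (rows are blocks, columns are treatments) and its values are hypothetical, chosen only to illustrate the arithmetic. MSE is taken as SSE divided by the Error degrees of freedom, N-p-b+1.

```python
# Hypothetical data (an assumption, not from the text):
# each row is a block, each column is a treatment.
data = [
    [1, 3, 2],
    [2, 5, 3],
    [3, 4, 4],
]

b = len(data)      # number of blocks
p = len(data[0])   # number of treatments
N = b * p          # total number of responses

grand_mean = sum(sum(row) for row in data) / N
treatment_means = [sum(row[j] for row in data) / b for j in range(p)]
block_means = [sum(row) / p for row in data]

# Sums of squares for treatments, blocks, total and error.
sst = b * sum((m - grand_mean) ** 2 for m in treatment_means)
ssb = p * sum((m - grand_mean) ** 2 for m in block_means)
ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
sse = ss_total - sst - ssb

# Mean squares use the degrees of freedom from the table.
df_error = N - p - b + 1
mst = sst / (p - 1)
msb = ssb / (b - 1)
mse = sse / df_error

f_treatments = mst / mse   # tests for a difference in treatment means
f_blocks = msb / mse       # tests for a difference in block means

print(f"F (treatments) = {f_treatments:.2f}, F (blocks) = {f_blocks:.2f}")
```

Each F value would then be compared with a critical value from F tables, using (p-1, N-p-b+1) degrees of freedom for treatments and (b-1, N-p-b+1) for blocks.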

0.3.5 Example

A single factor ANOVA was conducted using a Randomised Block Design. It yielded the following results:

Block Treatments

Do the data provide sufficient evidence to suggest that the treatment means differ? Do the data provide sufficient evidence to indicate that blocking was effective in reducing the experimental error?

Source df SS MS F
Treatments
Blocks
Error
Total

Section 0.4 Multiple Comparisons of Means

0.4. Introduction

In the previous sections we have seen how to determine if a set of p means are equal or if there is a difference between at least two of the means. In rejecting the null hypothesis of the F test in the single factor Anova, all that we determined was that there was a difference between at least two of the means. We did not identify which of the means were different and which were the same. In an experiment with 3 treatments there are three pairs of means which may be different: $(\mu_1 - \mu_2)$, $(\mu_1 - \mu_3)$ and $(\mu_2 - \mu_3)$. Having established that at least one of these pairs differs, we now must test each pair separately. In an experiment with p treatments there will be $\binom{p}{2} = p(p-1)/2$, i.e. "p choose 2", possible pairs of means to be tested.

Each pair of means will be tested essentially using a T-test. However, there are two minor differences in the way we will perform these tests. Firstly, we will actually compute Confidence Intervals instead of performing Hypothesis Tests. Secondly, we must use a different set of T-tables, called Bonferroni T-tables, because we are carrying out multiple tests.
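To make the count of pairs concrete, the following sketch enumerates them for a hypothetical experiment with p = 4 treatments (the value of p is an assumption for illustration) and checks the result against "p choose 2".

```python
from itertools import combinations
from math import comb

p = 4  # hypothetical number of treatments

# Every unordered pair (i, j) with i < j is one comparison of means.
pairs = list(combinations(range(1, p + 1), 2))

print(pairs)
print(len(pairs), comb(p, 2), p * (p - 1) // 2)
```

The three counts on the last line agree, and the list of pairs is exactly the set of differences that must be examined.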

0.4. Why Confidence Intervals are just like Hypothesis Tests

Suppose we compute a 90% Confidence Interval. The probability that the interval contains the actual value of the Population Characteristic is 0.9. That means the probability that the interval does not contain the actual value of the Population Characteristic is 0.1. The region outside the 90% Confidence Interval is therefore just like the Rejection Region of a Two Tailed Hypothesis Test with $\alpha = 0.1$. This is evident from the picture. So if we find that a 90% Confidence Interval for $\mu_1 - \mu_2$ does not contain the value 0, that is equivalent to rejecting the Null Hypothesis of the following Two Tailed Test with significance level $\alpha = 0.1$:

$H_0: \mu_1 - \mu_2 = 0$ vs $H_A: \mu_1 - \mu_2 \neq 0$

0.4.3 Multiple Tests and Type I errors

When a Hypothesis test is performed at significance level $\alpha$, there is a probability $\alpha$ of rejecting the Null Hypothesis when in fact it is true. When we perform two tests, each with significance level $\alpha$, the probability of making a Type I error on the first test or on the second test is now larger than $\alpha$. Remember P(A or B) = P(A) + P(B) - P(A and B). In an experiment with p treatments we will be performing p(p-1)/2 tests, and the combined probability of making a Type I error will be large. For this reason new versions of the T-Tables, called Bonferroni T-Tables, were invented. These tables take into account that you may want to perform several linked tests, and they give critical values to be used in each test so that the OVERALL probability of a Type I error for the Experiment as a whole is below a certain threshold.
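The growth of the combined Type I error probability can be illustrated with a small sketch. Two assumptions are made that are not in the text: the tests are treated as independent (pairwise comparisons share data, so this is only an approximation), and the per-test level of 0.1 and p = 3 treatments are illustrative values. The Bonferroni bound itself, that the combined probability is at most the number of tests times the per-test level, holds without independence.

```python
def familywise_error_independent(alpha, m):
    """P(at least one Type I error) in m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_per_test_alpha(alpha_overall, m):
    """Per-test level that keeps the overall Type I error at or below alpha_overall."""
    return alpha_overall / m

p = 3                    # hypothetical number of treatments
m = p * (p - 1) // 2     # number of pairwise tests
alpha = 0.1              # illustrative per-test significance level

combined = familywise_error_independent(alpha, m)
per_test = bonferroni_per_test_alpha(alpha, m)

print(f"{m} tests at level {alpha}: combined error about {combined:.3f}")
print(f"per-test level for overall {alpha}: {per_test:.4f}")
```

The combined probability is well above the per-test level, which is why each individual comparison has to be run at a stricter level.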

0.4.4 The Bonferroni Multiple Comparisons Procedure

When comparing p treatments or populations, we decide whether two population means differ by:
1. Computing the p(p-1)/2 confidence intervals below.
2. Checking if any intervals do not contain 0.

Intervals which do not contain 0 indicate a significant difference in the population means ($\mu$'s).

$$\mu_1 - \mu_2: \quad (\bar{x}_1 - \bar{x}_2) \pm (\text{Bonferroni } t \text{ critical}) \sqrt{\frac{MSE}{n_1} + \frac{MSE}{n_2}}$$

$$\vdots$$

$$\mu_{p-1} - \mu_p: \quad (\bar{x}_{p-1} - \bar{x}_p) \pm (\text{Bonferroni } t \text{ critical}) \sqrt{\frac{MSE}{n_{p-1}} + \frac{MSE}{n_p}}$$

The Bonferroni T critical values are computed using the Error degrees of freedom.
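The two steps above can be sketched as follows. The function name `bonferroni_intervals`, the sample means, the sample sizes and the MSE are all hypothetical. The critical value 3.29 is only an illustrative placeholder; in practice it must be read from Bonferroni T-tables for the chosen overall level, the number of comparisons and the Error degrees of freedom.

```python
from itertools import combinations
from math import sqrt

def bonferroni_intervals(means, sizes, mse, t_critical):
    """Return {(a, b): (lower, upper)} for every pair of treatments."""
    intervals = {}
    for a, b in combinations(means, 2):
        diff = means[a] - means[b]
        half_width = t_critical * sqrt(mse / sizes[a] + mse / sizes[b])
        intervals[(a, b)] = (diff - half_width, diff + half_width)
    return intervals

# Hypothetical summary statistics (assumptions, not from the text).
means = {"T1": 2.0, "T2": 4.0, "T3": 3.0}
sizes = {"T1": 3, "T2": 3, "T3": 3}

# t_critical is a placeholder value, to be replaced by a table lookup.
intervals = bonferroni_intervals(means, sizes, mse=1.0, t_critical=3.29)

# Step 2: an interval that excludes 0 signals a significant difference.
significant = [pair for pair, (lo, hi) in intervals.items() if not (lo <= 0 <= hi)]

for pair, (lo, hi) in intervals.items():
    print(pair, round(lo, 3), round(hi, 3))
print("significant pairs:", significant)
```

With these made-up inputs every interval contains 0, so no pair of means would be declared different.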
