Analysis Of Variance Compiled by T.O. Antwi-Asare, U.G


ANOVA Analysis of variance compares two or more population means of interval data. Specifically, we are interested in determining whether differences exist between the population means. The procedure works by analyzing the sample variances.

The assumptions underlying the analysis of variance technique are the same as those used in the t test when comparing two different means. We assume that the samples are randomly and independently drawn from Normally distributed populations which have equal variances. The variable of interest is measured on an interval or ratio scale.

To formalise this we break down the total variance of all the observations into 1. the variance due to differences between treatments or factors, and 2. the variance due to differences within treatments (also known as the error variance).

We have to work with three sums of squares: The total sum of squares measures (squared) deviations from the overall or grand average using all the observations. It ignores the existence of the different factors. The between sum of squares is based upon the averages for each factor and measures how they deviate from the grand average. The within sum of squares is based on squared deviations of observations from their own factor mean.

Total Sum of Squares = Between Sum of Squares + Within Sum of Squares. The larger the between sum of squares relative to the within sum of squares, the more likely it is that the null hypothesis is false.
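This decomposition can be checked numerically. A minimal sketch in Python, on a small invented data set (the numbers are illustrative, not from the example that follows):

```python
# Verify: Total SS = Between SS + Within SS, on a small invented data set.
groups = [
    [23, 25, 28, 30],   # treatment 1
    [31, 33, 36, 38],   # treatment 2
    [19, 20, 22, 27],   # treatment 3
]

all_obs = [x for g in groups for x in g]
grand_mean = sum(all_obs) / len(all_obs)

# Total sum of squares: squared deviations from the grand mean
tss = sum((x - grand_mean) ** 2 for x in all_obs)

# Between sum of squares: group means vs the grand mean, weighted by group size
bss = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Within sum of squares: observations vs their own group mean
wss = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

print(tss, bss + wss)  # the two numbers agree (up to floating-point noise)
```

The identity holds for any data set; only the split between the two components changes.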

One Way Analysis of Variance Example An apple juice manufacturer is planning to develop a new product: a liquid concentrate. The marketing manager has to decide how to market the new product. Three strategies are considered: emphasize the convenience of using the product; emphasize the quality of the product; emphasize the product's low price.

One Way Analysis of Variance Example: An experiment was conducted as follows: In three cities an advertisement campaign was launched. In each city only one of the three characteristics (convenience, quality, and price) was emphasized. The weekly sales were recorded for twenty weeks following the beginning of the campaigns.

Problem assumptions The data are interval The problem objective is to compare sales in the three cities. We hypothesize that the three population means are equal

One Way Analysis of Variance Weekly sales (20 weeks in each city):

Week   Convenience   Quality   Price
  1        529          804      672
  2        658          630      531
  3        793          774      443
  4        514          717      596
  5        663          679      602
  6        719          604      502
  7        711          620      659
  8        606          697      689
  9        461          706      675
 10        529          615      512
 11        498          492      691
 12        663          719      733
 13        604          787      698
 14        495          699      776
 15        485          572      561
 16        557          523      572
 17        353          584      469
 18        557          634      581
 19        542          580      679
 20        614          624      532

Defining the Hypotheses Solution H0: μ1 = μ2 = μ3; H1: At least two means differ. To build the statistic needed to test the hypotheses we use the following notation:

Notation Independent samples are drawn from k populations (treatments):

Population (treatment):  1          2          ...   k
Observations:            X11        X12        ...   X1k
                         X21        X22        ...   X2k
                         ...        ...        ...   ...
                         Xn1,1      Xn2,2      ...   Xnk,k
Sample size:             n1         n2         ...   nk
Sample mean:             x̄1         x̄2         ...   x̄k

X is the response variable. The variable's values are called responses.

Terminology In the context of this problem: Response variable: weekly sales. Responses: actual sale values. Experimental unit: a week in one of the three cities in which we record sales figures. Factor: the criterion by which we classify the populations (the treatments); in this problem the factor is the marketing strategy. Factor levels: the population (treatment) names; in this problem the factor levels are the marketing strategies.

The rationale of the test statistic Two types of variability are employed when testing for the equality of the population means

[Two dot plots of Treatment 1, Treatment 2 and Treatment 3.] A small variability within the samples makes it easier to draw a conclusion about the population means. In the second plot the sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.

The rationale behind the test statistic Part I If the null hypothesis is true, we would expect all the sample means to be close to one another (and as a result, close to the grand mean). If the alternative hypothesis is true, at least some of the sample means would differ. Thus, we measure variability between sample means.

Variability between sample means The variability between the sample means is measured as the sum of squared distances between each treatment mean and the grand mean. This sum is called the Sum of Squares for Treatments (SST) or the Between Sum of Squares (BSS). In our example treatments are represented by the different advertising strategies.

NOTE: Here SST is not the Total Sum of Squares (TSS); SST = BSS, the Between Sum of Squares.

Sum of squares for treatments (SST) or Between Sum of Squares (BSS):

BSS or SST = Σ(j=1 to k) nj (x̄j − x̄)²

where k is the number of treatments, nj is the size of sample j, x̄j is the mean of sample j (factor j or treatment j), and x̄ is the grand mean. Note: when the sample means are close to one another, their distance from the grand mean is small, leading to a small SST. Thus, a large SST indicates large variation between sample means, which supports H1.

Sum of squares for treatments (SST) or BSS Solution continued: Calculate SST or BSS.

x̄1 = 577.55   x̄2 = 653.00   x̄3 = 608.65

The grand mean is calculated by x̄ = (n1x̄1 + n2x̄2 + ... + nkx̄k)/(n1 + n2 + ... + nk) = 613.07

SST = Σ(j=1 to k) nj (x̄j − x̄)² = 20(577.55 − 613.07)² + 20(653.00 − 613.07)² + 20(608.65 − 613.07)² = 57,512.23
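The arithmetic above can be reproduced directly; a minimal sketch using only the sample sizes and sample means of the example:

```python
# SST (between sum of squares) for the apple juice example:
# three samples of 20 weeks each, with the sample means given above.
n = [20, 20, 20]
means = [577.55, 653.00, 608.65]

# Grand mean as a weighted average of the sample means (about 613.07)
grand_mean = sum(nj * xbar for nj, xbar in zip(n, means)) / sum(n)

sst = sum(nj * (xbar - grand_mean) ** 2 for nj, xbar in zip(n, means))
print(round(sst, 2))  # 57512.23
```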

The rationale behind the test statistic Part II Large variability within the samples weakens the ability of the sample means to represent their corresponding population means. Therefore, even though the sample means may markedly differ from one another, SST must be judged relative to the within-samples variability.

Within samples variability SSE or WSS (Within Sum of Squares) or ESS The variability within samples is measured by adding all the squared distances between observations and their sample means. This sum is called the Sum of Squares for Error SSE or WSS In our example this is the sum of all squared differences between sales in city j and the sample mean of city j (over all the three cities).

For example: SSE or WSS = (n1 − 1)s1² + (n2 − 1)s2² + (n3 − 1)s3² + ... + (nk − 1)sk² = Σ(j=1 to k) (nj − 1)sj², where k = no. of treatments. Equivalently,

SSE = Σ(j=1 to k) Σ(i=1 to nj) (xij − x̄j)²

Σ(j=1 to k) (nj − 1)sj² = SSE or WSS, where x̄j is the mean of column j:

SSE = Σ(j=1 to k) Σ(i=1 to nj) (xij − x̄j)²

Sum of squares for errors (SSE) Solution Continued: Calculate SSE.

s1² = 10,775.00   s2² = 7,238.11   s3² = 8,670.24

SSE = Σ(j=1 to k) Σ(i=1 to nj) (xij − x̄j)². Or, SSE = (n1 − 1)s1² + (n2 − 1)s2² + (n3 − 1)s3² = (20 − 1)10,775.00 + (20 − 1)7,238.11 + (20 − 1)8,670.24 = 506,983.50
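The pooled-variance form of SSE is a one-liner. A sketch using the variances as rounded in the Excel summary below; because those variances are rounded to two decimals, the result comes out at 506,983.65 rather than the 506,983.50 obtained from unrounded values:

```python
# SSE (within sum of squares) from the three sample variances.
# Variances as rounded in the Excel summary; rounding shifts the result
# about 0.15 away from the 506,983.50 computed from unrounded variances.
n = [20, 20, 20]
variances = [10775.00, 7238.11, 8670.24]

sse = sum((nj - 1) * s2 for nj, s2 in zip(n, variances))
print(round(sse, 2))  # 506983.65
```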

The mean sum of squares To perform the test we need to calculate the mean squares as follows:

Calculation of MST (Mean Square for Treatments): MST = SST/(k − 1) = 57,512.23/(3 − 1) = 28,756.12

Calculation of MSE (Mean Square for Error): MSE = SSE/(n − k) = 506,983.50/(60 − 3) = 8,894.45

Calculation of the test statistic

F = MST/MSE = 28,756.12/8,894.45 = 3.23

with the following degrees of freedom: v1 = k − 1 and v2 = n − k.

Required Conditions: 1. The populations tested are normally distributed. 2. The variances of all the populations tested are equal.

The F test rejection region And finally the Decision Rule: H0: μ1 = μ2 = ... = μk; H1: At least two means differ. Test statistic: F = MST/MSE. Reject H0 if F > Fα,k−1,n−k.

The F test H0: μ1 = μ2 = μ3; H1: At least two means differ. Test statistic: F = MST/MSE = 28,756.12/8,894.45 = 3.23. Rejection region: F > Fα,k−1,n−k = F0.05,3−1,60−3 ≈ 3.15. Since 3.23 > 3.15, there is sufficient evidence to reject H0 in favour of H1, and argue that at least one of the mean sales is different from the others.

ANOVA Anova: Single Factor

SUMMARY
Groups        Count   Sum     Average   Variance
Convenience   20      11551   577.55    10775.00
Quality       20      13060   653.00    7238.11
Price         20      12173   608.65    8670.24

ANOVA
Source of Variation   SS       df   MS      F      P-value   F crit
Between Groups        57512    2    28756   3.23   0.0468    3.16
Within Groups         506984   57   8894
Total (TSS)           564496   59
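The Excel figures can be reproduced from the raw weekly sales with nothing but the standard library; a minimal sketch (variable names are mine, not from the slides):

```python
# One-way ANOVA for the apple juice example, from the raw weekly sales.
convenience = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
               498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
quality = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
           492, 719, 787, 699, 572, 523, 584, 634, 580, 624]
price = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
         691, 733, 698, 776, 561, 572, 469, 581, 679, 532]

groups = [convenience, quality, price]
k = len(groups)
n = sum(len(g) for g in groups)                      # 60 observations
grand_mean = sum(sum(g) for g in groups) / n

# Between (SST here, per the slides' notation) and within (SSE) sums of squares
sst = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
sse = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

mst = sst / (k - 1)   # mean square for treatments, about 28,756.12
mse = sse / (n - k)   # mean square for error, about 8,894.45
f = mst / mse         # test statistic, about 3.23

print(round(sst, 2), round(sse, 2), round(f, 2))
```

The F statistic of 3.23 exceeds the critical value of 3.16 reported by Excel, matching the conclusion above.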

SSE = Σ(j=1 to k) (nj − 1)sj²        BSS or SST = Σ(j=1 to k) nj (x̄j − x̄)²

Question The reaction times of three groups of sportsmen were measured on a particular task, with the following results (time in milliseconds): Racing drivers: 31, 28, 39, 42, 36, 30. Tennis players: 41, 35, 41, 48, 44, 39, 38. Boxers: 44, 47, 35, 38, 51. Test whether there is a difference in reaction times between the three groups.
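One way to check your hand calculation for this question is to run the same sums in code; a sketch (the critical value F0.05,2,15 ≈ 3.68 is taken from standard F tables):

```python
# One-way ANOVA for the reaction-time question (unequal sample sizes).
drivers = [31, 28, 39, 42, 36, 30]
tennis = [41, 35, 41, 48, 44, 39, 38]
boxers = [44, 47, 35, 38, 51]

groups = [drivers, tennis, boxers]
k = len(groups)
n = sum(len(g) for g in groups)          # 18 observations in all
grand_mean = sum(sum(g) for g in groups) / n

# Between and within sums of squares, with nj differing across groups
bss = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
wss = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

f = (bss / (k - 1)) / (wss / (n - k))
print(round(f, 2))  # 4.07
```

Since 4.07 exceeds F0.05,2,15 ≈ 3.68, the data suggest a difference in mean reaction times between the three groups.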

Introduction ANOVA is the technique where the total variance present in the data set is split up into non-negative components, where each component is due to one factor or cause of variation. Factors of variation: Assignable (can be many); Non-assignable (error or random variation).

Utility ANOVA is used to test hypotheses about differences between two or more means. The t test can only be used to test differences between two means. When there are more than two means, it is possible to compare each mean with each other mean using t tests. However, conducting multiple t tests can lead to severe inflation of the Type I error rate. ANOVA is used to test differences among several means for significance without increasing the Type I error rate, using an F test.
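The inflation can be quantified. If each of m comparisons is run at α = 0.05, the chance of at least one false rejection is 1 − 0.95^m; this sketch treats the tests as independent, which pairwise t tests on shared samples are not, so it is only an approximation:

```python
# Approximate familywise Type I error rate for m independent tests at alpha.
def familywise_error(m, alpha=0.05):
    """P(at least one false rejection) across m independent tests."""
    return 1 - (1 - alpha) ** m

# Comparing k means pairwise requires m = k(k-1)/2 t tests.
for k in (3, 5, 10):
    m = k * (k - 1) // 2
    print(k, m, round(familywise_error(m), 3))
```

Even for k = 3 means (three pairwise tests), the familywise rate is already about 0.14 rather than 0.05, which is why a single F test is preferred.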

The ANOVA Procedure: This is the ten-step procedure for analysis of variance: 1. Description of data 2. Assumptions: along with the assumptions, we present the model for each design we discuss. 3. Hypotheses 4. Test statistic 5. Distribution of test statistic 6. Decision rule

7. Calculation of test statistic: The results of the arithmetic calculations will be summarized in a table called the analysis of variance (ANOVA) table. The entries in the table make it easy to evaluate the results of the analysis. 8. Statistical decision 9. Conclusion 10. Determination of p value

ONE-WAY ANOVA: Completely Randomized Design (CRD) One-way ANOVA is the simplest type of ANOVA, in which only one source of variation, or factor, is investigated. It is an extension to three or more samples of the t test procedure for use with two independent samples. Put another way, the t test for use with two independent samples is a special case of one-way analysis of variance.

The experimental design used for one-way ANOVA is called the completely randomised design. It tests the equality of several treatments of one assignable cause of variation, and is based on two principles: replication and randomization. Advantages: Very simple. Reduces the experimental error to a great extent. We can reduce or increase the number of treatments. Suitable for laboratory experiments. Disadvantages: The design is not suitable if the experimental units are not homogeneous. The design is not as efficient or sensitive as other designs. Local control is completely neglected.

Hypothesis Testing Steps: 1. Description of data: The measurements (or observations) resulting from a completely randomized experimental design, along with the means and totals.

Available subjects:  01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16
Random numbers:      16 09 06 15 14 11 02 04 10 07 05 13 03 12 01 08
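The randomization step, numbering the available subjects and allocating them to treatments in random order, can be sketched as follows (the seed and the choice of 4 treatments of 4 subjects each are illustrative):

```python
import random

# Randomly assign 16 available subjects to 4 treatments of 4 subjects each,
# as a completely randomized design requires.
random.seed(1)  # fixed seed so the illustration is reproducible

subjects = list(range(1, 17))   # subjects numbered 01..16
random.shuffle(subjects)        # random order stands in for the random-number table

k = 4
assignment = {t + 1: sorted(subjects[t::k]) for t in range(k)}
for treatment, members in assignment.items():
    print(treatment, members)
```

Every subject appears in exactly one treatment, and treatment membership is determined entirely by chance.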

Table of Sample Values for the CRD

Treatment   1        2        3        ...   k
            x11      x12      x13      ...   x1k
            x21      x22      x23      ...   x2k
            ...      ...      ...      ...   ...
            xn1,1    xn2,2    xn3,3    ...   xnk,k
Total       T.1      T.2      T.3      ...   T.k    T..
Mean        x̄.1      x̄.2      x̄.3      ...   x̄.k    x̄..

Table of Sample Values for the Randomized Complete Block Design

Blocks   Treatments: 1    2     3     ...   k      Total   Mean
1                    x11  x12   x13   ...   x1k    T1.     x̄1.
2                    x21  x22   x23   ...   x2k    T2.     x̄2.
...                  ...  ...   ...   ...   ...    ...     ...
n                    xn1  xn2   xn3   ...   xnk    Tn.     x̄n.
Total                T.1  T.2   T.3   ...   T.k    T..
Mean                 x̄.1  x̄.2   x̄.3   ...   x̄.k            x̄..

xij = the ith observation resulting from the jth treatment (there are a total of k treatments)
T.j = Σi xij = total of the jth treatment
x̄.j = T.j/nj = mean of the jth treatment
T.. = Σj T.j = Σj Σi xij = total of all observations
x̄.. = T../N, where N = Σj nj

2. Assumption: The Model The one-way analysis of variance model may be written as follows:

xij = μ + τj + eij;  i = 1, ..., nj;  j = 1, ..., k

The terms in this model are defined as follows: 1. μ represents the mean of all the k population means and is called the grand mean. 2. τj represents the difference between the mean of the jth population and the grand mean and is called the treatment effect. 3. eij represents the amount by which an individual measurement differs from the mean of the population to which it belongs and is called the error term.

Assumptions of the Model The k sets of observed data constitute k independent random samples from the respective populations. Each of the populations from which the samples come is normally distributed with mean μj and variance σj². Each of the populations has the same variance; that is, σ1² = σ2² = ... = σk² = σ², the common variance. The τj are unknown constants and Σ τj = 0, since the sum of all deviations of the μj from their mean, μ, is zero. The errors eij have a mean of 0, since the mean of xij is μj. The eij have a variance equal to the variance of the xij, since the eij and xij differ only by a constant. The eij are normally (and independently) distributed.

3. Hypothesis: We test the null hypothesis that all population or treatment means are equal against the alternative that the members of at least one pair are not equal. We may state the hypotheses as follows: H0: μ1 = μ2 = ... = μk; HA: not all μj are equal. If the population means are equal, each treatment effect is equal to zero, so that, alternatively, the hypotheses may be stated as H0: τj = 0, j = 1, 2, ..., k; HA: not all τj = 0.

4. Test statistic: Table: Analysis of Variance Table for the Completely Randomized Design

Source of variation   Sum of squares                                  d.f.   Mean square                              Variance ratio
Among samples         SSA = Σ(j=1 to k) nj (x̄.j − x̄..)²               k−1    MSA = SSA/(k−1) (MS due to treatment)    V.R. = MSA/MSW = F
Within samples        SSW = Σ(j=1 to k) Σ(i=1 to nj) (xij − x̄.j)²     N−k    MSW = SSW/(N−k) (MS due to error)
Total                 SST = Σ(j=1 to k) Σ(i=1 to nj) (xij − x̄..)²     N−1

The Total Sum of Squares (TSS): It is the sum of the squares of the deviations of the individual observations taken together.

The Within Groups Sum of Squares: The first step in the computation calls for performing some calculations within each group. These calculations involve computing, within each group, the sum of squared deviations of the individual observations from their mean. When these calculations have been performed within each group, we obtain the sum of the individual group results. The Among Groups Sum of Squares: To obtain the second component of the total sum of squares, we compute for each group the squared deviation of the group mean from the grand mean and multiply the result by the size of the group. Finally we add these results over all groups. The total sum of squares is equal to the sum of the among and the within sums of squares: TSS = SSA + SSW.

The First Estimate of σ²: Within any sample,

Σ(i=1 to nj) (xij − x̄.j)² / (nj − 1)

provides an unbiased estimate of the true variance of the population from which the sample came. Under the assumption that the population variances are all equal, we may pool the k estimates to obtain

Σ(j=1 to k) Σ(i=1 to nj) (xij − x̄.j)² / Σ(j=1 to k) (nj − 1)

The Second Estimate of σ²: The second estimate of σ² may be obtained from the familiar formula for the variance of sample means, σx̄² = σ²/n. If we solve this equation for σ², the variance of the population from which the samples were drawn, we have σ² = n σx̄². An unbiased estimate of σx̄², computed from sample data, is provided by

Σ(j=1 to k) (x̄.j − x̄..)² / (k − 1)

If we substitute this quantity into the equation we obtain the desired estimate of σ²:

n Σ(j=1 to k) (x̄.j − x̄..)² / (k − 1)

When the sample sizes are not all equal, an estimate of σ² based on the variability among sample means is provided by

Σ(j=1 to k) nj (x̄.j − x̄..)² / (k − 1)

The Variance Ratio: What we need to do now is to compare these two estimates of σ², and we do this by computing the following variance ratio, which is the desired test statistic:

V.R. = Among groups mean square / Within groups mean square
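Under H0 both quantities estimate the same σ², which a quick simulation illustrates; the group counts, common mean, and σ² = 100 below are arbitrary choices of mine:

```python
import random

# Simulate k groups from ONE normal population (H0 true) and compare the
# two estimates of the common variance sigma^2.
random.seed(42)
sigma2 = 100.0            # true common variance (sigma = 10)
k, n_j = 40, 250          # 40 groups of 250 observations, all with mean 50

groups = [[random.gauss(50.0, sigma2 ** 0.5) for _ in range(n_j)] for _ in range(k)]
N = k * n_j
grand_mean = sum(sum(g) for g in groups) / N

# First estimate: pooled within-group variance (the within groups mean square)
msw = sum(sum((x - sum(g) / n_j) ** 2 for x in g) for g in groups) / (N - k)

# Second estimate: n times the variance of the group means (among groups mean square)
msa = n_j * sum((sum(g) / n_j - grand_mean) ** 2 for g in groups) / (k - 1)

print(round(msw, 1), round(msa, 1))  # both land near 100
```

When H0 is false, the within-groups estimate still targets σ² but the among-groups estimate is inflated by the treatment effects, which is exactly why a large variance ratio signals unequal means.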

6. Distribution of Test statistic: The F distribution we use in a given situation depends on the number of degrees of freedom associated with the sample variance in the numerator and the number of degrees of freedom associated with the sample variance in the denominator. We compute V.R. in situations of this type by placing the among groups mean square in the numerator and the within groups mean square in the denominator, so that the numerator degrees of freedom equal the number of groups minus 1, (k − 1), and the denominator degrees of freedom equal

Σ(j=1 to k) (nj − 1) = Σ(j=1 to k) nj − k = N − k

7. Significance Level: Once the appropriate F distribution has been determined, the size of the observed V.R. that will cause rejection of the hypothesis of equal population means depends on the significance level chosen. The significance level chosen determines the critical value of F, the value that separates the nonrejection region from the rejection region. 8. Statistical decision: To reach a decision we must compare our computed V.R. with the critical value of F, which we obtain by entering Table G with k−1 numerator degrees of freedom and N−k denominator degrees of freedom. If the computed V.R. is equal to or greater than the critical value of F, we reject the null hypothesis. If the computed value of V.R. is smaller than the critical value of F, we do not reject the null hypothesis.
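For the apple juice example, steps 7 and 8 reduce to a single comparison; a sketch, with the critical value read from the slides' F table:

```python
# Steps 7-8 for the apple juice example: compare computed V.R. with critical F.
vr = 28756.12 / 8894.45     # computed variance ratio, MST/MSE (step 7)
f_crit = 3.15               # F(0.05; k-1 = 2, N-k = 57), from the F table

reject_h0 = vr >= f_crit    # step 8: statistical decision
print(round(vr, 2), reject_h0)  # 3.23 True
```

Since the computed ratio (3.23) exceeds the critical value (3.15), the null hypothesis of equal mean sales is rejected.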

9. Conclusion: When we reject H 0 we conclude that not all population means are equal. When we fail to reject H 0, we conclude that the population means may be equal. 10. Determination of p value