Lecture 19 - ANOVA: Aldrin in the Wolf River. Sta 111 (Colin Rundel). June 11, 2014

Aldrin in the Wolf River

The Wolf River in Tennessee flows past an abandoned site once used by the pesticide industry for dumping wastes, including chlordane (a pesticide), aldrin, and dieldrin (both insecticides). These highly toxic organic compounds can cause various cancers and birth defects. The compounds are denser than water and their molecules tend to become stuck in sediment, so they are more likely to be found in higher concentrations near the bottom than near mid-depth.

Wolf River - Data

Aldrin concentration (nanograms per liter) at three levels of depth:

        aldrin  depth
   1      3.80  bottom
   2      4.80  bottom
  ...
  10      8.80  bottom
  11      3.20  middepth
  12      3.80  middepth
  ...
  20      6.60  middepth
  21      3.10  surface
  22      3.60  surface
  ...
  30      5.20  surface

Exploratory analysis

[Figure: side-by-side box plots of aldrin concentration, roughly 3 to 9 ng/L, for bottom, middepth, and surface.]

              n  mean    sd
  bottom     10  6.04  1.58
  middepth   10  5.05  1.10
  surface    10  4.20  0.66
  overall    30  5.10  1.37
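These summaries are easy to reproduce in R. A minimal sketch, assuming the data have been read into a data frame called wolf with columns aldrin and depth (illustrative names, not given on the slides), with depth stored as a factor:

  # group sizes, means, and standard deviations by depth
  table(wolf$depth)
  tapply(wolf$aldrin, wolf$depth, mean)
  tapply(wolf$aldrin, wolf$depth, sd)

  # overall summary and the exploratory box plots
  mean(wolf$aldrin); sd(wolf$aldrin)
  boxplot(aldrin ~ depth, data = wolf)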

Research question

Is there a difference between the mean aldrin concentrations among the three depth levels?

To compare the means of 2 groups we use a Z or a T statistic. To compare the means of 3 or more groups we use a new test called ANOVA (analysis of variance) and a new test statistic, F.

ANOVA is used to assess whether the mean of the outcome variable is different for different levels of a categorical variable.

$H_0$: The mean outcome is the same across all categories, $\mu_1 = \mu_2 = \cdots = \mu_k$, where $\mu_i$ represents the mean of the outcome for observations in category $i$.
$H_A$: At least one pair of means differ.

Note: this hypothesis test does not tell us whether all the means are different or only one pair is different; more on how to determine that later.

Conditions

1. The observations should be independent within and between groups.
   - If the data are a simple random sample from less than 10% of the population, this condition is satisfied.
   - Carefully consider whether the data are independent (e.g. no pairing).
   - Always important, but sometimes difficult to check.
2. The observations within each group should be nearly normal.
   - Particularly important when the sample sizes are small.
   - How do we check for normality?
3. The variability across the groups should be about equal.
   - Particularly important when the sample sizes differ between groups.
   - How can we check this condition?

(1) Independence: Does this condition appear to be satisfied for the Wolf River data?

Checking conditions

(2) Approximately normal: Does this condition appear to be satisfied?

[Figure: histograms and normal probability plots of the aldrin data for each group (bottom, middepth, surface).]

(3) Constant variance: Does this condition appear to be satisfied?

[Figure: side-by-side box plots of the raw data; bottom sd = 1.58, middepth sd = 1.10, surface sd = 0.66.]

In this case it is somewhat hard to tell from the raw data, since the group means are different.

(3) Constant variance - Residuals

One of the ways to think about each data point is as follows:

$y_{ij} = \mu_i + \epsilon_{ij}$

where $\epsilon_{ij}$ is called the residual ($\epsilon_{ij} = y_{ij} - \mu_i$).

[Figure: box plots of the residuals by group; bottom sd = 1.58, middepth sd = 1.10, surface sd = 0.66.]

z/t test vs. ANOVA - Purpose

z/t test: Compare means from two groups to see whether they are so far apart that the observed difference cannot reasonably be attributed to sampling variability. $H_0: \mu_1 = \mu_2$.

ANOVA: Compare the means from two or more groups to see whether they are so far apart that the observed differences cannot all reasonably be attributed to sampling variability. $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$.
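The graphical checks for conditions (2) and (3) above can be sketched in R, again using the illustrative wolf data frame:

  # (2) normality: a normal probability plot for each depth group
  par(mfrow = c(1, 3))
  for (d in levels(wolf$depth)) {
    x <- wolf$aldrin[wolf$depth == d]
    qqnorm(x, main = d)
    qqline(x)
  }

  # (3) constant variance: residuals y_ij - mean_i, compared across groups
  res <- wolf$aldrin - ave(wolf$aldrin, wolf$depth)
  boxplot(res ~ wolf$depth)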

z/t test vs. ANOVA - Method

z/t test: Compute a test statistic (a ratio):

$z/t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{SE(\bar{x}_1 - \bar{x}_2)}$

ANOVA: Compute a test statistic (a ratio):

$F = \frac{\text{variability btw. groups}}{\text{variability w/in groups}}$

With only two groups the t test and ANOVA are equivalent, but only if we use a pooled standard deviation in the denominator of the test statistic. With more than two groups, ANOVA compares the sample means to an overall grand mean.

Large test statistics lead to small p-values. If the p-value is small enough, $H_0$ is rejected, and we conclude that the population means are not all equal.

ANOVA and the F test

Test statistic: Does there appear to be a lot of variability within groups? How about between groups?

$F = \frac{\text{variability btw. groups}}{\text{variability w/in groups}} = \frac{MSG}{MSE}$

- MSG is the mean square between groups, with $df_G = k - 1$, where $k$ is the number of groups.
- MSE is the mean square error (the variability in the residuals), with $df_E = n - k$, where $n$ is the number of observations.

ANOVA output, deconstructed

               df  Sum Sq  Mean Sq  F value  Pr(>F)
  depth         2   16.96     8.48     6.13  0.0063
  Residuals    27   37.33     1.38
  Total        29   54.29

Degrees of freedom associated with ANOVA:
- groups: $df_G = k - 1$, where $k$ is the number of groups
- total: $df_T = n - 1$, where $n$ is the total sample size
- error: $df_E = df_T - df_G$

For the aldrin data: $df_G = 3 - 1 = 2$, $df_T = 30 - 1 = 29$, $df_E = 29 - 2 = 27$.

F distribution and p-value: In order to be able to reject $H_0$, we need a small p-value, which requires a large F statistic. In order to obtain a large F statistic, the variability between the sample means needs to be greater than the variability within the samples.

Sum of squares between groups, SSG

Measures the variability between groups:

$SSG = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$

where $n_i$ is each group's size, $\bar{x}_i$ is the average for each group, and $\bar{x}$ is the overall (grand) mean.

              n  mean
  bottom     10  6.04
  middepth   10  5.05
  surface    10  4.2
  overall    30  5.1

$SSG = 10(6.04 - 5.1)^2 + 10(5.05 - 5.1)^2 + 10(4.2 - 5.1)^2 = 16.96$

Sum of squares total, SST

Measures the total variability in the data:

$SST = \sum_{i=1}^{n} (x_i - \bar{x})^2$

where $x_i$ represents each observation in the dataset.

$SST = (3.8 - 5.1)^2 + (4.8 - 5.1)^2 + (4.9 - 5.1)^2 + \cdots + (5.2 - 5.1)^2$
$\phantom{SST} = (-1.3)^2 + (-0.3)^2 + (-0.2)^2 + \cdots + (0.1)^2$
$\phantom{SST} = 1.69 + 0.09 + 0.04 + \cdots + 0.01 = 54.29$
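The same decomposition as a short R sketch (using the illustrative wolf data frame as before); the aov() call reproduces the table above directly:

  grand  <- mean(wolf$aldrin)
  n_i    <- tapply(wolf$aldrin, wolf$depth, length)
  xbar_i <- tapply(wolf$aldrin, wolf$depth, mean)

  SSG <- sum(n_i * (xbar_i - grand)^2)  # 16.96
  SST <- sum((wolf$aldrin - grand)^2)   # 54.29
  SSE <- SST - SSG                      # 37.33

  anova(aov(aldrin ~ depth, data = wolf))  # the full ANOVA table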

ANOVA output (cont.) - SSE

Sum of squares error, SSE: measures the variability within groups:

$SSE = SST - SSG = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$

$SSE = 54.29 - 16.96 = 37.33$

ANOVA output (cont.) - MS

Mean square: calculated as the sum of squares divided by the degrees of freedom.

$MSG = 16.96 / 2 = 8.48$
$MSE = 37.33 / 27 = 1.38$

ANOVA output (cont.) - F

Test statistic, F value: as we discussed before, the F statistic is the ratio of the between-group and within-group variability.

$F = \frac{MSG}{MSE} = \frac{8.48}{1.38} = 6.14$

ANOVA output (cont.) - P-value

The p-value is the probability of at least as large a ratio between the between-group and within-group variability, if in fact the means of all groups are equal. It is calculated as the area under the F curve, with degrees of freedom $df_G = 2$ and $df_E = 27$, above the observed F statistic.

[Figure: F(2, 27) density curve with the area above 6.14 shaded.]
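In R this tail area comes directly from pf():

  pf(6.14, df1 = 2, df2 = 27, lower.tail = FALSE)  # ~0.0063, the Pr(>F) in the table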

Conclusion

- If the p-value is small (less than $\alpha$), reject $H_0$. The data provide convincing evidence that at least one mean is different from the others (but we can't tell which one).
- If the p-value is large, fail to reject $H_0$. The data do not provide convincing evidence that at least one pair of means are different from each other; the observed differences in sample means are attributable to sampling variability (or chance).

What is the conclusion of the hypothesis test for the Wolf River data?

Which means differ?

We've concluded that at least one pair of means differ. The natural question that follows is: which ones? We can do two sample t tests for differences in each possible pair of groups. Can you see any pitfalls with this approach? When we run too many tests, the Type 1 error rate increases. This issue is resolved by using a modified significance level.

Multiple comparisons

The scenario of testing many pairs of groups is called multiple comparisons. If there are $k$ groups, then there are

$K = \binom{k}{2} = \frac{k(k-1)}{2}$

possible pairs. One common approach is the Bonferroni correction, which uses a more stringent significance level for each test:

$\alpha^\star = \alpha / K$

where $K$ is the number of comparisons being considered.

Determining the modified $\alpha$: In the aldrin data set, depth has 3 levels: bottom, mid-depth, and surface. If $\alpha = 0.05$, what should be the modified significance level for two sample t tests for determining which pairs of groups have significantly different means?
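The arithmetic as a one-line R sketch (the resulting level, 0.0167, is used on a later slide):

  K <- choose(3, 2)        # 3 possible pairs for k = 3 groups
  alpha_star <- 0.05 / K   # 0.0167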

Which means differ?

Based on the box plots (bottom sd = 1.58, middepth sd = 1.10, surface sd = 0.66), which means would you expect to be significantly different?

(a) bottom & surface
(b) bottom & mid-depth
(c) mid-depth & surface
(d) bottom & mid-depth; mid-depth & surface
(e) bottom & mid-depth; bottom & surface; mid-depth & surface

Which means differ? (cont.)

If the assumption of equal variability across groups is satisfied, we can use the data from all groups to estimate variability:

- Estimate any within-group standard deviation with $\sqrt{MSE}$, which is $s_{pooled}$.
- Use the error degrees of freedom, $n - k$, for t-distributions.

Difference in two means: the standard error changes from

$SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ to $SE = \sqrt{\frac{MSE}{n_1} + \frac{MSE}{n_2}}$

Is there a difference between the average aldrin concentration at the bottom and at mid-depth?

               df  Sum Sq  Mean Sq  F value  Pr(>F)
  depth         2   16.96     8.48     6.13  0.0063
  Residuals    27   37.33     1.38
  Total        29   54.29

              n  mean    sd
  bottom     10  6.04  1.58
  middepth   10  5.05  1.10
  surface    10  4.2   0.66
  overall    30  5.1   1.37

$T_{df_E} = T_{27} = \frac{\bar{x}_b - \bar{x}_m}{\sqrt{\frac{MSE}{n_b} + \frac{MSE}{n_m}}} = \frac{6.04 - 5.05}{\sqrt{\frac{1.38}{10} + \frac{1.38}{10}}} = \frac{0.99}{0.53} = 1.87$

$0.05 <$ p-value $< 0.10$ (two-sided), while $\alpha^\star = 0.05/3 = 0.0167$.

Fail to reject $H_0$; the data do not provide convincing evidence of a difference between the average aldrin concentrations at the bottom and at mid-depth.

Is there a difference between the average aldrin concentration at the bottom and at the surface?
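The bottom vs. mid-depth comparison as a short R sketch; pt() gives an exact two-sided p-value in place of the table range:

  MSE   <- 1.38
  se    <- sqrt(MSE / 10 + MSE / 10)             # 0.53
  tstat <- (6.04 - 5.05) / se                    # 1.87
  2 * pt(tstat, df = 27, lower.tail = FALSE)     # ~0.07, so 0.05 < p < 0.10
  # compare against the Bonferroni-modified level 0.0167, not 0.05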

Work Hours and Education Level

Previously we used data from the General Social Survey to compare the average number of hours worked per week by US residents with and without a college degree. However, that analysis didn't take advantage of the original data, which contained more fine-grained information on educational attainment (less than high school, high school, junior college, Bachelor's, and graduate school).

Using ANOVA, we can consider educational attainment levels for all 1,172 respondents at once instead of re-categorizing them into two groups. Below are the distributions of hours worked by educational attainment and the relevant summary statistics that will be helpful in carrying out this analysis.

[Figure: box plots of hours worked per week (0 to 80) by educational attainment.]

  Educational attainment
          Less than HS     HS  Jr Coll  Bachelor's  Graduate  Total
  Mean           38.67   39.6    41.39       42.55     40.85  40.45
  SD             15.81  14.97     18.1       13.62     15.51  15.17
  n                121    546       97         253       155  1,172

Work Hours and Education Level (ANOVA table)

Given what we know, fill in the unknowns in the table below. (A sketch of how the unknowns follow from the given values appears after the alfalfa data below.)

               df   Sum Sq  Mean Sq  F value  Pr(>F)
  degree        ?        ?   501.54        ?  0.0682
  Residuals     ?  267,382        ?
  Total         ?        ?

Example - Alfalfa

Researchers were interested in the effect that acid has on the growth rate of alfalfa plants. They created three treatment groups in an experiment: low acid, high acid, and control. The alfalfa plants were grown in Styrofoam cups arranged near a window, and the height of the alfalfa plants was measured after five days of growth. The experiment consisted of 5 cups for each of the 3 treatments, for a total of 15 observations.

  High Acid  Low Acid  Control
       1.30      1.78     2.67
       1.15      1.25     2.25
       0.50      1.27     1.46
       0.30      0.55     1.66
       1.30      0.80     0.80

  $\bar{y}_i$: 0.910, 1.130, 1.768;  $n$: 5, 5, 5;  $\mu = 1.269$
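As referenced above, a sketch of how the unknowns in the degree table follow from the values that are given (k = 5 attainment levels, n = 1,172 respondents):

  k <- 5; n <- 1172
  df_G <- k - 1             # 4
  df_E <- n - k             # 1167
  df_T <- n - 1             # 1171
  SSG  <- 501.54 * df_G     # Mean Sq times df = 2006.16
  MSE  <- 267382 / df_E     # ~229.1
  Fval <- 501.54 / MSE      # ~2.19
  pf(Fval, df_G, df_E, lower.tail = FALSE)  # ~0.068, consistent with Pr(>F) = 0.0682
  SSG + 267382              # total Sum Sq, ~269,388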

Alfalfa Hypotheses

We would like to establish whether the acid treatments are affecting the alfalfa's growth. Since we have a numerical response and a categorical explanatory variable, we will use an ANOVA. What should our hypotheses be?

$H_0: \mu_H = \mu_L = \mu_C$
$H_A$: At least one mean is different.

Treatment Effect

Last time we mentioned that it is possible to write down a model for each data point:

$y_{ij} = \mu_i + \epsilon_{ij}$

where $i \in \{H, L, C\}$ is the treatment and $j \in \{1, 2, 3, 4, 5\}$ is the index of the observation within that treatment. We can rewrite this in terms of the grand mean $\mu$ as follows:

$y_{ij} = \mu + \tau_i + \epsilon_{ij}$

where $\tau_i = \mu_i - \mu$ is known as the treatment effect. Thinking in terms of the treatment effect we can rewrite our null hypothesis:

$H_0: \mu_H = \mu_L = \mu_C = \mu \iff H_0: \tau_H = \tau_L = \tau_C = 0$

Alfalfa ANOVA Table - Sum Sq

               df  Sum Sq  Mean Sq  F value  Pr(>F)
  Treatment         1.986
  Residuals         3.893
  Total             5.879

$SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \mu)^2 = (1.3 - 1.269)^2 + (1.15 - 1.269)^2 + \cdots + (0.80 - 1.269)^2 = 5.879$

$SSG = \sum_{i=1}^{k} n_i (\mu_i - \mu)^2 = 5(0.91 - 1.269)^2 + 5(1.13 - 1.269)^2 + 5(1.768 - 1.269)^2 = 1.986$

$SSE = SST - SSG = 3.893$

Alfalfa ANOVA Table - DF

               df  Sum Sq  Mean Sq  F value  Pr(>F)
  Treatment     2   1.986
  Residuals    12   3.893
  Total        14   5.879

$df_T = n - 1 = 15 - 1 = 14$
$df_G = k - 1 = 3 - 1 = 2$
$df_E = n - k = 15 - 3 = 12$
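A sketch that rebuilds this table in R from the raw heights on the previous slide (vector and factor names are illustrative):

  height <- c(1.30, 1.15, 0.50, 0.30, 1.30,   # high acid
              1.78, 1.25, 1.27, 0.55, 0.80,   # low acid
              2.67, 2.25, 1.46, 1.66, 0.80)   # control
  trt <- factor(rep(c("high", "low", "control"), each = 5))
  anova(aov(height ~ trt))  # Sum Sq 1.986 and 3.893 on 2 and 12 df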

Alfalfa ANOVA Table - Mean Sq, F, P-value

               df  Sum Sq  Mean Sq  F value  Pr(>F)
  Treatment     2   1.986    0.993    3.061  0.0843
  Residuals    12   3.893    0.324
  Total        14   5.879

$MSG = SSG / df_G = 1.986 / 2 = 0.993$
$MSE = SSE / df_E = 3.893 / 12 = 0.324$
$F = MSG / MSE = 0.993 / 0.324 = 3.061$
P-value $= P(>F) = 0.0843$

Based on these results we fail to reject $H_0$; there is not sufficient evidence to suggest that at least one of the mean growth values is significantly different (or that at least one of the treatment effects is not zero).

Random Sampling / Assignment

Random sampling removes nuisance factors/variables (things that affect your outcome that you are not interested in).

Imagine we are interested in exploring whether increasing the dosage of a statin will reduce the risk of a heart attack. We randomly sample patients already on a statin and randomly assign them to either maintain their current dosage or increase their dosage by 20%.

- It is possible that some of the patients in this sample may have had a previous heart attack,
- which is a significant risk factor for a future heart attack,
- so their presence may alter our outcome.
- We could control for this effect by excluding them.

However, random sampling / assignment ensures that in the long run these nuisance factors show up with equal frequency in all treatment levels, and as such their effect(s) will cancel out.
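Returning to the alfalfa table above: as a quick check, the p-value is the upper tail of an F(2, 12) distribution:

  pf(3.061, df1 = 2, df2 = 12, lower.tail = FALSE)  # ~0.0843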

We will only consider the simplest case of randomized block design where each block contains only one observation of each treatment. High Acid Low Acid Control Block Mean Block 1 1.30 1.78 2.67 1.917 Block 2 1.15 1.25 2.25 1.550 Block 3 0.50 1.27 1.46 1.077 Block 4 0.30 0.55 1.66 0.837 Block 5 1.30 0.80 0.80 0.967 Trmt mean 0.910 1.130 1.768 n 5 5 5 µ = 1.269 Block Data Model When employing blocks we can think of each data point as where y ijk = µ + τ i + β j + ɛ ijk τ i is the treatment effect for treatment i β j is the block effect of block j ɛ ijk is the residual of observation k in block j with treatment i this is very similar to the one-way anova model we saw previous with the addition of the β j s. Sta 111 (Colin Rundel) Lec 19 June 11, 2014 45 / 64 Sta 111 (Colin Rundel) Lec 19 June 11, 2014 46 / 64 Randomized Block Table With the introduction of the blocks there are now two hypotheses we would like to evaluate: H 0 (treatment) : τ H = τ L = τ C = 0 H 0 (block) : β 1 = β 2 = β 3 = β 4 = β 5 = 0 In order to test these hypotheses we will build on the table we have been using. Randomized Block Table df Sum Sq Mean Sq F value Group k 1 k i=1 n i(µ i µ) 2 SSG/df G MSG/MSE Block b 1 b j=1 m j(µ j µ) 2 SSB/df B MSB/MSE Error n k b + 1 SST SSG SSB SSE/df E Total n 1 i j k (y ijk µ) 2 df Sum Sq Mean Sq F value Pr(>F) Group df G SSG MSG F G Block df B SSB MSB F B Error df E SSE MSE Total df T SST n - # observations k - # groups b - # blocks n i - # observations in group i m j - # observations in block j µ i - group mean for group i µ j - block mean for group j Sta 111 (Colin Rundel) Lec 19 June 11, 2014 47 / 64 Sta 111 (Colin Rundel) Lec 19 June 11, 2014 48 / 64

Randomized Block Table - Alfalfa We already know some of the values from our previous one-way, and it is easy to find the other df values. Sum of Squares Blocks SSB = b m j (µ j µ) 2 j=1 df Sum Sq Mean Sq F value Group 2 1.986 0.993 MSG/MSE Block 4 b j=1 m j(µ j µ) 2 SSB/df B MSB/MSE Error 8 SST SSG SSB SSE/df E Total 14 5.879 High Acid Low Acid Control Block Mean Block 1 1.30 1.78 2.67 1.917 Block 2 1.15 1.25 2.25 1.550 Block 3 0.50 1.27 1.46 1.077 Block 4 0.30 0.55 1.66 0.837 Block 5 1.30 0.80 0.80 0.967 Trmt mean 0.910 1.130 1.768 n 5 5 5 µ = 1.269 SSB = 3 (1.917 1.269) 2 + 3 (1.550 1.269) 2 + 3 (1.077 1.269) 2 + 3 (0.837 1.269) 2 + 3 (0.967 1.269) 2 =1.260 + 0.237 + 0.111 + 0.560 + 0.274 = 2.441 Sta 111 (Colin Rundel) Lec 19 June 11, 2014 49 / 64 Sta 111 (Colin Rundel) Lec 19 June 11, 2014 50 / 64 Completing the table Calculating P-values df Sum Sq Mean Sq F value Group 2 1.986 0.993 5.471 Block 4 2.441 0.6103 3.362 Error 8 1.452 0.1815 Total 14 5.879 The two F values that we have calculated can be used to evaluate the two hypotheses we started with. Treatment effect Block effect H 0 : τ H = τ L = τ G, H A : At least one treatment effect is not zero H 0 : β 1 = β 2 =... = β 5, H A : At least one block effect is not zero To calculate the P-value for each hypothesis we use F G and F B respectively to find P(> F ) for an F distribution with the appropriate degrees of freedom. Sta 111 (Colin Rundel) Lec 19 June 11, 2014 51 / 64 Sta 111 (Colin Rundel) Lec 19 June 11, 2014 52 / 64

Treatment Effect We have calculated that F G = 5.471, to find the P-value we need to the probability of observing a value equal to or larger than this from an F distribution with 2 and 8 degrees of freedom. Using R we find that pf(5.471, df1 = 2, df2 = 8, lower.tail = FALSE) ## [1] 0.03182 Therefore, P(> F G ) = 0.0318, which leads us to reject H 0 - there is sufficient evidence to suggest that at least one treatment effect is not 0. Block Effect Similarly, we have F B = 3.362 and to find the P-value we need to the probability of observing a value equal to or larger than this from an F distribution with 4 and 8 degrees of freedom. Using R we find that pf(3.362, df1 = 4, df2 = 8, lower.tail = FALSE) ## [1] 0.0679 Therefore, P(> F B ) = 0.0679, which leads us to fail to reject H 0 - there is not sufficient evidence to suggest that at least one block effect is not 0. Sta 111 (Colin Rundel) Lec 19 June 11, 2014 53 / 64 Sta 111 (Colin Rundel) Lec 19 June 11, 2014 54 / 64 Two-way Definition How did blocking change our result? One-way df Sum Sq Mean Sq F value Pr(>F) Treatment 2 1.986 0.993 3.061 0.0843 Residuals 12 3.893 0.324 Total 14 5.879 Randomized Block df Sum Sq Mean Sq F value P(>F) Group 2 1.986 0.993 5.471 0.0318 Block 4 2.441 0.6103 3.362 0.0679 Error 8 1.452 0.1815 Total 14 5.879 From Randomized Block to Two-way All of the approaches we have just learned to handle blocking will also apply in the case where we would like to assess the effect if a second factor on our outcome variable. Instead of examining treatment and block effects we instead examine two treatment effects. None of the procedures or calculations change, only what we call things. Blocking decreases df E, which increases MSE (bad). Blocking also decreases SSE, which decreases MSE (good). Sta 111 (Colin Rundel) Lec 19 June 11, 2014 55 / 64 Sta 111 (Colin Rundel) Lec 19 June 11, 2014 56 / 64

Two-way Definition Two-way Example - Moths Two-way Model Example - Spruce Moths When employing two-way we can think of each data point as where y ijk = µ + τ i + β j + ɛ ijk τ i is the effect of level i of factor 1 β j is the effect of level j of factor 2 ɛ ijk is the residual of observation k in block j with treatment i this is exactly the same as the randomized block model except the β j s now refer to the effect of the second factor. Sta 111 (Colin Rundel) Lec 19 June 11, 2014 57 / 64 A scientist is interested in efficacy of various lure types in attracting Spruce moths to a trap. They are also interested in the effect of location of the trap on its efficacy as well. Data to the right reflects the number of moths caught. Factor 1 is the lure type (3 levels) Factor 2 is the location (4 levels) There are 5 observations per condition From Understandable Statistics, 7e Scent Sugar Chemical Top 28 35 32 19 22 29 32 33 16 15 21 18 13 17 20 Middle 39 36 37 12 38 40 42 44 18 25 27 28 21 22 36 Lower 44 42 35 21 17 39 38 31 41 32 29 31 29 37 34 Ground 17 18 22 12 27 25 23 15 14 19 29 16 14 16 1 Sta 111 (Colin Rundel) Lec 19 June 11, 2014 58 / 64 Two-way Example - Moths Two-way Example - Moths Mean caught by Treatment Mean caught by Treatment Ground Lower Middle Top Lure Mean Chemical 19.20 36.00 31.80 23.00 27.50 Scent 17.00 32.80 27.80 21.40 24.75 Sugar 21.00 31.20 33.40 25.60 27.80 Loc Mean 19.07 33.33 31.00 23.33 26.68 Mean caught 20 25 30 35 Chemical Scent Sugar Ground Lower Middle Top Sta 111 (Colin Rundel) Lec 19 June 11, 2014 59 / 64 Sta 111 (Colin Rundel) Lec 19 June 11, 2014 60 / 64

Two-way Example - Moths Two-way Example - Moths Example - Spruce Moths - Hypotheses Similar to the randomized block, we have two hypothese to evaluate (one for each factor). Lure effect: H 0 : τ Ch = τ Sc = τ Su, H A : at least one τ is not zero Example - Spruce Moths - Table Lure 0.3859 Location 1981.38 0.0000 Residuals Total 5242.98 Location effect: H 0 : β G = β L = β M = β T, H A : at least one β is not zero Conclusions: Fail to reject H 0 (Lure), there is not sufficient evidence to suggest the different lures have an effect. Reject H 0 (Location), there is sufficient evidence to suggest the locations have an effect. Sta 111 (Colin Rundel) Lec 19 June 11, 2014 61 / 64 Sta 111 (Colin Rundel) Lec 19 June 11, 2014 62 / 64 Two-way Blocking vs Additional Factors Two-way Blocking vs Additional Factors Difference between a blocking variable and a factor We have just seen that computationally the two are treated the same when conducting an. What then is the difference? Factors are conditions we impose on the experimental units. Blocking variables are characteristics that the experimental units come with. Example - Lighting A study is designed to test the effect of type of light on exam performance of students. 180 students are randomly assigned to three classrooms: one that is dimly lit, another with yellow lighting, and a third with white fluorescent lighting and given the same exam. What are the factor(s) and/or block(s) for this experiment? What type of would be appropriate? The researcher also believes that light levels might have a different effect on males and females, so wants to make sure both genders are represented equally under the different light conditions. After this modifications what are the factor(s) and/or block(s) for this experiment? What type of would be appropriate? Sta 111 (Colin Rundel) Lec 19 June 11, 2014 63 / 64 Sta 111 (Colin Rundel) Lec 19 June 11, 2014 64 / 64