Unit 4: Inference for numerical variables. Lecture 4: ANOVA. Statistics 104

Unit 4: Inference for numerical variables
Lecture 4: ANOVA
Statistics 104
Mine Çetinkaya-Rundel
June 6, 2013

Announcements

Go to Sakai to pick a time for a one-on-one meeting.

Data

The Wolf River in Tennessee flows past an abandoned site once used by the pesticide industry for dumping wastes, including chlordane (a pesticide) and aldrin and dieldrin (both insecticides). These highly toxic organic compounds can cause various cancers and birth defects. The standard method for testing whether these substances are present in a river is to take samples at six-tenths depth. But since these compounds are denser than water and their molecules tend to stick to particles of sediment, they are more likely to be found in higher concentrations near the bottom than near mid-depth.

Aldrin concentration (nanograms per liter) at three levels of depth:

       aldrin  depth
  1      3.80  bottom
  2      4.80  bottom
  ...
  10     8.80  bottom
  11     3.20  middepth
  12     3.80  middepth
  ...
  20     6.60  middepth
  21     3.10  surface
  22     3.60  surface
  ...
  30     5.20  surface
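
The slides elide most of the 30 measurements, so the full dataset can't be reconstructed here. As a minimal sketch, assuming the data are available in a file named aldrin.csv with the two columns shown above (both the file name and the column names are assumptions), they could be loaded and summarized in R like this:

    # A minimal sketch: read the Wolf River data (file and column
    # names assumed), with depth as a factor
    aldrin <- read.csv("aldrin.csv", stringsAsFactors = TRUE)

    str(aldrin)                                # 30 obs. of 2 variables
    table(aldrin$depth)                        # 10 observations per depth
    tapply(aldrin$aldrin, aldrin$depth, mean)  # group means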

Exploratory analysis

Aldrin concentration (nanograms per liter) at three levels of depth.

[Side-by-side box plots of aldrin concentration for the bottom, middepth, and surface groups.]

             n  mean    sd
  bottom    10  6.04  1.58
  middepth  10  5.05  1.10
  surface   10  4.20  0.66
  overall   30  5.10  1.37

Which of the following plots shows groups with means that are most and least likely to be significantly different from each other?

[Three sets of side-by-side box plots, labeled I, II, and III.]

(a) most: I, least: II
(b) most: II, least: III
(c) most: I, least: III
(d) most: III, least: II
(e) most: II, least: I

Research question

Is there a difference between the mean aldrin concentrations among the three levels?

To compare the means of 2 groups we use a Z or a T statistic. To compare the means of 3+ groups we use a new test called ANOVA (analysis of variance) and a new statistic called F.

ANOVA

ANOVA is used to assess whether the mean of the outcome variable is different for different levels of a categorical variable.

H_0: The mean outcome is the same across all categories, µ_1 = µ_2 = ... = µ_k, where µ_i represents the mean of the outcome for observations in category i.
H_A: At least one mean is different from the others.
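
Assuming the aldrin data frame from the earlier sketch, the box plots and the n/mean/sd table above could be reproduced with:

    # Side-by-side box plots of concentration by depth
    boxplot(aldrin ~ depth, data = aldrin,
            ylab = "aldrin concentration (ng/L)")

    # n, mean, and sd for each group
    aggregate(aldrin ~ depth, data = aldrin,
              FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x)))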

Conditions

1. The observations should be independent within and between groups.
   - If the data are a simple random sample from less than 10% of the population, this condition is satisfied.
   - Carefully consider whether the data may be independent (e.g. no pairing).
   - Always important, but sometimes difficult to check.

2. The observations within each group should be nearly normal.
   - Especially important when the sample sizes are small.
   - How do we check for normality?

3. The variability across the groups should be about equal.
   - Especially important when the sample sizes differ between groups.
   - How can we check this condition?

z/t test vs. ANOVA - Purpose

z/t test: Compare means from two groups to see whether they are so far apart that the observed difference cannot reasonably be attributed to sampling variability. H_0: µ_1 = µ_2

ANOVA: Compare the means from two or more groups to see whether they are so far apart that the observed differences cannot all reasonably be attributed to sampling variability. H_0: µ_1 = µ_2 = ... = µ_k

z/t test vs. ANOVA - Method

z/t test: Compute a test statistic (a ratio).

  z/t = [(x̄_1 − x̄_2) − (µ_1 − µ_2)] / SE(x̄_1 − x̄_2)

ANOVA: Compute a test statistic (a ratio).

  F = (variability between groups) / (variability within groups)

Large test statistics lead to small p-values. If the p-value is small enough, H_0 is rejected, and we conclude that the population means are not equal.

z/t test vs. ANOVA

With only two groups, the t test and ANOVA are equivalent, but only if we use a pooled variance in the denominator of the test statistic. With more than two groups, ANOVA compares the sample means to an overall grand mean.
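
With two groups the equivalence can be verified directly: the ANOVA F statistic equals the square of the pooled t statistic. A sketch using just the bottom and middepth groups of the assumed aldrin data frame:

    # Keep two of the three groups
    two <- subset(aldrin, depth %in% c("bottom", "middepth"))
    two$depth <- droplevels(factor(two$depth))

    # Pooled two-sample t test (var.equal = TRUE pools the variances)
    t_res <- t.test(aldrin ~ depth, data = two, var.equal = TRUE)

    # One-way ANOVA on the same two groups
    f_res <- anova(lm(aldrin ~ depth, data = two))

    # F equals t squared, and the p-values agree
    t_res$statistic^2
    f_res$"F value"[1]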

ANOVA and the F test

Hypotheses

What are the correct hypotheses for testing for a difference between the mean aldrin concentrations among the three levels?

(a) H_0: µ_B = µ_M = µ_S; H_A: µ_B ≠ µ_M ≠ µ_S
(b) H_0: µ_B ≠ µ_M ≠ µ_S; H_A: µ_B = µ_M = µ_S
(c) H_0: µ_B = µ_M = µ_S; H_A: At least one mean is different.
(d) H_0: µ_B = µ_M = µ_S = 0; H_A: At least one mean is different.
(e) H_0: µ_B = µ_M = µ_S; H_A: µ_B > µ_M > µ_S

Test statistic

Does there appear to be a lot of variability within groups? How about between groups?

  F = (variability between groups) / (variability within groups)

F distribution and p-value

In order to be able to reject H_0, we need a small p-value, which requires a large F statistic. In order to obtain a large F statistic, the variability between sample means needs to be greater than the variability within sample means.

ANOVA output, deconstructed

Sum of squares total, SST: Measures the total variability in the response variable.

  SST = Σ_{i=1}^{n} (x_i − x̄)²

where x_i represents the value of the response variable for each observation in the dataset and x̄ is the grand mean. [Very similar to the calculation of variance, except not scaled by the sample size.]

  SST = (3.8 − 5.1)² + (4.8 − 5.1)² + (4.9 − 5.1)² + ... + (5.2 − 5.1)²
      = (−1.3)² + (−0.3)² + (−0.2)² + ... + (0.1)²
      = 1.69 + 0.09 + 0.04 + ... + 0.01
      = 54.29
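
Continuing the sketch with the assumed aldrin data frame, SST is one line in R:

    # Total sum of squares: squared deviations from the grand mean
    grand_mean <- mean(aldrin$aldrin)
    SST <- sum((aldrin$aldrin - grand_mean)^2)
    SST   # 54.29 for the Wolf River data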

ANOVA output, deconstructed

Sum of squares between groups, SSG: Measures the variability between groups.

  SSG = Σ_{i=1}^{k} n_i (x̄_i − x̄)²

where n_i is each group's size, x̄_i is the average for each group, and x̄ is the overall (grand) mean. [Explained variability: deviation of each group mean from the overall mean, weighted by sample size.]

             n  mean
  bottom    10  6.04
  middepth  10  5.05
  surface   10  4.20
  overall   30  5.10

  SSG = 10 (6.04 − 5.1)² + 10 (5.05 − 5.1)² + 10 (4.2 − 5.1)² = 16.96

Sum of squares error, SSE: Measures the variability within groups. [Unexplained variability]

  SSE = SST − SSG = 54.29 − 16.96 = 37.33

ANOVA output, deconstructed

Now we need a way to get from these measures of total variability to average variability: scaling by a factor that incorporates sample sizes and the number of groups, the degrees of freedom.

Degrees of freedom associated with ANOVA:
- groups: df_G = k − 1, where k is the number of groups
- total: df_T = n − 1, where n is the total sample size
- error: df_E = df_T − df_G

  df_G = k − 1 = 3 − 1 = 2
  df_T = n − 1 = 30 − 1 = 29
  df_E = 29 − 2 = 27
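
The same decomposition in R, continuing with the assumed aldrin data frame and the grand_mean and SST computed above:

    # Group sizes and group means
    n_i  <- tapply(aldrin$aldrin, aldrin$depth, length)
    xbar <- tapply(aldrin$aldrin, aldrin$depth, mean)

    # Between-group (explained) sum of squares
    SSG <- sum(n_i * (xbar - grand_mean)^2)   # 16.96

    # Within-group (unexplained) sum of squares
    SSE <- SST - SSG                          # 37.33

    # Degrees of freedom
    k    <- nlevels(factor(aldrin$depth))     # 3 groups
    df_G <- k - 1                             # 2
    df_T <- nrow(aldrin) - 1                  # 29
    df_E <- df_T - df_G                       # 27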

ANOVA output, deconstructed

Mean squares: A mean square is calculated as a sum of squares divided by its degrees of freedom.

  MSG = 16.96 / 2 = 8.48
  MSE = 37.33 / 27 = 1.38

Test statistic, F value: As we discussed before, the F statistic is the ratio of the between-group and within-group variability.

  F = MSG / MSE = 8.48 / 1.38 = 6.14

                 Df  Sum Sq  Mean Sq  F value  Pr(>F)
  (Group) depth   2   16.96     8.48     6.14  0.0063

p-value: The p-value is the probability of at least as large a ratio between the between-group and within-group variability, if in fact the means of all groups are equal. It's calculated as the area under the F curve, with degrees of freedom df_G and df_E, above the observed F statistic.

  > pf(6.14, 2, 27, lower.tail = FALSE)
  [1] 0.006340096

[F distribution curve with df_G = 2 and df_E = 27, shaded above the observed F statistic of 6.14.]

Conclusion - in context

What is the conclusion of the hypothesis test? The data provide convincing evidence that the average aldrin concentration

(a) is different for all groups.
(b) on the surface is lower than the other levels.
(c) is different for at least one group.
(d) is the same for all groups.
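
In practice the whole table comes from a single call. A sketch, still assuming the aldrin data frame:

    # One-way ANOVA in one call; summary() prints the table deconstructed
    # above: Df, Sum Sq, Mean Sq, F value, Pr(>F) for depth and Residuals
    fit <- aov(aldrin ~ depth, data = aldrin)
    summary(fit)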

ANOVA output, deconstructed

Conclusion:
- If the p-value is small (less than α), reject H_0. The data provide convincing evidence that at least one mean is different from the others (but we can't tell which one).
- If the p-value is large, fail to reject H_0. The data do not provide convincing evidence that at least one pair of means are different from each other; the observed differences in sample means are attributable to sampling variability (or chance).

Checking conditions

(1) independence

Does this condition appear to be satisfied?

(2) approximately normal

Does this condition appear to be satisfied?

[Normal probability plots and histograms of aldrin concentration for each group: bottom, middepth, surface.]

(3) constant variance

Does this condition appear to be satisfied?

[Side-by-side box plots by depth: bottom sd = 1.58, middepth sd = 1.10, surface sd = 0.66.]
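
Diagnostics like the slides' plots can be sketched in R, again assuming the aldrin data frame:

    # (2) Nearly normal: a normal probability plot for each group
    par(mfrow = c(1, 3))
    for (g in levels(factor(aldrin$depth))) {
      x <- aldrin$aldrin[aldrin$depth == g]
      qqnorm(x, main = g)
      qqline(x)
    }
    par(mfrow = c(1, 1))

    # (3) Constant variance: compare the group standard deviations
    tapply(aldrin$aldrin, aldrin$depth, sd)   # 1.58, 1.10, 0.66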

Which means differ?

Earlier we concluded that at least one pair of means differ. The natural question that follows is: which ones?

We can do two sample t tests for differences in each possible pair of groups. Can you see any pitfalls with this approach? When we run too many tests, the Type 1 Error rate increases. This issue is resolved by using a modified significance level.

Multiple comparisons

The scenario of testing many pairs of groups is called multiple comparisons. The Bonferroni correction suggests that a more stringent significance level is more appropriate for these tests:

  α* = α / K

where K is the number of comparisons being considered. If there are k groups, then usually all possible pairs are compared, and K = k(k − 1) / 2.

Determining the modified α

In the aldrin data set, depth has 3 levels: bottom, mid-depth, and surface. If α = 0.05, what should be the modified significance level for two sample t tests for determining which pairs of groups have significantly different means?

(a) α* = 0.05
(b) α* = 0.05 / 2 = 0.025
(c) α* = 0.05 / 3 = 0.0167
(d) α* = 0.05 / 6 = 0.0083

Which means differ?

Based on the box plots below, which means would you expect to be significantly different?

[Side-by-side box plots by depth: bottom sd = 1.58, middepth sd = 1.10, surface sd = 0.66.]

(a) bottom & surface
(b) bottom & mid-depth
(c) mid-depth & surface
(d) bottom & mid-depth; mid-depth & surface
(e) bottom & mid-depth; bottom & surface; mid-depth & surface
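
R's pairwise.t.test() automates this procedure: pool.sd = TRUE uses a common pooled standard deviation across all groups (the √MSE idea developed on the next slide), and p.adjust.method = "bonferroni" applies the correction by inflating the p-values, so each adjusted p-value can be compared against the original α. A sketch with the assumed aldrin data frame:

    # All pairwise comparisons, pooled SD, Bonferroni-adjusted p-values
    pairwise.t.test(aldrin$aldrin, aldrin$depth,
                    p.adjust.method = "bonferroni", pool.sd = TRUE)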

Which means differ? (cont.)

Is there a difference between the average aldrin concentration at the bottom and at mid-depth?

If the assumption of equal variability across groups is satisfied, we can use the data from all groups to estimate variability:
- Estimate any within-group standard deviation with √MSE, which is s_pooled.
- Use the error degrees of freedom, n − k, for t-distributions.

Difference in two means: after pooling, the standard error

  SE = √(σ_1² / n_1 + σ_2² / n_2)   becomes   SE = √(MSE / n_1 + MSE / n_2)

             n  mean    sd
  bottom    10  6.04  1.58
  middepth  10  5.05  1.10
  surface   10  4.20  0.66
  overall   30  5.10  1.37

             Df  Sum Sq  Mean Sq  F value  Pr(>F)
  depth       2   16.96     8.48     6.13  0.0063
  Residuals  27   37.33     1.38

  T_dfE = (x̄_bottom − x̄_middepth) / √(MSE / n_bottom + MSE / n_middepth)

  T_27 = (6.04 − 5.05) / √(1.38/10 + 1.38/10) = 0.99 / 0.53 = 1.87

  0.05 < p-value < 0.10 (two-sided)

  α* = 0.05 / 3 = 0.0167

Fail to reject H_0; the data do not provide convincing evidence of a difference between the average aldrin concentrations at the bottom and at mid-depth.

Is there evidence of a difference between the average aldrin concentration at the bottom and at the surface?

(a) yes
(b) no
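
The hand calculation above can be checked numerically. A sketch using the rounded values from the slide (MSE = 1.38, df_E = 27, group means 6.04 and 5.05, n = 10 per group):

    # Pooled pairwise comparison: bottom vs. middepth
    MSE   <- 1.38
    df_E  <- 27
    se    <- sqrt(MSE / 10 + MSE / 10)   # 0.53
    t_obs <- (6.04 - 5.05) / se          # about 1.87

    # Two-sided p-value from the t-distribution with df_E = n - k = 27
    p_val <- 2 * pt(abs(t_obs), df = df_E, lower.tail = FALSE)
    p_val                                # about 0.07: between 0.05 and 0.10

    # Compare to the Bonferroni-modified significance level
    p_val < 0.05 / 3                     # FALSE: fail to reject H_0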