Wolf River. Lecture 15 - ANOVA. Sta102 / BME102. October 22, 2014

Wolf River
Lecture 15 - ANOVA
Sta102 / BME102, Colin Rundel
October 22, 2014

The Wolf River in Tennessee flows past an abandoned site once used by the pesticide industry for dumping wastes, including chlordane (a pesticide) and aldrin and dieldrin (both insecticides). These highly toxic organic compounds can cause various cancers and birth defects. They are denser than water and tend to become stuck in sediment, so they are more likely to be found in higher concentrations near the bottom than near mid-depth.

Wolf River - Data

Aldrin concentration (nanograms per liter) at three levels of depth:

       aldrin  depth
   1     3.80  bottom
   2     4.80  bottom
  ...
  10     8.80  bottom
  11     3.20  middepth
  12     3.80  middepth
  ...
  20     6.60  middepth
  21     3.10  surface
  22     3.60  surface
  ...
  30     5.20  surface

Exploratory analysis

[Side-by-side boxplots of aldrin concentration, roughly 3 to 9 ng/L, for the bottom, middepth, and surface groups.]

             n   mean    sd
  bottom    10   6.04  1.58
  middepth  10   5.05  1.10
  surface   10   4.20  0.66
  overall   30   5.10  1.37
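The summary table can be reproduced programmatically. Below is a minimal Python sketch (not from the original slides); the group lists contain only the observations printed above, with the elided rows left as comments, so the output will match the table only once the full 30 observations are filled in.

```python
# Minimal sketch (not from the slides): per-group and overall summaries
# for data laid out like the aldrin table. Only the observations printed
# on the slide are listed; the elided rows must be filled in from the
# full dataset before the output matches the table.
from statistics import mean, stdev

def summarize(groups):
    """Print n, mean, and sd for each group and for the pooled data."""
    pooled = [y for ys in groups.values() for y in ys]
    for name, ys in groups.items():
        print(f"{name:<9} n={len(ys):2d} mean={mean(ys):.2f} sd={stdev(ys):.2f}")
    print(f"{'overall':<9} n={len(pooled):2d} mean={mean(pooled):.2f} sd={stdev(pooled):.2f}")

groups = {
    "bottom":   [3.80, 4.80, 8.80],  # ... 7 elided observations
    "middepth": [3.20, 3.80, 6.60],  # ... 7 elided observations
    "surface":  [3.10, 3.60, 5.20],  # ... 7 elided observations
}
summarize(groups)
```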

Research question

Is there a difference between the mean aldrin concentrations among the three levels?

To compare the means of 2 groups we use a Z or a T statistic. To compare the means of 3 or more groups we use a new test called ANOVA (analysis of variance) and a new test statistic, F.

ANOVA is used to assess whether the mean of the outcome variable is different for different levels of a categorical variable.

H_0: The mean outcome is the same across all categories, \mu_1 = \mu_2 = \cdots = \mu_k, where \mu_i represents the mean of the outcome for observations in category i.
H_A: At least one pair of means is different.

Note: this hypothesis test does not tell us which means differ, only that at least one pair does. More on how to identify the differing pairs later.

Checking conditions

1. The observations should be independent within and between groups.
   - If the data are a simple random sample from less than 10% of the population, this condition is satisfied.
   - Carefully consider whether the data are independent (e.g. no pairing).
   - Always important, but sometimes difficult to check.
   - Does this condition appear to be satisfied for the Wolf River data?
2. The observations within each group should be nearly normal.
   - Particularly important when the sample sizes are small.
   - How do we check for normality?
3. The variability across the groups should be about equal.
   - Particularly important when the sample sizes differ between groups.
   - How can we check this condition?

The next slides check conditions (2) and (3) graphically; a code sketch of those diagnostics follows below.
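Here is a hedged sketch of the standard graphical diagnostics, using matplotlib and scipy (not part of the original slides); it assumes the `groups` dict from the earlier sketch, mapping each depth level to its aldrin values.

```python
# Sketch: graphical checks for conditions (2) and (3). Assumes `groups`
# maps depth level -> list of aldrin values, as in the earlier sketch.
import matplotlib.pyplot as plt
from scipy import stats

# (2) One normal probability plot per group: roughly linear points
# suggest the near-normality condition is reasonable.
fig, axes = plt.subplots(1, len(groups), figsize=(12, 4))
for ax, (name, ys) in zip(axes, groups.items()):
    stats.probplot(ys, dist="norm", plot=ax)
    ax.set_title(name)

# (3) Side-by-side boxplots: comparable box heights (IQRs) suggest
# roughly equal variability across groups.
fig2, ax2 = plt.subplots()
ax2.boxplot(list(groups.values()))
ax2.set_xticklabels(list(groups.keys()))
ax2.set_ylabel("aldrin (ng/L)")
plt.show()
```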

Checking conditions (cont.)

(2) Approximately normal: Does this condition appear to be satisfied?

[Normal probability plots of the aldrin measurements for each of the bottom, middepth, and surface groups.]

(3) Constant variance: Does this condition appear to be satisfied?

[Side-by-side boxplots of the raw data: bottom sd = 1.58, middepth sd = 1.10, surface sd = 0.66.]

In this case it is somewhat hard to tell, since the means are different.

(3) Constant variance - Residuals

One of the ways to think about each data point is as follows:

    y_{ij} = \mu_i + \epsilon_{ij}

where \epsilon_{ij} is called the residual (\epsilon_{ij} = y_{ij} - \mu_i).

[Side-by-side boxplots of the residuals, now all centered at 0: bottom sd = 1.58, middepth sd = 1.10, surface sd = 0.66.]

Comparison: z/t test vs. ANOVA - Purpose

z/t test: Compare the means from two groups to see whether they are so far apart that the observed difference cannot reasonably be attributed to sampling variability. H_0: \mu_1 = \mu_2.

ANOVA: Compare the means from two or more groups to see whether they are so far apart that the observed differences cannot all reasonably be attributed to sampling variability. H_0: \mu_1 = \mu_2 = \cdots = \mu_k.
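The residual trick above (subtracting each group's mean so every group is centered at 0) is easy to do programmatically. A small sketch, again assuming the `groups` dict from the earlier sketches:

```python
# Sketch: residuals put every group on a common center (mean zero), so
# their boxplots compare spread alone, as on the slide. Assumes the
# `groups` dict from the earlier sketches.
from statistics import mean

residuals = {name: [y - mean(ys) for y in ys] for name, ys in groups.items()}
for name, r in residuals.items():
    print(name, [round(e, 2) for e in r])
```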

Comparison: z/t test vs. ANOVA - Method

z/t test: Compute a test statistic (a ratio):

    z/t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{SE(\bar{x}_1 - \bar{x}_2)}

ANOVA: Compute a test statistic (a ratio):

    F = \frac{\text{variability btw. groups}}{\text{variability w/in groups}}

With only two groups the t test and ANOVA are equivalent, but only if we use a pooled standard deviation in the denominator of the test statistic. With more than two groups, ANOVA compares the sample means to an overall grand mean.

Large test statistics lead to small p-values. If the p-value is small enough, H_0 is rejected, and we conclude that the population means are not all equal.

ANOVA and the F test

Does there appear to be a lot of variability within groups? How about between groups?

    F = \frac{\text{variability btw. groups}}{\text{variability w/in groups}}

F distribution and p-value

In order to be able to reject H_0, we need a small p-value, which requires a large F statistic. In order to obtain a large F statistic, the variability between the sample means needs to be greater than the variability within the groups.
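The two-group equivalence is easy to verify numerically: with a pooled-variance t test, F = t^2 and the two tests give identical two-sided p-values. A small scipy sketch (the values below are illustrative, not the full Wolf River data):

```python
# Sketch: with exactly two groups, the pooled two-sample t test and
# one-way ANOVA agree, with F = t^2. Values are illustrative only.
from scipy import stats

a = [3.8, 4.8, 8.8, 5.0, 6.2]
b = [3.2, 3.8, 6.6, 4.1, 5.0]

t, p_t = stats.ttest_ind(a, b, equal_var=True)  # pooled-variance t test
F, p_F = stats.f_oneway(a, b)

print(round(t**2 - F, 10))            # 0.0: the statistics match
print(round(p_t, 4), round(p_F, 4))   # identical two-sided p-values
```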

Types of Variability

For ANOVA we think of our variability (uncertainty) in terms of three separate quantities:

- Total variability: all of the variability in the data, ignoring any explanatory variable(s). (You can think of this as being analogous to the sample variance of all the data.)
- Group variability: variability between the group means and the grand mean.
- Error variability: the sum of the variability within each group. (You can think of this as being analogous to the sum of the sample variances for each group, or the variance of the residuals.)

Sum of squares and variability

Mathematically, we can think of the unnormalized measures of variability as follows, where \mu_i is the mean of group i and \mu is the grand mean:

Total variability:  \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \mu)^2

Group variability:  \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\mu_i - \mu)^2 = \sum_{i=1}^{k} n_i (\mu_i - \mu)^2

Error variability:  \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \mu_i)^2

Partitioning Sums of Squares

With a little bit of careful algebra we can show that

    \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \mu)^2 = \sum_{i=1}^{k} n_i (\mu_i - \mu)^2 + \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \mu_i)^2

Sum of Squares Total = Sum of Squares Group + Sum of Squares Error
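The partition can be checked numerically on any grouped data, since the identity holds regardless of the particular values. A short sketch, assuming the `groups` dict from the earlier sketches:

```python
# Sketch: numerical check of the partition SST = SSG + SSE for any
# grouped data (here the `groups` dict from the earlier sketches).
from statistics import mean

ys_all = [y for ys in groups.values() for y in ys]
grand = mean(ys_all)

sst = sum((y - grand) ** 2 for y in ys_all)
ssg = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
sse = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)

print(round(sst, 6), round(ssg + sse, 6))  # equal, up to float rounding
```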

ANOVA output, deconstructed

ANOVA output (Group row):

             Df  Sum Sq  Mean Sq  F value  Pr(>F)
  depth       2   16.96     8.48     6.13  0.0063

Sum of squares between groups, SSG

Measures the variability between groups:

    SSG = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2

where n_i is each group's size, \bar{x}_i is the average for each group, and \bar{x} is the overall (grand) mean.

             n  mean
  bottom    10  6.04
  middepth  10  5.05
  surface   10  4.20
  overall   30  5.10

    SSG = 10 (6.04 - 5.1)^2 + 10 (5.05 - 5.1)^2 + 10 (4.2 - 5.1)^2 = 16.96

Sum of squares total, SST

Measures the total variability in the data:

    SST = \sum_{i=1}^{n} (x_i - \bar{x})^2

where x_i represents each observation in the dataset.

    SST = (3.8 - 5.1)^2 + (4.8 - 5.1)^2 + (4.9 - 5.1)^2 + \cdots + (5.2 - 5.1)^2
        = 1.69 + 0.09 + 0.04 + \cdots + 0.01
        = 54.29

Sum of squares error, SSE

Measures the variability within groups:

    SSE = SST - SSG = 54.29 - 16.96 = 37.33

Degrees of freedom

- groups: df_G = k - 1, where k is the number of groups
- total: df_T = n - 1, where n is the total sample size
- error: df_E = df_T - df_G

    df_G = k - 1 = 3 - 1 = 2
    df_T = n - 1 = 30 - 1 = 29
    df_E = 29 - 2 = 27

ANOVA output (cont.) - MS

Mean square

A mean square is the corresponding sum of squares divided by its degrees of freedom:

    MSG = 16.96 / 2 = 8.48
    MSE = 37.33 / 27 = 1.38

ANOVA output (cont.) - F

Test statistic, F value

As we discussed before, the F statistic is the ratio of the between-group and within-group variability:

    F = MSG / MSE = 8.48 / 1.38 = 6.14

(The ANOVA table above shows 6.13 because the software carries more decimal places than the rounded sums of squares used in this hand calculation.)

ANOVA output (cont.) - P-value

The p-value is the probability of observing at least as large a ratio of between-group to within-group variability if, in fact, the means of all groups are equal. It is calculated as the area under the F curve, with degrees of freedom df_G and df_E, above the observed F statistic.

[F density curve with df_G = 2 and df_E = 27; the upper tail beyond the observed F = 6.14 is shaded.]
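All of the table entries follow mechanically from SSG, SST, k, and n. A sketch using scipy's F distribution for the p-value (the numbers are the slide's own):

```python
# Sketch: the deconstructed ANOVA table from the slide's own numbers.
from scipy import stats

k, n = 3, 30
ssg, sst = 16.96, 54.29
sse = sst - ssg                      # 37.33
df_g, df_t = k - 1, n - 1            # 2, 29
df_e = df_t - df_g                   # 27

msg, mse = ssg / df_g, sse / df_e    # 8.48, ~1.38
F = msg / mse                        # ~6.13
p = stats.f.sf(F, df_g, df_e)        # upper-tail area: ~0.0063

print(f"F = {F:.2f}, p = {p:.4f}")
```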

Conclusion

- If the p-value is small (less than α), reject H_0. The data provide convincing evidence that at least one mean is different from the others (but we can't tell which one).
- If the p-value is large, fail to reject H_0. The data do not provide convincing evidence that at least one pair of means is different from each other; the observed differences in sample means are attributable to sampling variability (or chance).

Conclusion - in context

What is the conclusion of the hypothesis test for the Wolf River data? (Here p = 0.0063 < 0.05, so we reject H_0: the data provide convincing evidence that the average aldrin concentration differs for at least one pair of depths.)

Which means differ?

We've concluded that at least one pair of means differ. The natural question that follows is: which ones? We can do two-sample t tests for the difference in each possible pair of groups. Can you see any pitfalls with this approach?

When we run too many tests, the Type 1 error rate increases. This issue is resolved by using a modified significance level.

Determining the modified α

The scenario of testing many pairs of groups is called multiple comparisons. If there are k groups, then there are

    K = \binom{k}{2} = \frac{k(k-1)}{2}

possible pairs. One common approach is the Bonferroni correction, which uses a more stringent significance level for each test:

    \alpha^* = \alpha / K

where K is the number of comparisons being considered.

In the aldrin data set, depth has 3 levels: bottom, mid-depth, and surface. If α = 0.05, what should be the modified significance level for the two-sample t tests used to determine which pairs of groups have significantly different means?
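A one-function sketch of the Bonferroni adjustment (Python's math.comb counts the pairs):

```python
# Sketch: Bonferroni-adjusted significance level for all pairwise
# comparisons among k groups.
from math import comb

def bonferroni_alpha(alpha, k):
    K = comb(k, 2)      # number of pairs, k(k-1)/2
    return alpha / K

print(bonferroni_alpha(0.05, 3))  # 3 pairs -> 0.05 / 3 = 0.0167
```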

Which means differ?

Based on the box plots below, which means would you expect to be significantly different?

(a) bottom & surface
(b) bottom & mid-depth
(c) mid-depth & surface
(d) bottom & mid-depth; mid-depth & surface
(e) bottom & mid-depth; bottom & surface; mid-depth & surface

[Side-by-side boxplots of aldrin concentration, roughly 3 to 9 ng/L: bottom sd = 1.58, middepth sd = 1.10, surface sd = 0.66.]

Which means differ? (cont.)

If the assumption of equal variability across groups is satisfied, we can use the data from all groups to estimate variability:

- Estimate any within-group standard deviation with \sqrt{MSE}, which is s_{pooled}.
- Use the error degrees of freedom, df_E = n - k, for the t distributions.

Difference in two means, after ANOVA:

    SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}  becomes  SE = \sqrt{\frac{MSE}{n_1} + \frac{MSE}{n_2}}

Is there a difference between the average aldrin concentration at the bottom and at mid-depth?

             Df  Sum Sq  Mean Sq  F value  Pr(>F)
  depth       2   16.96     8.48     6.13  0.0063
  Residuals  27   37.33     1.38

             n  mean    sd
  bottom    10  6.04  1.58
  middepth  10  5.05  1.10
  surface   10  4.20  0.66
  overall   30  5.10  1.37

    T_{df_E} = T_{27} = \frac{\bar{x}_b - \bar{x}_m}{\sqrt{\frac{MSE}{n_b} + \frac{MSE}{n_m}}} = \frac{6.04 - 5.05}{\sqrt{\frac{1.38}{10} + \frac{1.38}{10}}} = \frac{0.99}{0.53} = 1.87

    0.05 < p-value < 0.10 (two-sided)
    \alpha^* = 0.05 / 3 = 0.0167

Fail to reject H_0: the data do not provide convincing evidence of a difference between the average aldrin concentrations at the bottom and at mid-depth.

Is there a difference between the average aldrin concentration at the bottom and at the surface?
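The same comparison as a scipy sketch, using the slide's MSE and group summaries; the two-sided p-value it prints (about 0.07) is consistent with the 0.05 < p-value < 0.10 bracket above:

```python
# Sketch: bottom vs. mid-depth comparison using the pooled MSE from the
# ANOVA table, with df_E = 27 for the reference t distribution.
from math import sqrt
from scipy import stats

mse, df_e = 1.38, 27
xbar_b, xbar_m, n_b, n_m = 6.04, 5.05, 10, 10

se = sqrt(mse / n_b + mse / n_m)      # ~0.53
t = (xbar_b - xbar_m) / se            # ~1.87
p = 2 * stats.t.sf(abs(t), df_e)      # two-sided: ~0.07

print(f"t = {t:.2f}, p = {p:.3f}")    # p > alpha* = 0.0167 -> fail to reject
```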

Practice Problem: GSS - Hours worked vs. Education

Previously we have seen data from the General Social Survey used to compare the average number of hours worked per week by US residents with and without a college degree. However, that analysis didn't take advantage of the original data, which contained more detailed information on educational attainment (less than high school, high school, junior college, Bachelor's, and graduate school).

Using ANOVA, we can consider all five educational attainment levels for all 1,172 respondents at once instead of re-categorizing them into two groups. Below are the distributions of hours worked by educational attainment and the relevant summary statistics for carrying out this analysis.

[Side-by-side boxplots of hours worked per week, 0 to 80, for the five attainment levels: Less than HS, HS, Jr Coll, Bachelor's, Graduate.]

Educational attainment:

        Less than HS     HS  Jr Coll  Bachelor's  Graduate  Total
  Mean         38.67   39.6    41.39       42.55     40.85  40.45
  SD           15.81  14.97    18.1        13.62     15.51  15.17
  n              121    546      97          253       155  1,172

Given what we know, fill in the unknowns in the ANOVA table below.

              Df   Sum Sq  Mean Sq  F value  Pr(>F)
  degree      ???      ???  501.54      ???  0.0682
  Residuals   ???  267,382     ???
  Total       ???      ???
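If you want to check your answers, every unknown follows from k = 5, n = 1,172, the given MSG, and the residual sum of squares, using the relationships derived earlier in this lecture. A hedged sketch (not part of the slides; it reveals the answers, so try the table by hand first):

```python
# Sketch: filling an ANOVA table from what the practice problem gives
# (k groups, total n, MSG, and the residual sum of squares).
from scipy import stats

k, n = 5, 1172
msg = 501.54            # given Mean Sq for degree
sse = 267_382           # given Residuals Sum Sq

df_g, df_t = k - 1, n - 1
df_e = df_t - df_g
ssg = msg * df_g        # SSG = MSG * df_G
mse = sse / df_e
F = msg / mse
p = stats.f.sf(F, df_g, df_e)   # should match the printed 0.0682

print(df_g, df_e, df_t)
print(round(ssg, 2), round(mse, 2), round(ssg + sse, 2))
print(f"F = {F:.2f}, p = {p:.4f}")
```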