
Statistiek II. John Nerbonne, using reworkings by Hartmut Fitz and Wilbert Heeringa. Dept of Information Science, j.nerbonne@rug.nl. February 13, 2014

Course outline:
1. One-way ANOVA
2. Factorial ANOVA
3. Repeated measures ANOVA
4. Correlation and regression
5. Multiple regression
6. Logistic regression
7. Hierarchical, or mixed, models

Lecture outline. Today: one-way ANOVA
1. General motivation
2. F-test and F-distribution
3. ANOVA example
4. The logic of ANOVA
(Short break)
5. ANOVA calculations
6. Post-hoc tests

What's ANalysis Of VAriance (ANOVA)? The most popular statistical test for numerical data. A generalized t-test: it compares the means of more than two groups. Fairly robust. Based on the F-distribution: it compares variances (between groups and within groups). Two basic versions:
(a) One-way (or single) ANOVA: compare groups along one dimension, e.g., grade point average by school class
(b) N-way (or factorial) ANOVA: compare groups along two or more dimensions, e.g., grade point average by school class and gender

Typical applications. One-way ANOVA: compare the time needed for lexical recognition in (1) healthy adults, (2) patients with Wernicke's aphasia, and (3) patients with Broca's aphasia. Factorial ANOVA: compare lexical recognition time in males and females across the same three groups.

Comparing multiple means. For two groups: use the t-test. Note: testing at the 0.05 level yields a significant result 1 time in 20 even when there is no difference in the population means (the effect of chance). But suppose there are 7 groups, i.e., we test all (7 choose 2) = 21 pairs of groups. Caution: several tests on the same data run the risk of finding significance through sheer chance.

Multiple comparison problem. Example: suppose you run k = 3 tests, each seeking a result significant at α = 0.05. The probability of getting at least one false positive is given by:

α_FW = 1 − P(zero false positive results) = 1 − (1 − α)^k = 1 − (1 − 0.05)³ = 1 − (0.95)³ ≈ 0.143

Hence, with only 3 pairwise tests, the chance of committing a type I error is almost 15% (and 66% for 21 tests!). α_FW is called the Bonferroni family-wise α-level.

Bonferroni correction for multiple comparisons. To guarantee a family-wise α-level of 0.05, divide α by the number of tests. Example: 0.05/3 (= α / #tests) ≈ 0.017 (note: 0.983³ ≈ 0.95). Set α = 0.017 (the Bonferroni-corrected α-level). If the p-value is less than the Bonferroni-corrected target α, reject the null hypothesis; if it is greater, do not reject it.
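These numbers are easy to verify in software; here is a minimal Python sketch (not part of the original slides; the helper name is ours):

```python
# Minimal check of the family-wise error rate and the Bonferroni correction.

def familywise_alpha(alpha: float, k: int) -> float:
    """P(at least one false positive in k independent tests) = 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** k

print(familywise_alpha(0.05, 3))       # ~0.143: almost 15% for 3 tests
print(familywise_alpha(0.05, 21))      # ~0.659: about 66% for 21 tests
print(familywise_alpha(0.05 / 3, 3))   # ~0.049: the corrected level restores ~0.05
```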

Analysis of variance. ANOVA automatically corrects for looking at several relationships (like the Bonferroni correction). It is based on the F-distribution (Moore & McCabe, §7.3, pp. 435–445), which concerns the ratio of two variances (variance = σ²):

F = s₁² / s₂²

F is always positive, since variances are positive. Interestingly, there are two degrees of freedom: one for s₁, one for s₂.

F-test vs. F-distribution. F-value: F = s₁² / s₂². F-values are used in the F-test (Fisher's test):
H₀: the samples are from the same distribution (s₁ = s₂)
Hₐ: the samples are from different distributions (s₁ ≠ s₂)
- a value near 1 indicates the same variance
- a value near 0 or +∞ indicates a difference in variance
The F-test is very sensitive to deviations from normality. ANOVA uses the F-distribution, but is different: ANOVA ≠ F-test!

F-distribution. [Figure: probability density of the F-distribution with the critical areas at p = 0.05 (df: 12, 10): rejection regions in both tails, acceptance region in between.] Note the symmetry:

P(s₁²/s₂² < x) = P(s₂²/s₁² > 1/x)

(because y < x ⟺ 1/y > 1/x for x, y ∈ ℝ⁺)

F-test example: height

Group  Sample size  Mean    SD
Boys   16           180 cm  6 cm
Girls  9            168 cm  4 cm

Is the difference in standard deviation significant? Examine F = s²_boys / s²_girls. Degrees of freedom: df_boys = 16 − 1 = 15, df_girls = 9 − 1 = 8.

F-test critical area (for a two-tailed test with α = 0.05). Upper critical value: P(F(15, 8) > x) = α/2 = 0.025, i.e., P(F(15, 8) < x) = 1 − 0.025 = 0.975, so x = 4.1 (Moore & McCabe, Table E, p. 706). Lower critical value: tables give no values directly for P(F(df₁, df₂) < x), but

P(F(15, 8) < x) = 0.025 ⟺ P(F(8, 15) > 1/x) = 0.025

From the tables, P(F(8, 15) > 3.2) = 0.025, so P(F(15, 8) < 1/3.2) = P(F(15, 8) < 0.31) = 0.025.

Reject H₀ if F < 0.31 or F > 4.1. Here, F = 6²/4² = 2.25, hence no evidence of a difference in distributions.
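If software is at hand, the table lookups can be checked directly; a minimal sketch using scipy (an assumption of ours; the slides use printed tables):

```python
from scipy.stats import f

# Two-sided F-test at alpha = 0.05 with df = (15, 8)
upper = f.ppf(0.975, 15, 8)   # ~4.10, matching the table value
lower = f.ppf(0.025, 15, 8)   # ~0.31, equals 1 / f.ppf(0.975, 8, 15)

F = 6**2 / 4**2               # 2.25 for the boys/girls example
print(lower, upper, lower < F < upper)  # True: do not reject H0
```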

ANOVA. Analysis of Variance (ANOVA) is the most popular statistical test for numerical data. Several types:
- single, one-way
- factorial: two-, three-, ..., n-way
- single/factorial repeated measures
It examines variation:
- between groups (gender, age, etc.)
- within groups (overall)
It automatically corrects for looking at several relationships (like the Bonferroni correction), and uses the F-distribution F(n, m), where n is typically fixed at the number of groups (minus 1) and m at the number of subjects, i.e., data points (minus the number of groups).

Detailed example: one-way ANOVA. Question: are the exam grades of four groups of foreign students (Nederlands voor anderstaligen) the same? More exactly, are the four averages the same?

H₀: µ₁ = µ₂ = µ₃ = µ₄
Hₐ: µ₁ ≠ µ₂ or µ₁ ≠ µ₃ ... or µ₃ ≠ µ₄

Alternative hypothesis: at least one group has a different mean. For the question of whether any particular pair is different, the t-test is appropriate. For testing whether all language groups are the same, pairwise t-tests exaggerate differences (they increase the chance of type I error). We therefore apply one-way ANOVA.

Data: Dutch proficiency of foreigners. Four groups of ten students each:

                 Europe  America  Africa  Asia
                 10      33       26      26
                 19      21       25      21
                 ...     ...      ...     ...
                 31      20       15      21
Mean             25.0    21.9     23.1    21.3
Sample SD        8.14    6.61     5.92    6.90
Sample variance  66.22   43.66    34.99   47.57

ANOVA conditions. ANOVA assumptions:
- normal distribution per subgroup
- same variance in subgroups: the smallest SD should be more than one-half of the largest SD
- independent observations: watch out for test-retest situations!
Check the differences in SDs! (some SPSS computing)

ANOVA conditions. Assumption: normal distribution per group; check with a normal quantile plot, e.g., for the Europeans below (repeat for every group). [Figure: normal Q-Q plot of the "toets NL voor anderstaligen" scores for the European group]

Visualizing ANOVA data. Is there a significant difference in the means (of the groups being contrasted)? Take care: boxplots show medians, not means.

Sketch of ANOVA.

Group:   1 (Eur.)  2 (Amer.)  3 (Africa)  4 (Asia)
Data:    x₁ⱼ       x₂ⱼ        x₃ⱼ         x₄ⱼ
Means:   x̄₁        x̄₂         x̄₃          x̄₄

Notation: group index i ∈ {1, 2, 3, 4}; sample index j; Nᵢ = size of group i; data point xᵢⱼ: ith group, jth observation; number of groups I = 4; total mean x̄; group mean x̄ᵢ. For any data point xᵢⱼ:

(xᵢⱼ − x̄) = (x̄ᵢ − x̄) + (xᵢⱼ − x̄ᵢ)
total residue = group difference + error

ANOVA question: does group membership influence the response variable?

Two variances. Reminder of high school algebra: (a + b)² = a² + b² + 2ab. [Diagram: a square of side a + b partitioned into regions of area a², ab, ab, and b².]

Two variances. For data point xᵢⱼ we have (xᵢⱼ − x̄) = (x̄ᵢ − x̄) + (xᵢⱼ − x̄ᵢ). We want the sum of squared deviates for each group, so square both sides:

(xᵢⱼ − x̄)² = (x̄ᵢ − x̄)² + (xᵢⱼ − x̄ᵢ)² + 2(x̄ᵢ − x̄)(xᵢⱼ − x̄ᵢ)

Sum over the elements of the ith group (j = 1, ..., Nᵢ):

Σⱼ (xᵢⱼ − x̄)² = Σⱼ (x̄ᵢ − x̄)² + Σⱼ (xᵢⱼ − x̄ᵢ)² + Σⱼ 2(x̄ᵢ − x̄)(xᵢⱼ − x̄ᵢ)

Two variances. Note that the cross term must be zero:

Σⱼ 2(x̄ᵢ − x̄)(xᵢⱼ − x̄ᵢ) = 2(x̄ᵢ − x̄) Σⱼ (xᵢⱼ − x̄ᵢ) = 0

because Σⱼ (xᵢⱼ − x̄ᵢ) = 0, which follows from x̄ᵢ = (Σⱼ xᵢⱼ) / Nᵢ.

Two variances. So we have:

Σⱼ (xᵢⱼ − x̄)² = Σⱼ (x̄ᵢ − x̄)² + Σⱼ (xᵢⱼ − x̄ᵢ)² (+ Σⱼ 2(x̄ᵢ − x̄)(xᵢⱼ − x̄ᵢ), which is 0)

Therefore:

Σⱼ (xᵢⱼ − x̄)² = Σⱼ (x̄ᵢ − x̄)² + Σⱼ (xᵢⱼ − x̄ᵢ)²

And finally we can sum over all groups (i = 1, ..., I):

Σᵢ Σⱼ (xᵢⱼ − x̄)² = Σᵢ Σⱼ (x̄ᵢ − x̄)² + Σᵢ Σⱼ (xᵢⱼ − x̄ᵢ)²
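This decomposition can be confirmed numerically on any grouped data; a small illustrative Python sketch (the data here are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
groups = [rng.normal(loc=m, scale=5, size=10) for m in (20, 22, 25)]

x = np.concatenate(groups)
grand = x.mean()

total = ((x - grand) ** 2).sum()                                 # sum_ij (x_ij - xbar)^2
between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)  # sum_i N_i (xbar_i - xbar)^2
within = sum(((g - g.mean()) ** 2).sum() for g in groups)        # sum_ij (x_ij - xbar_i)^2

print(np.isclose(total, between + within))  # True: the two sides agree
```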

ANOVA terminology.

(xᵢⱼ − x̄) = (x̄ᵢ − x̄) + (xᵢⱼ − x̄ᵢ)
total residue = group difference + error

Σᵢ Σⱼ (xᵢⱼ − x̄)² = Σᵢ Nᵢ (x̄ᵢ − x̄)² + Σᵢ Σⱼ (xᵢⱼ − x̄ᵢ)²
SST = SSG + SSE
Total Sum of Squares = Group Sum of Squares + Error Sum of Squares

(n − 1) = (I − 1) + (n − I)
DFT = DFG + DFE
Total Degrees of Freedom = Group Degrees of Freedom + Error Degrees of Freedom

Variances are mean squared differences from the mean. Note that SST/DFT = Σᵢ Σⱼ (xᵢⱼ − x̄)² / (n − 1) is a variance, and likewise SSG/DFG, labelled MSG ("mean square between groups"), and SSE/DFE, labelled MSE ("mean square error", or sometimes "mean square within groups"). In ANOVA, we compare MSG (the variance between groups) and MSE (the variance within groups), i.e., we measure:

F = MSG / MSE

If this F-value is large, differences between groups overshadow differences within groups.

Two variances. 1) Estimate the pooled variance of the population (MSE):

MSE = SSE / DFE = Σᵢ Σⱼ (xᵢⱼ − x̄ᵢ)² / (n − I), equivalently = (Σᵢ DFᵢ sᵢ²) / (Σᵢ DFᵢ)

In our example (Nederlands voor anderstaligen):

(Σᵢ DFᵢ sᵢ²) / (Σᵢ DFᵢ) = [(N₁ − 1)s₁² + (N₂ − 1)s₂² + (N₃ − 1)s₃² + (N₄ − 1)s₄²] / [(N₁ − 1) + (N₂ − 1) + (N₃ − 1) + (N₄ − 1)]
= (9 · 66.22 + 9 · 43.66 + 9 · 34.99 + 9 · 47.57) / (9 + 9 + 9 + 9)
= (595.98 + 392.94 + 314.91 + 428.13) / 36 = 48.11

This estimates the variance in the groups (using DF), aka the within-groups estimate of variance.

Two variances. 2) Estimate the between-groups variance of the population (MSG):

MSG = SSG / DFG = Σᵢ Nᵢ (x̄ᵢ − x̄)² / (I − 1)

In our example (Nederlands voor anderstaligen), we had four group means, 25.0, 21.9, 23.1, 21.3, with grand mean 22.8:

MSG = 10 · ((25.0 − 22.8)² + (21.9 − 22.8)² + (23.1 − 22.8)² + (21.3 − 22.8)²) / (4 − 1) = 26.6

The between-groups variance (MSG) is an aggregate estimate of the degree to which the four sample means differ from one another.

Interpreting estimates with F-scores. If H₀ is true, then we have two variances, the between-groups estimate s²_bg = 26.62 and the within-groups estimate s²_wg = 48.11, and their ratio s²_bg / s²_wg follows an F-distribution with (#groups − 1) = 3 degrees of freedom for s²_bg and (#observations − #groups) = 36 degrees of freedom for s²_wg. In our example:

F(3, 36) = 26.62 / 48.11 = 0.55

P(F(3, 40) > 2.84) = 0.05 (see tables; 40 is the nearest tabulated denominator df), so there is no evidence of non-uniform behavior.
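For readers following along in Python rather than SPSS, the whole calculation can be reproduced from the summary statistics alone; a sketch assuming scipy is available (results match the slides up to rounding):

```python
from scipy.stats import f

# Summary statistics from the slides: 4 groups of 10 students each
means = [25.0, 21.9, 23.1, 21.3]
variances = [66.22, 43.66, 34.99, 47.57]
n_i, I = 10, 4
n = n_i * I

grand = sum(means) / I                                 # 22.825 (slides round to 22.8)
msg = n_i * sum((m - grand) ** 2 for m in means) / (I - 1)
mse = sum((n_i - 1) * v for v in variances) / (n - I)  # pooled within-groups variance

F = msg / mse
print(msg, mse, F)            # ~26.6, 48.11, ~0.55
print(f.sf(F, I - 1, n - I))  # large p-value (~0.65): no evidence against H0
```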

SPSS summary. [SPSS one-way ANOVA output table.] No evidence of non-uniform behavior.

Other questions. ANOVA tests H₀: µ₁ = µ₂ = ... = µₙ. But sometimes particular contrasts are important, e.g., are Europeans better (at learning Dutch)? Distinguish (in reporting results):
- prior contrasts: questions asked before the data is collected and analyzed
- post-hoc (posterior) questions: questions asked after data collection and analysis; such data-snooping is exploratory and cannot contribute to hypothesis testing

Prior contrasts. Questions asked before data collection and analysis, e.g., are Europeans better (at learning Dutch)? Another way of putting this:

H₀: µ_Eur = (1/3)(µ_Am + µ_Afr + µ_Asia)
Hₐ: µ_Eur ≠ (1/3)(µ_Am + µ_Afr + µ_Asia)

Reformulation (SPSS requires this):

H₀: 0 = −µ_Eur + 0.33 µ_Am + 0.33 µ_Afr + 0.33 µ_Asia
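The contrast can also be checked by hand from the summary statistics, using the standard textbook formula SE = √(MSE · Σᵢ cᵢ²/Nᵢ) with DFE degrees of freedom; a sketch (our formula choice, not the slides' SPSS output):

```python
from scipy.stats import t as t_dist

means = [25.0, 21.9, 23.1, 21.3]     # Eur, Am, Afr, Asia
coef = [1, -1/3, -1/3, -1/3]         # contrast coefficients, summing to 0
mse, n_i, dfe = 48.11, 10, 36

est = sum(c * m for c, m in zip(coef, means))       # estimated contrast, 2.9
se = (mse * sum(c**2 / n_i for c in coef)) ** 0.5   # standard error of the contrast
t_stat = est / se
p = 2 * t_dist.sf(abs(t_stat), dfe)                 # two-tailed p-value
print(est, t_stat, p)  # t ~1.14, p ~0.26: no significant difference, as on the slides
```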

Prior contrasts in SPSS. The mean of every group gets a coefficient; the coefficients sum to 0. A t-test is carried out and a two-tailed p-value is reported (as usual): no significant difference here (of course). Note: prior contrasts are legitimate as hypothesis tests as long as they are formulated before data collection and analysis.

Post-hoc questions. Assume H₀ is rejected: which means are distinct? The data-snooping problem: in a large set, some distinctions are likely to be statistically significant by chance. But we can still look (we just cannot claim to have tested the hypothesis). To ask whether x̄ᵢ − x̄ⱼ is significantly large, we apply a variant of the t-test. The relevant SD is based on MSE (differences among scores), but there is a correction, since we are looking at only a proportion of the scores in any one comparison.

SD in post-hoc ANOVA questions. Standard deviation (for differences between groups i and j):

sd = √(MSE · (Nᵢ + Nⱼ)/N) = √(48.1 · (10 + 10)/40) = 4.9

t = (x̄ᵢ − x̄ⱼ) / (sd · √(1/Nᵢ + 1/Nⱼ))

The critical t-value is calculated at level p/c, where p is the desired significance level and c is the number of comparisons. For pairwise comparisons: c = (I choose 2) = I(I − 1)/2.
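A sketch of this procedure in Python, following the slide's formulas (the group numbers come from the running example; scipy is assumed):

```python
from math import comb, sqrt
from scipy.stats import t as t_dist

mse, n, I = 48.1, 40, 4          # from the running example
n_i = n_j = 10

sd = sqrt(mse * (n_i + n_j) / n)                # = 4.9, as on the slide
c = comb(I, 2)                                  # 6 pairwise comparisons
t_crit = t_dist.ppf(1 - (0.05 / c) / 2, n - I)  # Bonferroni-corrected critical t

diff = 25.0 - 21.3                              # largest difference: Europe vs. Asia
t_stat = diff / (sd * sqrt(1 / n_i + 1 / n_j))
print(t_stat, t_crit, abs(t_stat) > t_crit)     # ~1.69 < ~2.79: not significant
```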

Post-hoc questions in SPSS. SPSS's post-hoc "Bonferroni" option searches among all groupings for statistically significant ones. But in this case there are none (of course).

How to win at ANOVA. Note the two ways in which the F-ratio F = MSG/MSE increases (i.e., becomes more significant):
1. MSG increases: the differences in means between groups grow larger
2. MSE decreases: the overall variation within groups grows smaller

Two models for grouped data:

(1) xᵢⱼ = µ + εᵢⱼ
(2) xᵢⱼ = µ + αᵢ + εᵢⱼ

First model: no group effect; each data point represents error (ε) around a mean (µ). Second model: a real group effect; each data point represents error (ε) around an overall mean (µ), combined with a group adjustment (αᵢ). ANOVA asks: is there sufficient evidence for αᵢ?
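One way to make the two models concrete is to simulate data under each and let ANOVA look for the αᵢ; a small illustrative sketch (all numbers are invented):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
mu, n_i = 22.8, 10
alphas = [4.0, -2.0, 1.0, -3.0]   # invented group adjustments

# Model 2: x_ij = mu + alpha_i + eps_ij, with eps ~ N(0, 7^2)
groups = [mu + a + rng.normal(0, 7, n_i) for a in alphas]
print(f_oneway(*groups))          # a small p-value supports the group-effect model

# Model 1: no group effect (alpha_i = 0); p-values will be large on average
null_groups = [mu + rng.normal(0, 7, n_i) for _ in range(4)]
print(f_oneway(*null_groups))
```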

Fall-backs? Suppose some cells are non-normal, or some standard deviations are too large. Then:
- use Kruskal-Wallis, a non-parametric comparison of more than 2 medians;
- apply a (monotonic) transformation to reduce SD and perhaps improve the fit to normality;
- trim the most extreme 1% (or 5%) of the data.
Always report transformations or trimming!
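For completeness, a minimal Kruskal-Wallis sketch with scipy (the scores below are hypothetical, since the slides show only the first and last rows of the data):

```python
from scipy.stats import kruskal

# Hypothetical scores for the four groups of the running example
europe  = [10, 19, 31, 28, 25, 33, 24, 17, 30, 33]
america = [33, 21, 20, 25, 18, 22, 14, 26, 19, 21]
africa  = [26, 25, 15, 28, 20, 24, 18, 29, 22, 24]
asia    = [26, 21, 21, 14, 18, 25, 30, 16, 22, 20]

stat, p = kruskal(europe, america, africa, asia)
print(stat, p)  # p > 0.05 would mean no evidence that the medians differ
```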

Next week: factorial ANOVA.