Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal


CATEGORICAL IV, NUMERIC DV

Independent samples, one IV
# Conditions | Normal/Parametric | Non-parametric
Exactly 2 | T-test | Mann-Whitney U, bootstrap
2+ | One-way ANOVA | Kruskal-Wallis, bootstrap

Is your data normal?
- Skewness: asymmetry
- Kurtosis: peakedness relative to normal
- Both: a value within ±2 SE of the statistic is OK
- Or use Shapiro-Wilk (null = normal)
- Or look at a Q-Q plot
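As a rough illustration of the ±2 SE heuristic, the sketch below computes moment-based skewness and excess kurtosis in plain Python. The function names, the sample data, and the large-sample standard-error approximations sqrt(6/n) and sqrt(24/n) are assumptions made for this example; statistics packages usually report exact small-sample standard errors, which differ somewhat.

```python
from math import sqrt

def skewness_and_kurtosis(data):
    """Moment-based skewness and excess kurtosis (both 0 for a normal distribution)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - m) ** 4 for x in data) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3
    return skew, excess_kurt

def within_two_se(data):
    """Apply the +-2 SE heuristic using large-sample SE approximations (assumed)."""
    n = len(data)
    skew, kurt = skewness_and_kurtosis(data)
    se_skew = sqrt(6 / n)   # approximation, not the exact small-sample SE
    se_kurt = sqrt(24 / n)  # approximation, not the exact small-sample SE
    return abs(skew) <= 2 * se_skew, abs(kurt) <= 2 * se_kurt

# Hypothetical sample, for illustration only
sample = [1, 2, 3, 4, 5]
skew, kurt = skewness_and_kurtosis(sample)
print(skew, kurt, within_two_se(sample))
```

With very small samples the standard errors are so wide that almost anything passes, which is one reason to also look at a Q-Q plot.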

T-test
- Already talked about
- Assumptions: normality, equal variances, independent samples
  - Can use Levene's test to check the equal-variance assumption
- Post-test: check residuals for assumption fit
  - For a t-test this is the same pre or post
  - For other tests you check residual vs. fit post

One-way ANOVA
- H0: $\mu_1 = \mu_2 = \mu_3$
- H1: at least one doesn't match
  - NOT H1: $\mu_1 \neq \mu_2 \neq \mu_3$
- Assumptions: normality, common variance, independent errors
- Intuition: F statistic = variance between / variance within
- Under the (exact) null, F is expected to be about 1; F >> 1 rejects the null

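One-way ANOVA
- $F = MS_b / MS_w$
- $MS_w = \sum_i \sum_j (y_{ij} - \bar{y}_i)^2 / df_w$, where $y_{ij}$ is observation j in condition i and $\bar{y}_i$ is the mean of condition i
  - Sum over all conditions; sum per condition
  - $df_w = N - k$, where k = number of conditions
- $MS_b = \sum_i \sum_j (\bar{y}_i - \bar{y})^2 / df_b$, where $\bar{y}$ is the grand mean
  - $df_b = k - 1$
  - Every observation goes in the sum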
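The F computation above can be sketched in a few lines of plain Python. The function name and the data are hypothetical and chosen only to make the arithmetic easy to follow; in practice a statistics package would also supply the p-value from the F distribution.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per condition."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between: every observation contributes (its condition mean - grand mean)^2
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within: squared deviation of each observation from its own condition mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data: three conditions, four observations each
groups = [[1, 2, 3, 4], [2, 3, 4, 5], [5, 6, 7, 8]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)
```

Here the condition means (2.5, 3.5, 6.5) are far apart relative to the spread within each condition, so F comes out well above 1.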

[Figures: worked one-way ANOVA example from Vibha Sazawal; F-distribution with the null rejected]

Now what? (Contrasts)
- So we rejected the null. What did we learn?
- What *didn't* we learn?
  - At least one is different... Which? All?
  - This is called an omnibus test
- To answer our actual research question, we usually need pairwise contrasts

The trouble with contrasts
- Contrasts mess with your Type I bounds
- One test: 95% confident
- Three tests: 85.7% confident (0.95^3)
- 5 conditions, all pairs: 4 + 3 + 2 + 1 = 10 tests: 59.9% (0.95^10)
- UH OH
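The percentages on this slide follow from multiplying the per-test confidence across tests. A minimal sketch, assuming the tests are independent and each uses alpha = 0.05 (the function names are made up for illustration):

```python
from math import comb

def familywise_confidence(n_tests, alpha=0.05):
    """Probability of making no Type I error across n_tests independent tests."""
    return (1 - alpha) ** n_tests

def all_pairs(k):
    """Number of pairwise comparisons among k conditions."""
    return comb(k, 2)

for n_tests in (1, 3, all_pairs(5)):
    print(n_tests, round(familywise_confidence(n_tests), 3))
```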

Planned vs. post hoc
- Planned: You have a theory. Really, no cheating
  - You get k-1 pairwise comparisons for free (k = number of conditions)
  - In theory, should not be control vs. all, but prob. OK
  - NO COMPARISONS unless omnibus passes
- Post-hoc
  - Anything unplanned
  - More than k-1
  - Requires correction!
  - Doesn't necessarily require omnibus first

Correction
- Adjust {p-values, alpha} to compensate for multiple post-hoc tests
- Bonferroni (most conservative)
  - Assume all possible pairs: $m = k(k-1)/2$ (combinations)
  - $\alpha_c = \alpha / m$
  - Once you have looked, the implication is that you did all the comparisons implicitly!
- Holm-Bonferroni is less conservative
  - Stepwise: adjust alpha as you go
- Dunnett for specifically all vs. control; others exist
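To make the difference between the two corrections concrete, here is a small sketch of Bonferroni and Holm-Bonferroni decisions on a set of p-values. The function names and the three p-values are hypothetical; the Holm step-down rule shown (compare the i-th smallest p-value to alpha / (m - i + 1) and stop at the first failure) is the standard formulation.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm_reject(p_values, alpha=0.05):
    """Holm step-down: test p-values from smallest to largest, relaxing the threshold."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Hypothetical p-values from three pairwise comparisons
p_values = [0.010, 0.020, 0.030]
print(bonferroni_reject(p_values))
print(holm_reject(p_values))
```

On these values Bonferroni rejects only the first comparison, while Holm rejects all three, which is what "less conservative" means in practice.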

Independent samples, one IV
# Conditions | Normal/Parametric | Non-parametric
Exactly 2 | T-test | Mann-Whitney U, bootstrap
2+ | One-way ANOVA | Kruskal-Wallis, bootstrap

Non-parametrics: MWU and K-W
- Good for non-normal data, Likert data (ordinal, not actually numeric)
- Assumptions: independent, at least ordinal
- Null: P(X > Y) = P(Y > X), where X, Y are observations from the 2 distributions (MWU)
  - If you assume the same distribution shape and continuous data, then this can be seen as comparing medians

MWU and K-W continued
- Essentially: rank order all data (both conditions)
- Total the ranks for condition 1, compare to expected
- Various procedures to correct for ties
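The ranking idea can be sketched as follows. This computes only the Mann-Whitney U statistic for the first sample and its expected value under the null; it uses average ranks (midranks) for ties, which is one common tie-handling choice, and it does not compute a p-value. All names and data are illustrative assumptions.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg_rank
        pos = end + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U for sample x, expected U under the null)."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    expected = len(x) * len(y) / 2
    return u_x, expected

# Hypothetical data with one tie across the two conditions
x = [1, 2, 3]
y = [3, 4, 5]
u_x, expected = mann_whitney_u(x, y)
print(u_x, expected)
```

A U far from its expected value (here 0.5 against 4.5) suggests that one condition tends to produce larger values than the other.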

Bootstrap
- Resampling technique(s)
- Intuition:
  - Create a null distribution, e.g. by subtracting means so $\mu_A = \mu_B = 0$
  - Now you have shifted samples A-hat and B-hat
  - Combine these to make a null distribution
  - Draw a sample of size N, with replacement
  - Do it 1000 (or 10k) times
  - Use this to determine the critical value (alpha = 0.05)
  - Compare this critical value to your real data for the test
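One way to turn the bootstrap recipe into code is sketched below. It is an illustration under stated assumptions, not the only variant: the test statistic is the absolute difference in means, each resample redraws both groups at their original sizes from the pooled, mean-shifted data, and the function name, seed, and sample data are made up.

```python
import random
from math import ceil
from statistics import mean

def bootstrap_mean_diff_test(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Return (observed |mean difference|, bootstrap critical value, reject?)."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    # Shift each sample to mean 0 so the pooled data satisfy the null
    mean_a, mean_b = mean(a), mean(b)
    pooled = [x - mean_a for x in a] + [x - mean_b for x in b]
    null_diffs = []
    for _ in range(n_resamples):
        resample_a = rng.choices(pooled, k=len(a))  # drawn with replacement
        resample_b = rng.choices(pooled, k=len(b))
        null_diffs.append(abs(mean(resample_a) - mean(resample_b)))
    null_diffs.sort()
    # Critical value: the (1 - alpha) quantile of the null distribution
    index = min(n_resamples - 1, ceil((1 - alpha) * n_resamples) - 1)
    critical = null_diffs[index]
    return observed, critical, observed > critical

# Hypothetical, clearly separated samples
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0]
observed, critical, rejected = bootstrap_mean_diff_test(a, b)
print(observed, critical, rejected)
```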

Paired samples, one IV
# Conditions | Normal/Parametric | Non-parametric
Exactly 2 | Paired T-test | Wilcoxon signed-rank
2+ | 2-way ANOVA w/ subject random factor; mixed models (later) | Friedman

Paired T-test
- Two samples per participant or item
- Test subtracts them
- Then uses a one-sample T-test with H0: $\mu = 0$ and H1: $\mu \neq 0$
- Regular T-test assumptions, plus: does subtraction make sense here?
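The "subtract, then run a one-sample test" idea looks like this in plain Python. The sketch returns only the t statistic and its degrees of freedom; the function name and the before/after measurements are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """One-sample t statistic on the per-pair differences, with H0: mean difference = 0."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # stdev uses the n - 1 denominator
    t_stat = mean(diffs) / standard_error
    return t_stat, n - 1

# Hypothetical measurements for four participants
before = [10, 12, 9, 11]
after = [12, 15, 10, 14]
t_stat, df = paired_t(before, after)
print(t_stat, df)
```

The statistic would then be compared against a t distribution with n - 1 degrees of freedom.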

Wilcoxon signed-rank / Friedman
- H0: difference between pairs is symmetric around 0
- H1: or not
- Excludes no-change items
- Essentially: rank by absolute difference; compare signs * ranks
- (Friedman = generalization to 3+ conditions)

SIMPLE LINEAR REGRESSION (one numeric IV, numeric DV)

Simple linear regression
- $E(Y \mid x) = b_0 + b_1 x$ looks at populations
  - Population mean at this value of x
- Key H0: $b_1 = 0$
  - $b_0$ usually not important for significance (obviously important in model fit)
- $b_1$: slope → change in Y per unit X
- Best fit: least squares, or maximum likelihood
  - LSq: minimize the sum of squares of residuals
  - ML: maximize the probability of seeing this data with this model
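For a single predictor, the least-squares estimates have a closed form: the slope is the sum of cross-products over the sum of squares of x, and the intercept makes the line pass through the point of means. A minimal sketch with made-up data and names:

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the model E(Y | x) = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx          # slope: change in Y per unit X
    b0 = y_bar - b1 * x_bar   # intercept: line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical data that is roughly linear
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(b0, b1)
```

With an intercept in the model, the residuals from this fit sum to zero, which is a quick sanity check on the implementation.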

Assumptions, caveats
- Assumes:
  - linearity in Y ~ X
  - normally distributed error for each x, with constant variance at all x
  - error measuring X is small compared to the variance of Y (fixed X)
  - independent errors! Serial correlation, data that is grouped, etc. (later)
- Don't interpret widely outside the available x values
- Can transform for linearity! log(y), sqrt(y), 1/y, y^2

Assumption/residual checking
- Before: use a scatterplot for plausible linearity
- After: residual vs. fit
  - Residual on the Y axis vs. predicted value on the X axis
  - Should be relatively evenly distributed around 0 (linearity)
  - Should have relatively even vertical spread (equal variance)
- After: quantile-normal plot of residuals

Model interpretation
- Interpret $b_1$, interpret the p-value
- CI: if it crosses 0, it's not significant
- $R^2$: fraction of total variation accounted for
  - Intuitively: explained variance / total variance
  - Explained = var(y) - residual errors
- $f^2 = R^2 / (1 - R^2)$; small, medium, large: 0.02, 0.15, 0.35 (Cohen)
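Both effect-size quantities are simple to compute once fitted values are available. The sketch below uses hypothetical function names; `fitted` stands for the model's predicted values at each observed x.

```python
from statistics import mean

def r_squared(ys, fitted):
    """Fraction of total variation in ys accounted for by the fitted values."""
    y_bar = mean(ys)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return 1 - ss_residual / ss_total

def cohen_f2(r2):
    """Cohen's f^2 effect size from R^2."""
    return r2 / (1 - r2)

# A "medium" f^2 of 0.15 corresponds to R^2 = 0.15 / 1.15, about 0.13
medium_r2 = 0.15 / 1.15
print(round(medium_r2, 3), round(cohen_f2(medium_r2), 3))
```

Seen this way, Cohen's "medium" effect means the model explains roughly 13% of the variance.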

Robustness
- Brittle to linearity, independent errors
- Somewhat brittle to fixed-x
- Fairly robust to equal variance
- Quite robust to normality

CATEGORICAL OUTCOMES

One categorical IV, categorical DV, independent
- Contingency tables: how many people are in each combination of categories

Chi-square test of independence
- H0: the distribution of Var1 is the same at every level of Var2 (and vice versa)
- Null distribution approaches $\chi^2$ as sample size grows
  - Heuristic: no cells < 5
  - Can use Fisher's exact test (FET) instead
- Intuition:
  - Sum over rows/columns: (observed - expected)^2 / expected
  - Expected: marginal % * count in other margin
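The statistic can be computed directly from a contingency table. In this sketch the expected count for a cell is row total * column total / N, which is the same as "marginal % * count in other margin". The function name and the 2x2 table are hypothetical, and no p-value is computed.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

# Hypothetical 2x2 table of counts
table = [[10, 20],
         [20, 10]]
stat, df, expected = chi_square_independence(table)
print(stat, df, expected)
```

Checking the returned expected counts against the "no cells < 5" heuristic is a natural next step before trusting the chi-square approximation.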

Paired 2x2 tables
- Use McNemar's test
- Contingency table: matches and mismatches for each option
- H0: marginals are the same

 | Cond1: Yes | Cond1: No | Total
Cond2: Yes | a | b | a + b
Cond2: No | c | d | c + d
Total | a + c | b + d | N

- Essentially a $\chi^2$ test on the agreement
- Test stat: $(b - c)^2 / (b + c)$
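A short sketch of the McNemar statistic, using only the two mismatch cells b and c. The p-value here comes from the chi-square distribution with 1 degree of freedom, whose upper tail equals erfc(sqrt(x / 2)); no continuity correction is applied, and the counts are invented for illustration.

```python
from math import erfc, sqrt

def mcnemar(b, c):
    """McNemar statistic (b - c)^2 / (b + c) and its chi-square (df = 1) p-value."""
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2))
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Hypothetical mismatch counts: 15 pairs changed one way, 5 the other way
stat, p_value = mcnemar(15, 5)
print(stat, p_value)
```

Only the discordant pairs enter the statistic; the matching cells a and d carry no information about whether the marginals differ.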

Paired, continued
- Cochran's Q: extended for more than two conditions
- Other similar extensions for related tasks

Critiques
- Choose a paper that has one (or more) empirical experiments as a central contribution
  - Doesn't have to be human subjects, but can be
  - Does have to have enough description of the experiment
- 10-12 minute presentation
  - Briefly: research questions, necessary background
  - Main: describe and critique methods
    - Experimental design, data collection, analysis
    - Good, bad, ugly, missing
  - Briefly, results?

Logistic regression (logit)
- Numeric IV, binary DV (or ordinal)
- $\log\frac{E(Y)}{1 - E(Y)} = \log\frac{\Pr(Y=1)}{\Pr(Y=0)} = b_0 + b_1 x$
- Log odds of success = linear function
  - Odds: 0 to infinity, 1 is the middle
  - e.g.: odds = 5 = 5:1, for five successes per one failure
  - Log odds: -infinity to infinity with 0 in the middle: good for regression
- Modeled as a binomial distribution

Interpreting logistic regression
- Take exp(coef) to get interpretable odds
  - For each unit increase in x, the odds are multiplied by $\exp(b_1)$
  - Note that this can make small coefs important!
- Use e.g. the Hosmer-Lemeshow test for goodness of fit
  - Null == data fit the model
  - But not a lot of power!
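The link between coefficients, log odds, and probabilities can be checked numerically. The coefficients below are hypothetical, not fitted to any data, and the function names are made up; the point is only that a one-unit increase in x multiplies the odds by exp(b1), whatever the starting x.

```python
from math import exp

def predicted_probability(b0, b1, x):
    """Invert the logit link: Pr(Y = 1 | x) for log odds b0 + b1 * x."""
    return 1 / (1 + exp(-(b0 + b1 * x)))

def odds(p):
    """Odds of success for probability p."""
    return p / (1 - p)

# Hypothetical coefficients, for illustration only
b0, b1 = -1.0, 0.7
odds_at_2 = odds(predicted_probability(b0, b1, 2))
odds_at_3 = odds(predicted_probability(b0, b1, 3))
print(odds_at_3 / odds_at_2, exp(b1))
```

Note that the change in probability for a unit step in x is not constant, even though the odds ratio is.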

MULTIVARIATE

Multiple regression
- Linear/logistic regression with more variables!
  - At least one numeric, 0+ categorical
- Still: fixed x, normal errors with equal variance, independent errors (linear)
- Linear relationship between E(Y) and each x, when other inputs are held constant
  - Effects of each x are independent!
- Still check the quantile-normal plot of residuals, residual vs. fit

Model selection
- Which covariates to keep? (more on this in a bit)

Adding categorical vars
- Indicator variables (everything is 0 or 1)
- Need one fewer indicator than conditions
  - One condition is true; or none are true (baseline)
  - Coefs are *relative to baseline*!
- Model selection: keep all or none for one factor
- Called ANCOVA when there is at least one each of numeric + categorical

Interaction
- What if your covariates *aren't* independent?
- $E(Y) = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2$
  - Slope for $x_1$ is different for each value of $x_2$
- Superadditive: all in the same direction, interaction makes effects stronger
- Subadditive: interaction is in the opposite direction
- For indicator vars, all or none
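Rearranging the interaction model gives E(Y) = b0 + (b1 + b12 * x2) * x1 + b2 * x2, so the slope for x1 is b1 + b12 * x2. The sketch below checks this numerically with made-up coefficients and hypothetical function names.

```python
def predict(x1, x2, b0, b1, b2, b12):
    """E(Y) under the two-predictor model with an interaction term."""
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2

def slope_for_x1(x2, b1, b12):
    """Change in E(Y) per unit of x1 at a given value of x2."""
    return b1 + b12 * x2

# Hypothetical coefficients
coefs = dict(b0=1.0, b1=2.0, b2=3.0, b12=0.5)
for x2 in (0, 4):
    change = predict(1, x2, **coefs) - predict(0, x2, **coefs)
    print(x2, change, slope_for_x1(x2, coefs["b1"], coefs["b12"]))
```

With b12 = 0 the slope for x1 no longer depends on x2, which is the additive (no interaction) case.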

Model selection!
- Which covariates to keep?
- From theory
- Keep interaction only if it's significant?
  - If you keep an interaction, you should keep the corresponding main effects
- Adjusted R^2?
  - Regular R^2 is always higher with more covariates
- BIC and AIC
  - Take the model likelihood and penalize for more params
  - Absolute value not interpretable; lower is better
- All combinations? Stepwise?
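The "likelihood plus penalty" idea is visible in the standard definitions AIC = 2p - 2 ln L and BIC = p ln(n) - 2 ln L, where p is the number of parameters and n the number of observations. In this sketch the log-likelihoods for two candidate models are invented purely to show the comparison.

```python
from math import log

def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalizes parameters more as n grows."""
    return n_params * log(n_obs) - 2 * log_likelihood

# Hypothetical fits on 100 observations: the bigger model fits slightly better
small = dict(log_likelihood=-100.0, n_params=3)
large = dict(log_likelihood=-99.5, n_params=5)
print(aic(**small), aic(**large))
print(bic(n_obs=100, **small), bic(n_obs=100, **large))
```

Here the small gain in likelihood does not pay for the two extra parameters, so both criteria prefer the smaller model.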

THINGS WE ARE ONLY GOING TO MENTION BRIEFLY
Know they exist; look them up if relevant

Multi-way ANOVA
- >1 categorical IVs, 1 numeric DV
- Normality, equal variance, independent errors
- With interaction: every combination of factor levels has its own population mean
- Without interaction (additive): the change in one variable is consistent at all fixed values of the others
- Works basically like standard ANOVA, etc.

Mixed models regression
- Explicitly model correlations in the data
- Fixed effects: affect the outcome for everyone
- Random effects: deviations per data item, which you don't want to model individually
- Simplest example: repeated measures
  - Y ~ b0 + b1x1 + b2x2 + random ID intercept
  - Each participant has their own intercept adjustment

POWER ANALYSIS

What is power?
- Null distribution: designed so that, when the null is true, we'd only see a test statistic this extreme 5% of the time
- This bounds Type I but not Type II error
- Power = 1 - Type II error rate
- Heuristic: 80% is good enough

Alternative scenarios
- One null, but infinitely many alternatives!
- Alternative distribution: given some n, underlying variance, and underlying difference in population means, what is the distribution of the test statistic?
- You know the critical value, so this tells you how often your p will be above 0.05 when the true scenario is as you model (that is the Type II error rate; power is its complement)

Calculating power
- A priori, to think about sample size and judge the value of an experiment
- Inherently requires estimating the alternative scenario!
  - Maybe try a few
- Statistic-specific, but in general:
  - Sample size, effect size, power, alpha
- Consider the smallest effect size that you consider interesting and try to achieve reasonable power for that effect size

Example from Seltman book
- F statistic (ANOVA)
- 3 treatments, 50 people each
- Red: sigma = 10, means: 10, 12, 14
- Blue: sigma = 10, means: 10, 13, 16
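Power for scenarios like these can be approximated by simulation: generate many data sets under the null to estimate the critical F value, then generate data under each alternative and count how often F exceeds it. This is a Monte Carlo sketch, not the method used in the Seltman book; the normal errors, the number of simulations, the seed, and all function names are assumptions, and the estimates carry simulation noise.

```python
import random

def f_statistic(groups):
    """One-way ANOVA F = MS_between / MS_within."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ms_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means)) / (k - 1)
    ms_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g) / (n_total - k)
    return ms_between / ms_within

def simulate_f(means, sigma, n_per_group, n_sims, rng):
    """F statistics from n_sims simulated experiments with normal errors."""
    stats = []
    for _ in range(n_sims):
        groups = [[rng.gauss(mu, sigma) for _ in range(n_per_group)] for mu in means]
        stats.append(f_statistic(groups))
    return stats

def estimate_power(alt_means, sigma=10, n_per_group=50, n_sims=1000, alpha=0.05, seed=1):
    """Fraction of simulated alternative experiments whose F exceeds the null critical value."""
    rng = random.Random(seed)
    null_means = [alt_means[0]] * len(alt_means)  # all population means equal
    null_stats = sorted(simulate_f(null_means, sigma, n_per_group, n_sims, rng))
    critical = null_stats[min(n_sims - 1, int((1 - alpha) * n_sims))]
    alt_stats = simulate_f(alt_means, sigma, n_per_group, n_sims, rng)
    return sum(f > critical for f in alt_stats) / n_sims

power_red = estimate_power([10, 12, 14])
power_blue = estimate_power([10, 13, 16])
print(power_red, power_blue)
```

The blue scenario has larger differences between means with the same variance and sample size, so its estimated power should come out clearly higher than the red one.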

Promoting power (review from earlier)
- Raise sample size; reduce variance; aim for bigger effects

Walkthrough: linear regression
- u = model df -> number of params
- v = F-test error df -> $N - u - 1$
- $f^2 = r^2 / (1 - r^2)$
- $r^2 = f^2 / (1 + f^2)$

Retrospective power
- Somewhat controversial
- Calculate the observed effect size, then determine what sample size would be needed
  - Whole new experiment, not just collect more
- Not a good idea: "We didn't find a significant effect, but if we had studied 12 more people..."