What p values really mean (and why I should care) Francis C. Dane, PhD

Similar documents
Two-Sample Inferential Statistics

HYPOTHESIS TESTING. Hypothesis Testing

Uni- and Bivariate Power

Human-Centered Fidelity Metrics for Virtual Environment Simulations

Normal (Gaussian) distribution The normal distribution is often relevant because of the Central Limit Theorem (CLT):

Psychology 282 Lecture #4 Outline Inferences in SLR

1 Overview. Coefficients of. Correlation, Alienation and Determination. Hervé Abdi Lynne J. Williams

Statistical Inference. Why Use Statistical Inference. Point Estimates. Point Estimates. Greg C Elvers

POLI 443 Applied Political Research

PSYC 331 STATISTICS FOR PSYCHOLOGISTS

Inferential statistics

Harvard University. Rigorous Research in Engineering Education

THE ROLE OF COMPUTER BASED TECHNOLOGY IN DEVELOPING UNDERSTANDING OF THE CONCEPT OF SAMPLING DISTRIBUTION

16.400/453J Human Factors Engineering. Design of Experiments II

Types of Statistical Tests DR. MIKE MARRAPODI

LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2

Sample Size and Power I: Binary Outcomes. James Ware, PhD Harvard School of Public Health Boston, MA

The Purpose of Hypothesis Testing

POLI 443 Applied Political Research

Chapter 24. Comparing Means. Copyright 2010 Pearson Education, Inc.

ScienceDirect. Who s afraid of the effect size?

1. What does the alternate hypothesis ask for a one-way between-subjects analysis of variance?

The One-Way Repeated-Measures ANOVA. (For Within-Subjects Designs)

10/4/2013. Hypothesis Testing & z-test. Hypothesis Testing. Hypothesis Testing

Multiple t Tests. Introduction to Analysis of Variance. Experiments with More than 2 Conditions

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

Week 12 Hypothesis Testing, Part II Comparing Two Populations

r(equivalent): A Simple Effect Size Indicator

Lecturer: Dr. Adote Anum, Dept. of Psychology Contact Information:

Statistical Inference

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

Chapter 24. Comparing Means

Data Mining. CS57300 Purdue University. March 22, 2018

Null Hypothesis Significance Testing p-values, significance level, power, t-tests Spring 2017

Sampling Distributions

Inferences for Correlation

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests

Power Analysis. Introduction to Power

Summary: the confidence interval for the mean (σ 2 known) with gaussian assumption

1 Descriptive statistics. 2 Scores and probability distributions. 3 Hypothesis testing and one-sample t-test. 4 More on t-tests

Hypothesis Testing One Sample Tests

Chi Square Analysis M&M Statistics. Name Period Date

How do we compare the relative performance among competing models?

Model Estimation Example

Lab #12: Exam 3 Review Key

MS&E 226: Small Data

CH.9 Tests of Hypotheses for a Single Sample

Null Hypothesis Significance Testing p-values, significance level, power, t-tests

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong

COGS 14B: INTRODUCTION TO STATISTICAL ANALYSIS

One-Way Analysis of Variance (ANOVA) Paul K. Strode, Ph.D.

QUEEN S UNIVERSITY FINAL EXAMINATION FACULTY OF ARTS AND SCIENCE DEPARTMENT OF ECONOMICS APRIL 2018

Inferences for Regression

Section 10.1 (Part 2 of 2) Significance Tests: Power of a Test

PSY 216. Assignment 9 Answers. Under what circumstances is a t statistic used instead of a z-score for a hypothesis test

Nominal Data. Parametric Statistics. Nonparametric Statistics. Parametric vs Nonparametric Tests. Greg C Elvers

Advanced Experimental Design

HOW TO DETERMINE THE NUMBER OF SUBJECTS NEEDED FOR MY STUDY?

ANOVA TESTING 4STEPS. 1. State the hypothesis. : H 0 : µ 1 =

A3. Statistical Inference Hypothesis Testing for General Population Parameters

Section 6.2 Hypothesis Testing

James H. Steiger. Department of Psychology and Human Development Vanderbilt University. Introduction Factors Influencing Power

PSY 305. Module 3. Page Title. Introduction to Hypothesis Testing Z-tests. Five steps in hypothesis testing

Sampling distribution of t. 2. Sampling distribution of t. 3. Example: Gas mileage investigation. II. Inferential Statistics (8) t =

Question. Hypothesis testing. Example. Answer: hypothesis. Test: true or not? Question. Average is not the mean! μ average. Random deviation or not?

Understanding p Values

Multiple samples: Modeling and ANOVA

Mathematical Notation Math Introduction to Applied Statistics

The problem of base rates

DISTRIBUTIONS USED IN STATISTICAL WORK

The t-statistic. Student s t Test

An inferential procedure to use sample data to understand a population Procedures

Last two weeks: Sample, population and sampling distributions finished with estimation & confidence intervals

Data Analysis and Statistical Methods Statistics 651

Hypothesis Testing. We normally talk about two types of hypothesis: the null hypothesis and the research or alternative hypothesis.

Workshop Research Methods and Statistical Analysis

Introduction to Statistical Inference

LECTURE 5. Introduction to Econometrics. Hypothesis testing

Sampling Distributions: Central Limit Theorem

Assumptions of classical multiple regression model

Using SPSS for One Way Analysis of Variance

ANOVA - analysis of variance - used to compare the means of several populations.

Statistics for Managers Using Microsoft Excel

T-TEST FOR HYPOTHESIS ABOUT

Much of the material we will be covering for a while has to do with designing an experimental study that concerns some phenomenon of interest.

Rigorous Science - Based on a probability value? The linkage between Popperian science and statistical analysis

The One-Way Independent-Samples ANOVA. (For Between-Subjects Designs)

Relating Graph to Matlab

MATH 240. Chapter 8 Outlines of Hypothesis Tests

Parameter Estimation, Sampling Distributions & Hypothesis Testing

Announcements. Unit 3: Foundations for inference Lecture 3: Decision errors, significance levels, sample size, and power.

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

Inference for Regression

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Independent Samples ANOVA

Last week: Sample, population and sampling distributions finished with estimation & confidence intervals

Lecture (chapter 10): Hypothesis testing III: The analysis of variance

Rigorous Science - Based on a probability value? The linkage between Popperian science and statistical analysis

Transcription:

What p values really mean (and why I should care) Francis C. Dane, PhD

Session Objectives Understand the statistical decision process Appreciate the limitations of interpreting p values Value the use of statistics regarding effect size

Inferential Statistics Test of fit between data and model Usually, model = null hypothesis There is no difference There is no relationship It s all measurement error and individual differences Usually, we hope to reject null hypothesis But sometimes we want to accept it

Statistical Decision Making Unknowable State of the World We (almost) never get to measure the entire population State of the World Wrong

Statistical Decision Making Decision about Fit between Data and Model We decide whether or not our data fit the chosen model Truth Decision about Fit Accept Model (Data fit) Reject Model (Data do not fit) (1-α) Error Type I α (aka p value) Wrong Error Type II β (1-β) power

Statistical Decision Making Four Possible Decision Outcomes Two ways to be correct, and two to be wrong State of the World Wrong Decision about Fit Accept Model (Data fit) Reject Model (Data do not fit) (1-α) Error Type I α (aka p value) Error Type II β (1-β) power

Statistical Decision Making We might incorrectly reject the model Observe an effect in what is really random variation State of the World Decision about Fit Accept Model (Data fit) Reject Model (Data do not fit) (1-α) Type I Error α p value Wrong Type II Error β (1-β) Power

Statistical Decision Making We might incorrectly accept the model Miss an effect hidden within random variation State of the World Wrong Decision about Fit Accept Model (Data fit) Reject Model (Data do not fit) (1-α) Type I Error α p value Type II Error β Decision (1-β) power

Statistical Decision Making We might correctly reject the model Observe an effect that exists State of the World Wrong Decision about Fit Accept Model (Data fit) Reject Model (Data do not fit) (1-α) Type I Error α p value Type II Error β (1-β) Power power

Statistical Decision Making We might correctly accept the model Decision about Fit Accept Model (Data fit) Reject Model (Data do not fit) State of the World (1-α) Type I Error α p value Wrong Type II Error β (1-β) Power

Statistical Decision Making State of the World Wrong Decision about Fit Accept Model (Data fit) Reject Model (Data do not fit) (1-α) Type I Error α p value Type II Error β (1-β) Power

Sample Size Dependence Both α and β are inversely dependent upon sample size Larger samples less error Larger samples more power Function of Standard Error S x = s n s = sample standard deviation n = sample size

5 55 105 155 205 255 305 355 405 455 505 555 605 655 705 755 805 855 905 955 Standard Error of Mean (s 10) 5 4.5 4 3.5 3 2.5 2 1.5 1 0.5 0 Sample Size

Degrees of Freedom All model-fitting statistics are based on standard error Or an approximation (nonparametric statistics) Usually represented by degrees of freedom Number of estimable data points before last data points are determined by the model Student s t as example df = n-1 (per group)

4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100 p Value for t as a Function of Degrees of Freedom 0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 p(t = 1.0) p(t = 1.5) p(t = 2.0) p(t = 2.5) p(t = 3.0) p(t = 3.5)

Limitations of p <.05 (or <.0001) Smaller p value does NOT mean larger effect It is an indication of lack of fit between model and data p value is not relevant to the truth of the conclusion p <.05 (or any other number) is not a magic limen significance should not be used as sole basis for a conclusion, policy, treatment, etc. p <.05 is basis for further consideration

Effect Size Measures how much data deviate from model How large is the mean difference? How strong is the relationship between variables? How well does X predict Y? How much variance is accounted for?

8.0 8.3 8.6 8.9 9.2 9.5 9.8 10.1 10.4 10.7 11.0 11.3 11.6 11.9 12.2 12.5 12.8 13.1 13.4 13.7 14.0 8.0 8.3 8.6 8.9 9.2 9.5 9.8 10.1 10.4 10.7 11.0 11.3 11.6 11.9 12.2 12.5 12.8 13.1 13.4 13.7 14.0 Which is the Larger Effect? (Both are significant ) 0.45 0.45 0.4 0.4 0.35 0.35 0.3 0.3 0.25 0.25 0.2 0.2 0.15 0.15 0.1 0.1 0.05 0.05 0 0 Mean = 10 Mean = 11 Mean = 10 Mean = 13

Measures of Effect Size Mean Differences Cohen s d Partial η 2 Relationships r 2 or R 2 β for regression OR, RR, etc.

Summary p value is probability of incorrectly rejecting the notion that the data fit the selected model p value is not an indication of the truth or validity of conclusions p value is an inadequate standard for deciding on importance Despite common usage, there is no mathematically justified p value for significant Effect size is an important part of interpreting results

Additional Reading Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2 ed.). Hillsdale, NJ: Lawrence Erlbaum Associates. Dane, F. C. (2011). Evaluating Research: Methodology for People Who Need to Read Research. Thousand Oaks, CA: Sage Publications. Trafimow, D., & Marks, M. (2015). Editorial. Basic & Applied Social Psychology, 37(1), 1-2. doi: 10.1080/01973533.2015.1012991 Wasserstein, R. L. & Lazar, N. A. (2016). The ASA's statement on p- values: Context, process, and purpose. The American Statistician. DOI: 10.1080/00031305.2016.1154108 Weisberg, H. I., Carver, R. P., Chachkin, N. J., & Cohen, D. K. (1979). Correspondence. Harvard Educational Review, 49, 280-283. https://en.wikipedia.org/wiki/effect_size

Questions? fcdane@jchs.edu 540-224-4515 (office) 540-588-9404 (mobile)