Bayesian Information Criterion as a Practical Alternative to Null-Hypothesis Testing. Michael E. J. Masson, University of Victoria.


Presented at the annual meeting of the Canadian Society for Brain, Behaviour, and Cognitive Science, Halifax, NS, June 2010.

Outline (inspired by Wagenmakers, 2007): What's wrong with null-hypothesis testing and its p values? A Bayesian alternative. Practical application and implications of the alternative.

What's Wrong with p Values? Null-hypothesis significance testing (NHST) provides the probability of the observed outcome (or one even more extreme) under the null hypothesis, p(Data | H0), but what we really want is p(H | Data). There is even a common misconception that NHST p values correspond closely to p(H | Data).

What's Wrong with p Values? The plague of null effects: under NHST, the null hypothesis can never be accepted, so even when data favoring the null hypothesis constitute a theoretically interesting outcome, NHST does not let researchers make effective use of such a result.

A Bayesian Alternative. Rather than emphasizing rejection of the null hypothesis, a model-selection approach is preferred: null and alternative hypotheses are characterized as opposing models (Dixon, 2003; Glover & Dixon, 2004), and the Bayesian approach evaluates the extent to which the data support the null model versus the alternative model.

A Bayesian Alternative. Bayes' theorem gives the posterior probability of a Hypothesis given the Data: p(H | D) = p(D | H) p(H) / p(D).

A Bayesian Alternative. Bayes' theorem, p(H | D) = p(D | H) p(H) / p(D), lets us define the relative posterior probabilities (odds) of the null and alternative hypotheses:

p(H0 | D) / p(H1 | D) = [p(D | H0) p(H0) / p(D)] / [p(D | H1) p(H1) / p(D)]
                      = [p(D | H0) / p(D | H1)] × [p(H0) / p(H1)]

A Bayesian Alternative. Posterior odds = Bayes factor × prior odds:

p(H0 | D) / p(H1 | D) = [p(D | H0) / p(D | H1)] × [p(H0) / p(H1)]

The Bayes factor reflects the change in the prior odds produced by the new data, that is, the strength of evidence for H0 relative to H1. If equal priors are assumed [p(H0) = p(H1)], the posterior odds equal the Bayes factor.

A Bayesian Alternative. Computing p(D | H1) is complex: it requires integrating across all possible values of the effect size. An approximation to the Bayes factor is based on the Bayesian Information Criterion (BIC):

BIC(Hi) = −2 ln(Li) + ki ln(n)

where Li is the maximum likelihood for model Hi, ki is the number of free parameters in model Hi, and n is the number of observations.
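The BIC definition above is mechanical to compute once a model's maximized log-likelihood is in hand. A minimal sketch (the function and argument names are mine, not from the talk):

```python
import math

def bic(log_lik: float, k: int, n: int) -> float:
    """BIC(H_i) = -2 ln(L_i) + k_i ln(n).

    log_lik: maximized log-likelihood ln(L_i) of model H_i
    k:       number of free parameters in H_i
    n:       number of observations
    Lower BIC indicates a better trade-off of fit against complexity.
    """
    return -2.0 * log_lik + k * math.log(n)
```

At equal fit, the extra parameters of a more complex model raise its BIC by ln(n) each, which is the complexity penalty the slide's formula encodes.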

A Bayesian Alternative. Estimate the Bayes factor for comparing H0 and H1 as:

BF01 ≈ e^(ΔBIC10 / 2), where ΔBIC10 = BIC(H1) − BIC(H0)

BF01 yields estimated posterior probabilities, assuming equal priors:

pBIC(H0 | D) = BF01 / (1 + BF01)
pBIC(H1 | D) = 1 − pBIC(H0 | D)
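The conversion from ΔBIC10 to a Bayes factor and posterior probabilities is a two-step calculation. A minimal sketch (function names are mine, not from the talk):

```python
import math

def bf01_from_delta_bic(delta_bic10: float) -> float:
    """BF01 ~ exp(dBIC10 / 2), with dBIC10 = BIC(H1) - BIC(H0).
    Values below 1 favour H1; values above 1 favour H0."""
    return math.exp(delta_bic10 / 2.0)

def posterior_h0(bf01: float) -> float:
    """pBIC(H0 | D) = BF01 / (1 + BF01), assuming equal priors."""
    return bf01 / (1.0 + bf01)
```

With the value ΔBIC10 = −7.75 computed later in the talk, `bf01_from_delta_bic(-7.75)` comes out at roughly 0.0208 and `posterior_h0` of that at roughly .020, matching the slide's worked example.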

Practical Application. BIC can be computed from the components of a standard ANOVA. Starting from BIC(Hi) = −2 ln(Li) + ki ln(n), with normally distributed errors of measurement this becomes:

BIC(Hi) = n ln(1 − Ri²) + ki ln(n)

where 1 − Ri² is the proportion of variability that model Hi fails to explain and n is the number of subjects. For ANOVA: 1 − Ri² = SSEi / SStotal. When computing BIC(H1) − BIC(H0), the SStotal term common to both models cancels out, producing:

Practical Application.

ΔBIC10 = BIC(H1) − BIC(H0) = n ln(SSE1 / SSE0) + (k1 − k0) ln(n)

Application to real data (Breuer et al., 2009): perceptual identification of objects previously seen as nontargets in an RSVP search task; at test, items appeared in their original form, as mirror images, or as new items.

Practical Application. ANOVA summary for the Breuer et al. (2009) data:

Source      SS     df   MS     F      p
Subjects    .668   39
Item        .357    2   .178   12.90  .0001
Item x Ss  1.078   78   .014
Total      2.103  119

ΔBIC10 = n ln(SSE1 / SSE0) + (k1 − k0) ln(n) = 40 ln(1.078 / 1.435) + 1 × ln(40) = −7.75
BF01 ≈ e^(ΔBIC10 / 2) = e^(−7.75 / 2) = 0.0208
pBIC(H0 | D) = BF01 / (1 + BF01) = .0208 / 1.0208 = .020
pBIC(H1 | D) = .980

Note: in the above computation of ΔBIC10, SSE0 = 1.435 = .357 + 1.078. [Figure: mean proportion correct, with 95% CIs, for original, mirror-image, and new items.]
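The arithmetic on this slide can be reproduced directly from the ANOVA table. A sketch using the slide's own numbers (the variable names are mine):

```python
import math

n = 40                  # number of subjects
sse1 = 1.078            # SSE under H1: the Item x Ss error term
sse0 = 0.357 + 1.078    # SSE under H0: Item effect folded into error (= 1.435)
k_diff = 1              # k1 - k0, per the slide's computation

delta_bic10 = n * math.log(sse1 / sse0) + k_diff * math.log(n)  # ~ -7.75
bf01 = math.exp(delta_bic10 / 2.0)                              # ~ 0.0208
p_h0 = bf01 / (1.0 + bf01)                                      # ~ .020
p_h1 = 1.0 - p_h0                                               # ~ .980
```

Because H0 omits the Item effect, its error sum of squares absorbs both the Item and Item x Ss terms, which is why SSE0 = .357 + 1.078.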

Practical Application. Interpretation of pBIC values (Raftery, 1995):

pBIC(Hi | D)   Evidence
.50 - .75      weak
.75 - .95      positive
.95 - .99      strong
> .99          very strong

Practical Application. The plague of null effects revisited: BIC offers a way of evaluating the degree to which the evidence favors H0 over H1. Kantner and Lindsay (2010) sought evidence that subjects can learn to improve recognition memory accuracy through feedback during the test phase: yes/no responses on a recognition test were followed by valid feedback. The result was null effects on recognition accuracy in six experiments.

Practical Application.

Exp.   n    SS_FxE   SS_error   F      BF01   pBIC(H0 | D)
1      46   0.052    2.003      1.14   3.76   .790
2a     17   0.008    1.501      < 1    3.94   .798
2b     18   0.043    1.290      1.14   3.16   .760
3      43   0.013    1.840      < 1    5.64   .849
4a     46   0.084    1.410      2.55   1.79   .642
4b     44   0.003    0.667      < 1    6.01   .857

The Bayes factors from the individual experiments can be combined multiplicatively to produce an aggregate result:

BF01(total) = 3.76 × 3.94 × 3.16 × 5.64 × 1.79 × 6.01 = 2840.4
pBIC(H0 | D) = .9996 (very strong evidence)
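Combining the independent Bayes factors multiplicatively, as the slide does, is a one-line product. A sketch using the table's values (variable names are mine):

```python
import math

# BF01 from each of the six Kantner & Lindsay (2010) experiments
bf01_per_experiment = [3.76, 3.94, 3.16, 5.64, 1.79, 6.01]

bf01_total = math.prod(bf01_per_experiment)      # ~ 2840.4
p_h0_total = bf01_total / (1.0 + bf01_total)     # ~ .9996
```

Each individual experiment provides only weak to positive evidence for H0, yet the product yields very strong aggregate evidence, something NHST's six non-significant results cannot express.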

Implications of the Bayesian Alternative. One situation in which NHST p values diverge from the BIC posterior probability, modeled on a within-subjects ANOVA with two conditions: [Figure: pBIC(H0 | D) plotted as a function of the NHST p value.] "The NHST procedure is oblivious to the very real possibility that although the data may be unlikely under H0, they are even less likely under H1." (Wagenmakers, 2007)

Conclusion. The Bayesian approach resolves various problems with p values under the NHST system: it provides what researchers actually want, p(H | D), and it permits an effective evaluation of the validity of the null hypothesis. It is easy to apply in practice, using information generated by a standard ANOVA. We still need to develop new standards of evidence by exploring what Bayesian analysis produces across a wide range of data configurations.

References

Breuer, A. T., Masson, M. E. J., Cohen, A.-L., & Lindsay, D. S. (2009). Long-term repetition priming of briefly identified objects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 487-498.

Dixon, P. (2003). The p-value fallacy and how to avoid it. Canadian Journal of Experimental Psychology, 57, 189-202.

Glover, S., & Dixon, P. (2004). Likelihood ratios: A simple and flexible statistic for empirical psychologists. Psychonomic Bulletin & Review, 11, 791-806.

Kantner, J., & Lindsay, D. S. (2010). Can corrective feedback improve recognition memory? Memory & Cognition, 38, 389-406.

Raftery, A. E. (1995). Bayesian model selection in social research. In P. V. Marsden (Ed.), Sociological methodology 1995 (pp. 111-196). Cambridge, MA: Blackwell.

Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14, 779-804.