Bayesian Information Criterion as a Practical Alternative to Null-Hypothesis Testing
Michael E. J. Masson
University of Victoria
Presented at the annual meeting of the Canadian Society for Brain, Behaviour, and Cognitive Science, Halifax, NS, June 2010.
Outline
Inspired by Wagenmakers (2007)
- What's wrong with null-hypothesis testing and its p values?
- A Bayesian alternative
- Practical application and implications of the alternative
What's Wrong with p Values?
Null-hypothesis significance testing (NHST) provides the probability of the observed outcome (or one that is even more extreme) given the null hypothesis: $p(D \mid H_0)$
- but what we really want is $p(H \mid D)$
- there is even a common misconception that NHST p values actually correspond closely to $p(H \mid D)$
What's Wrong with p Values?
The plague of null effects
- under NHST, the null hypothesis cannot be accepted
- even when data favoring the null hypothesis constitute a theoretically interesting outcome, NHST does not allow researchers to make effective use of such a result
A Bayesian Alternative
Rather than an emphasis on rejecting the null hypothesis, a model selection approach is preferred
- null and alternative hypotheses are characterized as opposing models (Dixon, 2003; Glover & Dixon, 2004)
- the Bayesian approach evaluates the extent to which the data support the null model vs. the alternative model
A Bayesian Alternative
Bayes' theorem:
$$p(H \mid D) = \frac{p(D \mid H)\, p(H)}{p(D)}$$
the posterior probability of the Hypothesis given the Data
A Bayesian Alternative
Bayes' theorem:
$$p(H \mid D) = \frac{p(D \mid H)\, p(H)}{p(D)}$$
Define relative posterior probabilities (odds) of the null and alternative hypotheses with this formulation:
$$\frac{p(H_0 \mid D)}{p(H_1 \mid D)} = \frac{p(D \mid H_0)\, p(H_0)/p(D)}{p(D \mid H_1)\, p(H_1)/p(D)}$$
$$\frac{p(H_0 \mid D)}{p(H_1 \mid D)} = \frac{p(D \mid H_0)}{p(D \mid H_1)} \times \frac{p(H_0)}{p(H_1)}$$
A Bayesian Alternative
$$\underbrace{\frac{p(H_0 \mid D)}{p(H_1 \mid D)}}_{\text{posterior odds}} = \underbrace{\frac{p(D \mid H_0)}{p(D \mid H_1)}}_{\text{Bayes factor}} \times \underbrace{\frac{p(H_0)}{p(H_1)}}_{\text{prior odds}}$$
- the Bayes factor reflects the change in the prior odds based on new data: the strength of evidence for $H_0$ relative to $H_1$
- if equal priors are assumed [$p(H_0) = p(H_1)$], then the posterior odds are equal to the Bayes factor
A Bayesian Alternative
Complexity in computing $p(D \mid H_1)$: one must integrate across all possible values of effect size
An approximation to the Bayes factor is based on the Bayesian Information Criterion (BIC):
$$\mathrm{BIC}(H_i) = -2 \ln(L_i) + k_i \ln(n)$$
where $L_i$ = maximum likelihood for model $H_i$, $k_i$ = number of free parameters in model $H_i$, and $n$ = number of observations
A Bayesian Alternative
Estimate of the Bayes factor for comparing $H_0$ and $H_1$:
$$\mathrm{BF}_{01} \approx e^{\Delta \mathrm{BIC}_{10}/2}$$
where $\Delta \mathrm{BIC}_{10} = \mathrm{BIC}(H_1) - \mathrm{BIC}(H_0)$
$\mathrm{BF}_{01}$ estimates the posterior probabilities, assuming equal priors:
$$p_{\mathrm{BIC}}(H_0 \mid D) = \frac{\mathrm{BF}_{01}}{1 + \mathrm{BF}_{01}}, \qquad p_{\mathrm{BIC}}(H_1 \mid D) = 1 - p_{\mathrm{BIC}}(H_0 \mid D)$$
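As a sketch, these two steps translate directly into code (Python here; the function names are my own, not from the talk):

```python
import math

def bf01_from_delta_bic(delta_bic_10):
    """Estimate the Bayes factor BF01 from delta_BIC_10 = BIC(H1) - BIC(H0)."""
    return math.exp(delta_bic_10 / 2)

def posterior_h0(bf01):
    """Posterior probability of H0 given the data, assuming equal priors."""
    return bf01 / (1 + bf01)

# Illustration: delta_BIC_10 = 2 shifts the odds toward H0
bf = bf01_from_delta_bic(2.0)   # e^1, about 2.718
p_h0 = posterior_h0(bf)         # about .731
```

A positive ΔBIC₁₀ (H₁ fits worse after the complexity penalty) yields BF₀₁ > 1 and hence p(H₀ | D) > .5.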
Practical Application
BIC can be computed from components of a standard ANOVA:
$$\mathrm{BIC}(H_i) = -2 \ln(L_i) + k_i \ln(n)$$
With normally distributed errors of measurement:
$$\mathrm{BIC}(H_i) = n \ln(1 - R_i^2) + k_i \ln(n)$$
where $1 - R_i^2$ is the proportion of variability that model $H_i$ fails to explain and $n$ = number of subjects
For ANOVA: $1 - R_i^2 = \mathrm{SSE}_i / \mathrm{SS}_{\mathrm{total}}$
When computing $\mathrm{BIC}(H_1) - \mathrm{BIC}(H_0)$, the $\mathrm{SS}_{\mathrm{total}}$ term common to both cancels out, producing
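The cancellation of the SS_total term can be checked numerically; the sketch below uses hypothetical values chosen only for illustration:

```python
import math

def bic_anova(sse, ss_total, k, n):
    """BIC(Hi) = n * ln(SSE_i / SS_total) + k_i * ln(n),
    since 1 - R_i^2 = SSE_i / SS_total in the ANOVA formulation."""
    return n * math.log(sse / ss_total) + k * math.log(n)

# Hypothetical sums of squares and parameter counts:
n, ss_total = 30, 10.0
sse_0, k0 = 8.0, 1
sse_1, k1 = 6.0, 3

delta = bic_anova(sse_1, ss_total, k1, n) - bic_anova(sse_0, ss_total, k0, n)
# SS_total cancels in the difference:
closed_form = n * math.log(sse_1 / sse_0) + (k1 - k0) * math.log(n)
```

Both expressions give the same ΔBIC₁₀, which is why only the two error sums of squares are needed in practice.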
Practical Application
$$\Delta \mathrm{BIC}_{10} = \mathrm{BIC}(H_1) - \mathrm{BIC}(H_0) = n \ln\!\left(\frac{\mathrm{SSE}_1}{\mathrm{SSE}_0}\right) + (k_1 - k_0) \ln(n)$$
Application to real data (Breuer et al., 2009):
- perceptual identification of objects previously seen as nontargets in an RSVP search task
- at test: original form, mirror image, new items
Practical Application
Source       SS      df    MS      F       p
Subjects     .668    39
Item         .357     2    .178    12.90   .0001
Item x Ss   1.078    78    .014
Total       2.103   119
$$\Delta \mathrm{BIC}_{10} = n \ln\!\left(\frac{\mathrm{SSE}_1}{\mathrm{SSE}_0}\right) + (k_1 - k_0) \ln(n) = 40 \ln\!\left(\frac{1.078}{1.435}\right) + 1 \cdot \ln(40) = -7.75$$
$$\mathrm{BF}_{01} \approx e^{\Delta \mathrm{BIC}_{10}/2} = e^{-7.75/2} = 0.0208$$
$$p_{\mathrm{BIC}}(H_0 \mid D) = \frac{\mathrm{BF}_{01}}{1 + \mathrm{BF}_{01}} = \frac{.0208}{1 + .0208} = .020, \qquad p_{\mathrm{BIC}}(H_1 \mid D) = .980$$
Note: in the above computation of $\Delta \mathrm{BIC}_{10}$, $1.435 = .357 + 1.078$
[Figure: proportion correct, with 95% CIs, for original, mirror-image, and new items]
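The slide's arithmetic can be reproduced directly from the ANOVA table values (a sketch in Python; the variable names are mine):

```python
import math

# Values from the Breuer et al. (2009) ANOVA table
n = 40                   # number of subjects
sse_1 = 1.078            # SSE under H1: the Item x Subjects error term
sse_0 = 0.357 + 1.078    # SSE under H0: effect SS folded into error (= 1.435)

delta_bic_10 = n * math.log(sse_1 / sse_0) + 1 * math.log(n)   # about -7.75
bf01 = math.exp(delta_bic_10 / 2)                              # about 0.0208
p_h0 = bf01 / (1 + bf01)                                       # about .020
p_h1 = 1 - p_h0                                                # about .980
```

The negative ΔBIC₁₀ means H₁ fits better even after the complexity penalty, so the posterior probability lands heavily on H₁.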
Practical Application
Interpretation of $p_{\mathrm{BIC}}$ values (Raftery, 1995):
p_BIC(H_i | D)   Evidence
.50 - .75        weak
.75 - .95        positive
.95 - .99        strong
> .99            very strong
Practical Application
The plague of null effects revisited: BIC offers a way of evaluating the degree to which the evidence favors $H_0$ over $H_1$
Kantner & Lindsay (2010) sought evidence that subjects can learn to improve recognition memory accuracy through feedback during the test phase
- yes/no responses on a recognition test followed by valid feedback
- null effects on recognition accuracy in 6 experiments
Practical Application
Exp.   n    SS_FxE   SS_error   F      BF01   p_BIC(H0 | D)
1      46   0.052    2.003      1.14   3.76   .790
2a     17   0.008    1.501      < 1    3.94   .798
2b     18   0.043    1.290      1.14   3.16   .760
3      43   0.013    1.840      < 1    5.64   .849
4a     46   0.084    1.410      2.55   1.79   .642
4b     44   0.003    0.667      < 1    6.01   .857
The Bayes factors from the individual experiments can be combined multiplicatively to produce an aggregate result:
$$\mathrm{BF}_{01}(\text{total}) = 3.76 \times 3.94 \times 3.16 \times 5.64 \times 1.79 \times 6.01 = 2840.4$$
$$p_{\mathrm{BIC}}(H_0 \mid D) = .9996 \text{ (very strong evidence)}$$
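The aggregation step is a simple product of the per-experiment Bayes factors (sketched below in Python; assumes Python 3.8+ for math.prod):

```python
import math

# BF01 from each of the six Kantner & Lindsay (2010) experiments
bfs = [3.76, 3.94, 3.16, 5.64, 1.79, 6.01]

bf_total = math.prod(bfs)           # about 2840.4
p_h0 = bf_total / (1 + bf_total)    # about .9996
```

Multiplying Bayes factors is equivalent to updating the odds sequentially, treating each experiment's posterior odds as the next experiment's prior odds.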
Implications of the Bayesian Alternative
One situation in which NHST p values diverge from the BIC posterior probability
[Figure: $p_{\mathrm{BIC}}(H_0 \mid D)$ and the NHST p value plotted together, modeled on a within-subjects ANOVA with 2 conditions]
"the NHST procedure is oblivious to the very real possibility that although the data may be unlikely under $H_0$, they are even less likely under $H_1$." (Wagenmakers, 2007)
Conclusion
The Bayesian approach resolves various problems with p values under the NHST system
- it provides what researchers want: $p(H \mid D)$
- effective evaluation of the validity of the null hypothesis
Easy to apply in practice (uses ANOVA-generated information)
Need to develop new standards of evidence by exploring what Bayesian analysis produces for a wide range of data configurations
References
Breuer, A. T., Masson, M. E. J., Cohen, A.-L., & Lindsay, D. S. (2009). Long-term repetition priming of briefly identified objects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 487-498.
Dixon, P. (2003). The p-value fallacy and how to avoid it. Canadian Journal of Experimental Psychology, 57, 189-202.
Glover, S., & Dixon, P. (2004). Likelihood ratios: A simple and flexible statistic for empirical psychologists. Psychonomic Bulletin & Review, 11, 791-806.
Kantner, J., & Lindsay, D. S. (2010). Can corrective feedback improve recognition memory? Memory & Cognition, 38, 389-406.
Raftery, A. E. (1995). Bayesian model selection in social research. In P. V. Marsden (Ed.), Sociological methodology 1995 (pp. 111-196). Cambridge, MA: Blackwell.
Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14, 779-804.