Why I think Bayesian Analysis is the greatest thing since NHST (bread)
1 Why I think Bayesian Analysis is the greatest thing since NHST (bread). Data Binge, Nate Powell, May 12, 2017. NHST = Null Hypothesis Significance Testing.
2 Textbooks: Doing Bayesian Data Analysis by John K. Kruschke; Memory and the Computational Brain by Gallistel and King.
3 What is Bayesian Analysis? http://xkcd.com/1132/
4 Advantages of Bayesian Statistics vs. Frequentist Statistics: Degree of belief is graded (not binary). Bayesian update (incorporates new data). Can provide reasonable answers given limited data. Can describe the probability of events that have not yet occurred.
5 Advantages of Bayesian Statistics vs. Frequentist Statistics: Degree of belief is graded (not binary). What if your {X} test returns p = .06? Or p = .055? A Bayesian analysis will reflect this as a small change in posterior probability, not a complete shift between true and false. Bayesian update (incorporates new data). Can provide reasonable answers given limited data. Can describe the probability of events that have not yet occurred.
6 Advantages of Bayesian Statistics vs. Frequentist Statistics: Degree of belief is graded (not binary). Bayesian update (incorporates new data): if your p-value is 0.06, are you allowed to add new samples? A posterior distribution can always be the prior for a Bayesian update step if more information is collected. Can provide reasonable answers given limited data. Can describe the probability of events that have not yet occurred.
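The "posterior becomes the prior" point can be sketched with a conjugate beta-binomial update (a minimal illustration added here, not from the slides; the counts are made up):

```python
# Hypothetical sketch: with a Beta(a, b) prior on a coin's bias, observing
# h heads and t tails gives a Beta(a + h, b + t) posterior (conjugacy).
def update(prior, heads, tails):
    a, b = prior
    return (a + heads, b + tails)

# Updating on all the data at once...
batch = update((1, 1), heads=5, tails=3)

# ...or feeding yesterday's posterior back in as today's prior, in pieces...
seq = update(update((1, 1), heads=2, tails=1), heads=3, tails=2)

# ...yields the same posterior: nothing about the model has to change
# when more information is collected.
assert batch == seq == (6, 4)
```

The order and grouping of the data do not matter here, which is exactly why adding samples later poses no problem for the Bayesian.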
7 Advantages of Bayesian Statistics vs. Frequentist Statistics: Degree of belief is graded (not binary). Bayesian update (incorporates new data). Can provide reasonable answers given limited data: limited data (which may not be significant by most tests) can be enough to modify or help confirm an untrusted prior, but won't disturb a trusted one... BUT HERE BE DRAGONS! Can describe the probability of events that have not yet occurred.
8 Advantages of Bayesian Statistics vs. Frequentist Statistics: Degree of belief is graded (not binary). Bayesian update (incorporates new data). Can provide reasonable answers given limited data. Can describe the probability of events that have not yet occurred: Bayesian methods were used to assess the likelihood of nuclear disasters for US policy (prior to any having occurred) with impressive accuracy.
9 Terminology, Conditional Probability. By the way, who's afraid of math? Who's afraid of equations? The frequentist concern: P(D|H), where D = data and H = hypothesis. The Bayesian concern: P(θ|D) or P(M|D,θ), where D = data, θ = model parameters, and M = model. http://xkcd.com/892/
10 Bayesian Update: learning from new experience to update previous beliefs. P(M|D) = P(D|M) P(M) / P(D), that is, posterior = likelihood × prior / evidence. This also allows us to incorporate new data without having to change any assumptions. http://xkcd.com/1236/
11 Bayesian Update: twins example. A physicist is having twins, and wants to know the odds that her twins will be identical (vs. fraternal). The doctor told her that 2/3 of twin births are fraternal, and 1/3 are identical. Also, a sonogram confirmed her twins are both male. (All identical twins are the same sex; fraternal twins can be either the same sex or mixed.) Efron, B. (2013). A 250-year argument: Belief, behavior, and the bootstrap. Bulletin of the American Mathematical Society, 50(1).
12 Bayesian Update: twins example. Prior distribution: P(M = identical) = 1/3, P(M = fraternal) = 2/3. Additional evidence: D = same sex, with P(D) = 2/3. Likelihood: P(same sex | M = identical) = 1, P(same sex | M = fraternal) = 1/2. Posterior: P(M = identical | same sex) = 1/2, P(M = fraternal | same sex) = 1/2. [figure: 2×2 table of sonogram outcome (same sex / different) against twin type (identical / fraternal), combining the doctor's base rates with the physicist's evidence] Efron, B. (2013). A 250-year argument: Belief, behavior, and the bootstrap. Bulletin of the American Mathematical Society, 50(1).
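The slide's arithmetic can be checked directly with Bayes' rule (a quick verification in Python using exact rationals):

```python
from fractions import Fraction

# Doctor's base rates (the prior):
prior_identical = Fraction(1, 3)
prior_fraternal = Fraction(2, 3)

# Likelihood of the sonogram evidence (same sex) under each model:
lik_identical = Fraction(1)      # identical twins are always the same sex
lik_fraternal = Fraction(1, 2)   # fraternal twins are same-sex half the time

# Evidence: P(same sex) = sum over models of likelihood * prior
evidence = lik_identical * prior_identical + lik_fraternal * prior_fraternal

posterior_identical = lik_identical * prior_identical / evidence
posterior_fraternal = lik_fraternal * prior_fraternal / evidence

assert evidence == Fraction(2, 3)
assert posterior_identical == posterior_fraternal == Fraction(1, 2)
```

The sonogram moves the physicist from 1:2 odds to even odds of identical twins, exactly as in Efron's example.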
13 Bayesian Update: twins example. But if you prefer a visual confirmation: [figure: prior and posterior bar charts for P(fraternal) and P(identical), alongside the 2×2 sonogram-by-twin-type table] Efron, B. (2013). A 250-year argument: Belief, behavior, and the bootstrap. Bulletin of the American Mathematical Society, 50(1).
14 Bayesian probability vs. Frequentist probability. Frequentist probability: the frequency with which events happen in a large number of trials. Bayesian probability: the degree of belief you have that a certain event will happen (on any given trial). http://oikosjournal.wordpress.com/2011/10/11/frequentist-vs-bayesian-statistics-resources-to-help-you-choose/
15 So, what, if anything, is wrong with NHST? Suppose we flip a coin N = 26 times and we come up with z = 8 heads. Is this coin biased or fair? Note, a Bayesian would ask: what is the probability that the coin is fair...
16 NHST. N = 26, z = 8: is this coin biased? If the coin were fair we would expect N = 26, z = 13. The question we must answer is: can we reject the null hypothesis, in this case that the coin is fair and we saw these results by chance alone... If we repeated this experiment many times, would we expect to see z = 8 more than 5% of the time (with a fair coin)?
17 NHST. The probability of seeing z heads in N flips is given by the binomial probability distribution: p(z|N, θ) = C(N, z) θ^z (1 − θ)^(N − z), where C(N, z) = N! / (z!(N − z)!). If we repeated this experiment many times, would we expect to see z = 8 more than 5% of the time (with a fair coin)?
18 NHST. If we repeated this experiment many times, would we expect to see z = 8 more than 5% of the time (with a fair coin)? The probability of seeing z heads in N flips is given by the binomial probability distribution. [figure: hypothetical population with θ = 0.5 (heads/tails) and the implied sampling distribution p(z|N, θ) for N = 26]
19 NHST. [figure: hypothetical population, θ = 0.5; implied sampling distribution p(z|N, θ), N = 26] We want to know the probability of finding a result as low as (or lower than) 8 heads out of 26 flips... This is the p-value of z = 8 for this sampling distribution!
20 NHST. [figure: the same sampling distribution with both tails shaded] In this case we will consider the two-sided p-value (shaded): we can reject the null if z ≤ 7 or z ≥ 19. Thus we FAIL to reject the null hypothesis...
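The slide's fixed-N calculation can be reproduced with the exact binomial pmf (a sketch of the computation, standard library only):

```python
from math import comb

# Two-sided binomial test for z = 8 heads in N = 26 flips of a fair coin.
N, z, theta = 26, 8, 0.5

def pmf(k):
    # P(Z = k) under Binomial(N, theta)
    return comb(N, k) * theta**k * (1 - theta)**(N - k)

lower_tail = sum(pmf(k) for k in range(0, z + 1))   # P(Z <= 8)
p_two_sided = 2 * lower_tail                        # symmetric at theta = 0.5

print(round(p_two_sided, 4))   # 0.0755 > 0.05: fail to reject the null
```

With p ≈ 0.076 the observed z = 8 is not extreme enough, matching the slide's rejection region of z ≤ 7 or z ≥ 19.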
21 NHST. Thus we FAIL to reject the null hypothesis... What does that mean? Failing to reject the null hypothesis is like kissing your sister... we don't have an answer, but there's not a good reason to say the coin is biased.
22 NHST. Thus we FAIL to reject the null hypothesis... But it's more probable the coin is fair, right? Do we know how likely the coin is to be fair? What if we had really REALLY good reasons to believe it was fair?
23 NHST. Thus we FAIL to reject the null hypothesis... But at least that's it, right... I mean, it's an unsatisfying answer, but at least we know what it is... Unless...
24 NHST. We used a sampling distribution for N = 26. [figure: hypothetical population, θ = 0.5; implied sampling distribution p(z|N, θ), N = 26] What if instead they sampled until they found z = 8? We would have to use a DIFFERENT sampling distribution...
25 NHST. Sampled the coin until we got z = 8; it took N = 26 flips... which gives us: [figure: hypothetical population, θ = 0.5; implied sampling distribution p(N|z, θ) for z = 8]
26 NHST. Sampled the coin until we got z = 8; it took N = 26 flips... [figure: implied sampling distribution p(N|z, θ), z = 8] In this case, for p < .05, our critical values are N ≤ 9 and N ≥ 26. Therefore we REJECT the null hypothesis and conclude the coin is biased!
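The stopping-rule calculation can also be checked with a short script. Note the equivalence used here: needing N ≥ 26 flips to reach the 8th head is the same event as seeing at most 7 heads in the first 25 flips, so the tail probability comes from a Binomial(25, 0.5):

```python
from math import comb

# Same data (z = 8 heads, N = 26 flips), different stopping rule:
# flip until the 8th head appears.
z, theta = 8, 0.5

# P(N >= 26) = P(at most z - 1 = 7 heads in the first 25 flips)
p_upper = sum(comb(25, k) * theta**25 for k in range(0, z))

print(round(p_upper, 4))   # 0.0216 < 0.025: inside the two-sided rejection region
```

Same 8 heads in 26 flips, but under this sampling distribution the result is significant: the two analyses disagree.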
27 NHST. Therefore we REJECT the null hypothesis and conclude the coin is biased! Now, do we have an expression of how likely the coin is to be biased? Or fair? And again, what if we were REALLY REALLY sure that the coin was fair?
28 NHST. But we have an even bigger problem here! "Thus we FAIL to reject the null hypothesis..." "Therefore we REJECT the null hypothesis and conclude the coin is biased!" We both failed to reject and rejected the null hypothesis based on the same data... WHICH IS IT?
29 NHST. We both failed to reject and rejected the null hypothesis based on the same data... WHICH IS IT? In order to know which interpretation is correct, we need to know the intentions of the experimenter... WHAT?!?
30 Bayes. The Bayesian interpretation of this data is a bit simpler. The likelihood is: p(D|θ) = θ^z (1 − θ)^(N − z). Much simpler: this depends only on the number of heads and the number of flips, with no explicit assumptions about intentions...
31 Bayes. Plus, for Bayesian analyses we can consider prior beliefs. [figure: prior / likelihood / posterior panels for the data z = 8, N = 26: a "strongish fair" prior Beta(11,11) gives posterior Beta(19,29) with p(D) = 2.93e-08; a "strong biased" prior Beta(2,20) gives posterior Beta(10,38) with p(D) = 8.11e-09; 95% HDIs marked on the posteriors] And in either case, we can say how probable the coin is to be fair (or any other value...)
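The posteriors in the figure follow from the conjugate beta-binomial update, which can be verified in two lines (a sketch; the prior parameters are the slide's):

```python
# Beta(a, b) prior + data (z heads, N - z tails) -> Beta(a + z, b + N - z).
z, N = 8, 26

def posterior(a, b):
    return (a + z, b + (N - z))

assert posterior(11, 11) == (19, 29)   # "strongish fair" prior -> Beta(19, 29)
assert posterior(2, 20) == (10, 38)    # "strong biased" prior  -> Beta(10, 38)
```

Because the posterior is a full distribution over θ, any question about the coin (such as the probability that θ is near 0.5) can be read off it directly.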
32 Highest Density Interval (HDI). Of course an HDI will not always be as simple as on a normal distribution, and might not be unimodal... Defined as the region of parameter space that accounts for x% (usually 95%) of the probability mass. [figure: 95% HDIs marked on a unimodal and a bimodal distribution]
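A generic way to find an HDI numerically is to discretize the density, rank grid points from highest to lowest density, and keep them until they cover 95% of the mass (a sketch, using the slide's Beta(19, 29) posterior as the example; for a multimodal density the kept points may form several disjoint intervals):

```python
def hdi_bounds(xs, dens, mass=0.95):
    # Keep the highest-density grid points until `mass` of the total is covered.
    total = sum(dens)
    ranked = sorted(zip(dens, xs), reverse=True)
    kept, acc = [], 0.0
    for d, x in ranked:
        kept.append(x)
        acc += d / total
        if acc >= mass:
            break
    return min(kept), max(kept)   # interval endpoints (valid if unimodal)

# Unnormalized Beta(19, 29) density on a grid: x^(19-1) * (1-x)^(29-1).
xs = [i / 1000 for i in range(1, 1000)]
dens = [x**18 * (1 - x)**28 for x in xs]
lo, hi = hdi_bounds(xs, dens)   # roughly (0.26, 0.53)
```

Normalization is unnecessary since only density ratios matter, which is why the unnormalized beta kernel suffices here.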
33 Bayes. Plus, for Bayesian analyses we can consider prior beliefs. [figure: slide 31's prior / likelihood / posterior panels shown again: Beta(11,11) to Beta(19,29) and Beta(2,20) to Beta(10,38) for z = 8, N = 26, with 95% HDIs] And in either case, we can say how probable the coin is to be fair (or any other value...)
34 Here be Dragons: Dangers of Bayesian Statistics. The Prior Distribution...
35 Bayes. Plus, for Bayesian analyses we can consider prior beliefs. [figure: slide 31's prior / likelihood / posterior panels shown once more; note how the Beta(11,11) and Beta(2,20) priors pull the posteriors apart] And in either case, we can say how probable the coin is to be fair (or any other value...)
36 NHST vs. Bayes. Another thorny issue that comes up A LOT: Multiple Comparisons! How many NHSTs can you run on a set of data without changing your p-value?
37 NHST vs. Bayes. Multiple Comparisons! Per-comparison false alarm rate: α_PC. Experiment-wise false alarm rate: α_EW = 1 − (1 − α_PC)^c (where c is the number of comparisons) > α_PC, so to get α_EW < .05, we have to lower α_PC, SOMETIMES A LOT.
38 NHST vs. Bayes. Multiple Comparisons! There are a lot of corrections for multiple comparisons, most notably the Bonferroni correction: α_PC = α_EW / c. Some actually depend on whether the comparison was planned in advance or not (?!?!) But in ALL cases, one has to trade off power for exhaustive checking of the space...
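The inflation and the Bonferroni fix are easy to see numerically (a small sketch with c = 20 independent comparisons):

```python
# Experiment-wise false alarm rate for c independent comparisons, each run
# at per-comparison level alpha: alpha_ew = 1 - (1 - alpha)**c.
def experimentwise(alpha, c):
    return 1 - (1 - alpha) ** c

# Twenty comparisons at .05: a false alarm somewhere is ~64% likely...
assert round(experimentwise(0.05, 20), 2) == 0.64

# ...while Bonferroni (testing each at .05 / 20) pulls it back under .05,
# at the cost of much lower power per comparison.
assert experimentwise(0.05 / 20, 20) < 0.05
```

This is the "SOMETIMES A LOT": with 20 comparisons each test must run at α = 0.0025 to keep the experiment-wise rate near .05.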
39 NHST vs. Bayes. Multiple Comparisons! There are a lot of corrections for multiple comparisons, most notably the Bonferroni correction... Well, unless you use Bayesian methods. Some actually depend on whether the comparison was planned in advance or not (?!?!) But in ALL cases, one has to trade off power for exhaustive checking of the space...
40 NHST vs. Bayes. Multiple Comparisons! In a Bayesian analysis, you calculate the entire posterior from the data you have and the prior; then you can examine the posterior however you'd like. NO corrections for multiple comparisons necessary! Bayesian methods correct for false positives by incorporating a reasonable prior and examining the full parameter space.
42 NHST vs. Bayes. So, why have NHST methods been so popular for so long? Maybe because instead of these:
43 NHST vs. Bayes. So, why have NHST methods been so popular for so long? We had these:
44 NHST vs. Bayes. So, why have NHST methods been so popular for so long? For all their hidden assumptions, reliance on intentions, problems with multiple comparisons, and other issues, most of the math necessary for a t-test had been done. Would you like to try Gibbs sampling by hand?
45 NHST vs. Bayes. Fortunately, the speed of modern computers has made the world a safe and productive place for Bayesians everywhere...
46 Practical Bayes. Ok, so all this is great, but what do we actually DO with Bayes? For many experiments, frequentist stats are appropriate and all that is necessary, and full Bayesian models can be tricky to set up and fully implement. If you ARE interested in learning how to employ those, READ THE BOOK! (and/or talk to me about it)
47 Practical Bayes. But more importantly, Bayesian principles can help us understand what kind of questions we ARE asking, and what kind of questions we COULD (and maybe should) ask. Neuroscience example: Decoding!
48 (Bayesian) Decoding: Place Cells in Hippocampus. These cells fire when the animal is in a certain spatial location. What is the data we have here? [figure: spatial tuning curves for HPC cells, P(FR(c)|Pos)] https://en.wikipedia.org/wiki/Place_cell
49 (Bayesian) Decoding: Place Cells in Hippocampus. What is more interesting than P(FR(c)|Pos)? How about P(Pos|FR(c))? P(Pos|FR(c)) = P(FR(c)|Pos) × P(Pos) / P(FR(c))
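This inversion can be sketched with a toy decoder (everything below is hypothetical: three made-up cells, three made-up track positions, invented firing probabilities, and an independence assumption across cells; none of this comes from the talk's data):

```python
positions = ["CP", "F1", "F2"]

# Hypothetical binary tuning: probability each cell fires at each position.
tuning = {
    "cell_a": {"CP": 0.8, "F1": 0.1, "F2": 0.1},
    "cell_b": {"CP": 0.1, "F1": 0.8, "F2": 0.1},
    "cell_c": {"CP": 0.1, "F1": 0.1, "F2": 0.8},
}
prior = {p: 1 / 3 for p in positions}   # flat prior P(Pos)

def decode(spikes):
    # spikes: {cell: 1 if it fired, else 0}. Assumes cells are
    # conditionally independent given position.
    post = {}
    for p in positions:
        lik = 1.0
        for cell, fired in spikes.items():
            q = tuning[cell][p]
            lik *= q if fired else (1 - q)
        post[p] = lik * prior[p]
    total = sum(post.values())          # the evidence P(FR(c))
    return {p: v / total for p, v in post.items()}

post = decode({"cell_a": 1, "cell_b": 0, "cell_c": 0})
best = max(post, key=post.get)          # only cell_a fired, so "CP" wins
```

Only cell_a fired, and cell_a's tuning curve peaks at CP, so the posterior concentrates there; a non-flat prior (the next slide's question) would shift these numbers.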
50 (Bayesian) Decoding: Place Cells in Hippocampus. So, we can decode the animal's position from HPC via P(Pos|FR(c)) = P(FR(c)|Pos) × P(Pos) / P(FR(c)). What else can we look at? [figure: decoded vs. actual position for HPC and PFC, over track segments CP, F1, F2]
51 (Bayesian) Decoding: Place Cells in Hippocampus. P(Pos|FR(c)) = P(FR(c)|Pos) × P(Pos) / P(FR(c)). What about P(Pos)? What should we use for our prior? As usual, it depends on the question you want to ask.
52 Anyhoo, I could keep going, but we're probably about out of time. Questions? Or is it just time for pizza?
More informationCOMP 551 Applied Machine Learning Lecture 19: Bayesian Inference
COMP 551 Applied Machine Learning Lecture 19: Bayesian Inference Associate Instructor: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551 Unless otherwise noted, all material posted
More informationBEGINNING BAYES IN R. Bayes with discrete models
BEGINNING BAYES IN R Bayes with discrete models Beginning Bayes in R Survey on eating out What is your favorite day for eating out? Construct a prior for p Define p: proportion of all students who answer
More informationWith Question/Answer Animations. Chapter 7
With Question/Answer Animations Chapter 7 Chapter Summary Introduction to Discrete Probability Probability Theory Bayes Theorem Section 7.1 Section Summary Finite Probability Probabilities of Complements
More informationSTAT 285 Fall Assignment 1 Solutions
STAT 285 Fall 2014 Assignment 1 Solutions 1. An environmental agency sets a standard of 200 ppb for the concentration of cadmium in a lake. The concentration of cadmium in one lake is measured 17 times.
More informationCENTRAL LIMIT THEOREM (CLT)
CENTRAL LIMIT THEOREM (CLT) A sampling distribution is the probability distribution of the sample statistic that is formed when samples of size n are repeatedly taken from a population. If the sample statistic
More informationIntroduc)on to Bayesian Methods
Introduc)on to Bayesian Methods Bayes Rule py x)px) = px! y) = px y)py) py x) = px y)py) px) px) =! px! y) = px y)py) y py x) = py x) =! y "! y px y)py) px y)py) px y)py) px y)py)dy Bayes Rule py x) =
More informationData Analysis and Monte Carlo Methods
Lecturer: Allen Caldwell, Max Planck Institute for Physics & TUM Recitation Instructor: Oleksander (Alex) Volynets, MPP & TUM General Information: - Lectures will be held in English, Mondays 16-18:00 -
More informationLecture 9: Naive Bayes, SVM, Kernels. Saravanan Thirumuruganathan
Lecture 9: Naive Bayes, SVM, Kernels Instructor: Outline 1 Probability basics 2 Probabilistic Interpretation of Classification 3 Bayesian Classifiers, Naive Bayes 4 Support Vector Machines Probability
More informationYou separate binary numbers into columns in a similar fashion. 2 5 = 32
RSA Encryption 2 At the end of Part I of this article, we stated that RSA encryption works because it s impractical to factor n, which determines P 1 and P 2, which determines our private key, d, which
More informationPIER HLM Course July 30, 2011 Howard Seltman. Discussion Guide for Bayes and BUGS
PIER HLM Course July 30, 2011 Howard Seltman Discussion Guide for Bayes and BUGS 1. Classical Statistics is based on parameters as fixed unknown values. a. The standard approach is to try to discover,
More informationParameter Learning With Binary Variables
With Binary Variables University of Nebraska Lincoln CSCE 970 Pattern Recognition Outline Outline 1 Learning a Single Parameter 2 More on the Beta Density Function 3 Computing a Probability Interval Outline
More informationMITOCW ocw f99-lec30_300k
MITOCW ocw-18.06-f99-lec30_300k OK, this is the lecture on linear transformations. Actually, linear algebra courses used to begin with this lecture, so you could say I'm beginning this course again by
More informationOne sided tests. An example of a two sided alternative is what we ve been using for our two sample tests:
One sided tests So far all of our tests have been two sided. While this may be a bit easier to understand, this is often not the best way to do a hypothesis test. One simple thing that we can do to get
More informationFERMAT S TEST KEITH CONRAD
FERMAT S TEST KEITH CONRAD 1. Introduction Fermat s little theorem says for prime p that a p 1 1 mod p for all a 0 mod p. A naive extension of this to a composite modulus n 2 would be: for all a 0 mod
More informationHST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007
MIT OpenCourseWare http://ocw.mit.edu HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationHypothesis testing. Data to decisions
Hypothesis testing Data to decisions The idea Null hypothesis: H 0 : the DGP/population has property P Under the null, a sample statistic has a known distribution If, under that that distribution, the
More informationModeling Environment
Topic Model Modeling Environment What does it mean to understand/ your environment? Ability to predict Two approaches to ing environment of words and text Latent Semantic Analysis (LSA) Topic Model LSA
More informationJoint, Conditional, & Marginal Probabilities
Joint, Conditional, & Marginal Probabilities The three axioms for probability don t discuss how to create probabilities for combined events such as P [A B] or for the likelihood of an event A given that
More informationIntermediate Math Circles November 15, 2017 Probability III
Intermediate Math Circles November 5, 07 Probability III Example : You have bins in which there are coloured balls. The balls are identical except for their colours. The contents of the containers are:
More informationECE295, Data Assimila0on and Inverse Problems, Spring 2015
ECE295, Data Assimila0on and Inverse Problems, Spring 2015 1 April, Intro; Linear discrete Inverse problems (Aster Ch 1 and 2) Slides 8 April, SVD (Aster ch 2 and 3) Slides 15 April, RegularizaFon (ch
More informationProbability Theory and Simulation Methods
Feb 28th, 2018 Lecture 10: Random variables Countdown to midterm (March 21st): 28 days Week 1 Chapter 1: Axioms of probability Week 2 Chapter 3: Conditional probability and independence Week 4 Chapters
More informationz and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests
z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests Chapters 3.5.1 3.5.2, 3.3.2 Prof. Tesler Math 283 Fall 2018 Prof. Tesler z and t tests for mean Math
More informationSolving Quadratic & Higher Degree Equations
Chapter 9 Solving Quadratic & Higher Degree Equations Sec 1. Zero Product Property Back in the third grade students were taught when they multiplied a number by zero, the product would be zero. In algebra,
More informationSTAT 201 Chapter 5. Probability
STAT 201 Chapter 5 Probability 1 2 Introduction to Probability Probability The way we quantify uncertainty. Subjective Probability A probability derived from an individual's personal judgment about whether
More informationBasics of Proofs. 1 The Basics. 2 Proof Strategies. 2.1 Understand What s Going On
Basics of Proofs The Putnam is a proof based exam and will expect you to write proofs in your solutions Similarly, Math 96 will also require you to write proofs in your homework solutions If you ve seen
More informationDiscrete Probability and State Estimation
6.01, Spring Semester, 2008 Week 12 Course Notes 1 MASSACHVSETTS INSTITVTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.01 Introduction to EECS I Spring Semester, 2008 Week
More informationIntroduction to AI Learning Bayesian networks. Vibhav Gogate
Introduction to AI Learning Bayesian networks Vibhav Gogate Inductive Learning in a nutshell Given: Data Examples of a function (X, F(X)) Predict function F(X) for new examples X Discrete F(X): Classification
More informationStandard & Conditional Probability
Biostatistics 050 Standard & Conditional Probability 1 ORIGIN 0 Probability as a Concept: Standard & Conditional Probability "The probability of an event is the likelihood of that event expressed either
More informationCIS 2033 Lecture 5, Fall
CIS 2033 Lecture 5, Fall 2016 1 Instructor: David Dobor September 13, 2016 1 Supplemental reading from Dekking s textbook: Chapter2, 3. We mentioned at the beginning of this class that calculus was a prerequisite
More informationBayesian Inference. Introduction
Bayesian Inference Introduction The frequentist approach to inference holds that probabilities are intrinsicially tied (unsurprisingly) to frequencies. This interpretation is actually quite natural. What,
More information