Bayesian Data Analysis

Size: px

Start display at page:

Download "Bayesian Data Analysis"

Abigayle Smith
5 years ago
Views:

1 An Introduction to Bayesian Data Analysis Lecture 1 D. S. Sivia St. John s College Oxford, England April 29, 2012 Some recommended books Outline Introduction (1) Introduction (2) Introduction (3) Basic rules and corollaries Basic rules and corollaries Basic rules and corollaries Parameter estimation Coin example: uniform prior (1) Coin example: uniform prior (2) Coin example: alternative priors (1) Coin example: alternative priors (2) Summarising the inference The Gaussian, or normal, distribution Gaussian approximation for the coin Asymmetric posterior pdfs Multimodal posterior pdfs Limit-setting inference (1) Limit-setting inference (2) Limit-setting inference (3) Gaussian noise and averages (1) Gaussian noise and averages (2) Data with different-sized error-bars The lighthouse problem (1) Lorentzian, or Cauchy, distribution The lighthouse problem (2) Lighthouse example (1) Lighthouse example (2) Moral of the lighthouse story

2 Some recommended books Data analysis: a Bayesian tutorial D. S. Sivia (1996), Oxford University Press; with J. Skilling (2006) Bayesian logical data analysis for the physical sciences P. Gregory (2005), Cambridge University Press Information theory, inference and learning algorithms D.J.C. MacKay (2004), Cambridge University Press Probability theory: the logics of science E.T. Jaynes (2003), Cambridge University Press Bayesian reasoning in data analysis G. D Agostini (2003), World Scientific Publishing IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 2 / 31 Outline The basics Parameter estimation I Parameter estimation II Model selection Assigning probabilities Non-parametric estimation Experimental design Least-squares extensions Nested sampling Quantification IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 3 / 31 2

3 Introduction (1) Deductive logic (pure maths) Inductive logic (science) or plausible reasoning IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 4 / 31 Introduction (2) Bernoulli (1713), Bayes (1763) and Laplace (1812) developed probability theory to reason in situations where we cannot argue with certainty.... bet of 11,000 against 1 that the error of this result is not of its value. [Laplace] IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 5 / 31 3

4 Introduction (3) Probability theory is nothing but common sense reduced to calculation. [Laplace] Cox (1946) showed that any method of plausible reasoning that satisfies simple rules of logical consistency must be equivalent to the use of ordinary probability theory. Bayesian interpretation of probability A probability encodes a state of knowledge. All probabilities are conditional. IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 6 / 31 Basic rules and corollaries Range : 0 prob(x I) 1 Sum rule : prob(x I) + prob ( X I ) = 1 Product rule : prob(x,y I) = prob(x Y,I) prob(y I) Bayes theorem : prob(x Y,I) = prob(y X,I) prob(x I) prob(y I) Marginalisation : prob(x I) = prob(x,y I) + prob ( X,Y I ) IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 7 / 31 4

5 Basic rules and corollaries Range : 0 prob(x I) 1 Sum rule : prob(x I) + prob ( X I ) = 1 Product rule : prob(x,y I) = prob(x Y,I) prob(y I) Bayes theorem : prob(x Y,I) = prob(y X,I) prob(x I) prob(y I) Marginalisation : prob(x I) = M prob(x,y j I) j=1 IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 8 / 31 Basic rules and corollaries Range : 0 prob(x I) 1 Sum rule : prob(x I) + prob ( X I ) = 1 Product rule : prob(x,y I) = prob(x Y,I) prob(y I) Bayes theorem : prob(x Y,I) = prob(y X,I) prob(x I) prob(y I) + Marginalisation : prob(x I) = prob(x,y I) dy IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 9 / 31 5

6 1-Parameter estimation Data D = r heads in n flips. Is this a fair coin? If H is the probability of getting a head, what is the value of H? prob(h D,I) }{{} Posterior prob(d H,I) }{{} Likelihood prob(h I) }{{} Prior (Bayes ) Binomial likelihood: prob(d H,I) = n! r!(n r)! H r (1 H) n r Ignorant prior: prob(h I) = { 1 for 0 H 1 0 otherwise IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 10 / 31 Coin example: uniform prior (1) IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 11 / 31 6

7 Coin example: uniform prior (2) IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 12 / 31 Coin example: alternative priors (1) IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 13 / 31 7

8 Coin example: alternative priors (2) IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 14 / 31 Summarising the inference X = Xo ± σ Let L = log e [ prob(x {data},i) ] Taylor: L(X) = L(Xo) d 2 L dx 2 X o (X Xo) 2 +, where dl dx 0 X o= [ ] ( prob(x {data},i) 1 σ 2π exp (X X o) 2 2σ 2, σ = d2 L dx 2 X o ) 1/2 IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 15 / 31 8

9 The Gaussian, or normal, distribution [ ] prob(x µ,σ) = 1 σ 2π exp (x µ)2 2σ 2 FWHM 2.35σ x = x prob(x µ,σ) dx = µ (mean) (x µ) 2 = (x µ) 2 prob(x µ,σ)dx = σ 2 (variance) IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 16 / 31 Gaussian approximation for the coin prob(h D,I) H r (1 H) n r L = const + r ln(h) + (n r)ln(1 H) dl dh = r H o Ho (n r) (1 Ho) = 0 H o = r n d 2 L dh 2 = r Ho 2 (n r) (1 Ho) 2 = n H o(1 Ho) σ = Ho (1 Ho) n H o For r=9 and n=32, H 0.28 ± 0.08 IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 17 / 31 9

10 Asymmetric posterior pdfs IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 18 / 31 Multimodal posterior pdfs IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 19 / 31 10

11 Limit-setting inference (1) IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 20 / 31 Limit-setting inference (2) IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 21 / 31 11

12 Limit-setting inference (3) IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 22 / 31 Gaussian noise and averages (1) Given N independent measurements of a quantity µ, {d k }, all subject to Gaussian noise σ, what can we say about the value of µ? [ ] Gaussian datum: prob(d k µ,σ,i) = 1 σ 2π exp (d k µ) 2 2σ 2 Independence: prob({d k } µ,σ,i) = N prob(d k µ,σ,i) prob(x,y I) = prob(x Y,I) prob(y I) = prob(x I) prob(y I) } {{ } prob(x I) Prior: prob(µ σ,i) = constant (µ min µ µ max ) IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 23 / 31 12

13 Gaussian noise and averages (2) Bayes: prob(µ {d k },σ,i) prob({d k } µ,σ,i) prob(µ σ,i) L = log e [ prob(µ {dk },σ,i) ] = const 1 2σ 2 N (d k µ) 2 dl dµ = µo N d k µ o σ 2 = 0 µ o = 1 N N d k d 2 L dµ 2 = µo N 1 σ 2 = N σ 2 µ = µ o ± σ N IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 24 / 31 Data with different-sized error-bars If the various measurements were of differing quality, so that each datum d k was associated with its own noise-level σ k, then the preceding analysis needs to be modified as follows. L = log e [ prob(µ {dk,σ k },I) ] = const 1 2 dl dµ = 0 µ o = µo / N N w k d k d 2 ( L N ) 1/2 dµ 2 µ = µ o ± w k µo N (d k µ) 2 σ k 2 w k, where w k = 1 σ k 2 IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 25 / 31 13

14 The lighthouse problem (1) Data: prob(θ k α,β,i) = 1 π But β tan θ k = x k α prob(x k α,β,i) = β π [β 2 + (x k α) 2] IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 26 / 31 Lorentzian, or Cauchy, distribution prob(x α,β) = β π [β 2 + (x α) 2] FWHM = 2β x = x prob(x α,β) dx = α ( ) (x α) 2 = (x α) 2 prob(x α,β) dx (!) IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 27 / 31 14

15 The lighthouse problem (2) N Independence: prob({x k } α,β,i) = prob(x k α,β,i) Prior: prob(α β,i) = constant (α min α α max ) Bayes: prob(α {x k },β,i) prob({x k } α,β,i) prob(α β,i) L = log e [ prob(α {xk },σ,i) ] = const dl dα = 2 αo N x k α o β 2 + (x k α o ) 2 = 0 N log e [ β 2 + (x k α) 2] Can t be solved analytically! IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 28 / 31 Lighthouse example (1) IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 29 / 31 15

16 Lighthouse example (2) IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 30 / 31 Moral of the lighthouse story The sample mean is not always a useful number. ( Infinite variance of Cauchy central limit theorem does not hold ) Let probability theory decide what s best! IN2P3/CNRS School of Statistics (Autrans 2012): Bayesian Lecture 1 31 / 31 16

CLASS NOTES Models, Algorithms and Data: Introduction to computing 2018

CLASS NOTES Models, Algorithms and Data: Introduction to computing 208 Petros Koumoutsakos, Jens Honore Walther (Last update: June, 208) IMPORTANT DISCLAIMERS. REFERENCES: Much of the material (ideas,