
1 A tutorial on Bayesian inference for the normal linear model

Richard Boys
Statistics Research Group, Newcastle University, UK

2 Motivation

Much of Bayesian inference nowadays analyses complex hierarchical models using computer-intensive methods: MCMC, pMCMC, ABC, HMC, ...

Not that long ago, most analyses used
- a conjugate analysis of, say, the normal linear model
- a non-conjugate analysis fitted via techniques such as Gaussian quadrature or Laplace's method to evaluate the required integrals

This tutorial gives an overview of the basics underpinning an analysis of data assuming a normal linear model and a conjugate prior.

3 Normal random sample

Example: The 18th century physicist Henry Cavendish made 23 experimental determinations of the earth's density. These data (in g/cm^3) have sufficient statistics $n = 23$, $\bar{x} = 5.4848$, $s = \ldots$

[Figure: normal Q-Q plot of the data (sample quantiles against theoretical quantiles)]

4 Conjugate analysis

Data: $X_i \mid \mu, \tau \sim N(\mu, 1/\tau)$, $i = 1, 2, \ldots, n$ (indep)

Likelihood function:
$$\pi(x \mid \mu, \tau) = \left(\frac{\tau}{2\pi}\right)^{n/2} \exp\left[-\frac{n\tau}{2}\left\{s^2 + (\bar{x} - \mu)^2\right\}\right]$$

Conjugate prior: take
$$\pi(\mu, \tau) = \pi(\mu \mid \tau)\,\pi(\tau), \qquad \mu \mid \tau \sim N\left(b, \frac{1}{c\tau}\right), \quad \tau \sim Ga(g, h)$$
Write $(\mu, \tau)^T \sim NGa(b, c, g, h)$. The prior density is
$$\pi(\mu, \tau) \propto \tau^{g - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[c(\mu - b)^2 + 2h\right]\right\}, \qquad \mu \in \mathbb{R},\ \tau > 0$$

5 Question

What's the posterior distribution for $(\mu, \tau)^T$?

Hint:
$$c(\mu - b)^2 + n(\bar{x} - \mu)^2 = (c + n)\left\{\mu - \left(\frac{cb + n\bar{x}}{c + n}\right)\right\}^2 + \frac{nc(\bar{x} - b)^2}{c + n}$$

6 Hint:
$$c(\mu - b)^2 + n(\bar{x} - \mu)^2 = (c + n)\left\{\mu - \left(\frac{cb + n\bar{x}}{c + n}\right)\right\}^2 + \frac{nc(\bar{x} - b)^2}{c + n}$$

Using Bayes' Theorem, the posterior density is, for $\mu \in \mathbb{R}$, $\tau > 0$,
$$\begin{aligned}
\pi(\mu, \tau \mid x) &\propto \pi(\mu, \tau)\,\pi(x \mid \mu, \tau) \\
&\propto \tau^{g + \frac{n}{2} - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[c(\mu - b)^2 + n(\bar{x} - \mu)^2 + 2h + ns^2\right]\right\} \\
&\propto \tau^{g + \frac{n}{2} - \frac{1}{2}} \exp\left(-\frac{\tau}{2}\left[(c + n)\left\{\mu - \left(\frac{cb + n\bar{x}}{c + n}\right)\right\}^2 + \frac{nc(\bar{x} - b)^2}{c + n} + 2h + ns^2\right]\right) \\
&\propto \tau^{G - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[C(\mu - B)^2 + 2H\right]\right\}
\end{aligned}$$
where
$$B = \frac{cb + n\bar{x}}{c + n}, \quad C = c + n, \quad G = g + \frac{n}{2}, \quad H = h + \frac{cn(\bar{x} - b)^2}{2(c + n)} + \frac{ns^2}{2}$$
Therefore $(\mu, \tau)^T \mid x \sim NGa(B, C, G, H)$.
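The update above is mechanical to code. Below is a minimal sketch (not from the slides) of the conjugate update as a Python function; the argument names mirror the notation $b, c, g, h$ and the sufficient statistics $n$, $\bar{x}$, $s^2$.

```python
# A minimal sketch of the NGa conjugate update derived above.
def nga_update(b, c, g, h, n, xbar, s2):
    """Update an NGa(b, c, g, h) prior given a normal random sample
    with size n, mean xbar and biased variance s2 = sum((x - xbar)^2)/n."""
    B = (c * b + n * xbar) / (c + n)
    C = c + n
    G = g + n / 2
    H = h + c * n * (xbar - b) ** 2 / (2 * (c + n)) + n * s2 / 2
    return B, C, G, H
```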

7 Posterior analysis

Posterior: $(\mu, \tau)^T \mid x \sim NGa(B, C, G, H)$. Clearly $\tau \mid x \sim Ga(G, H)$.

Question: What's the marginal posterior for µ?

Hints:
1. Posterior density: $\pi(\mu, \tau \mid x) \propto \tau^{G - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[C(\mu - B)^2 + 2H\right]\right\}$
2. $\int_0^\infty \theta^{a-1} e^{-b\theta}\, d\theta = \Gamma(a)/b^a$
3. If $Y \sim t_a(b, c)$ then it has density $f(y \mid a, b, c) \propto \left\{1 + \frac{(y - b)^2}{ac}\right\}^{-\frac{a+1}{2}}$, $y \in \mathbb{R}$

8 Hints: $\int_0^\infty \theta^{a-1} e^{-b\theta}\, d\theta = \Gamma(a)/b^a$ and $f(y \mid a, b, c) \propto \left\{1 + \frac{(y - b)^2}{ac}\right\}^{-\frac{a+1}{2}}$

The (marginal) posterior density for µ is, for $\mu \in \mathbb{R}$,
$$\begin{aligned}
\pi(\mu \mid x) &= \int_0^\infty \pi(\mu, \tau \mid x)\, d\tau \\
&\propto \int_0^\infty \tau^{G - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[C(\mu - B)^2 + 2H\right]\right\} d\tau \\
&= \frac{\Gamma\left(G + \frac{1}{2}\right)}{\left[\{C(\mu - B)^2 + 2H\}/2\right]^{G + \frac{1}{2}}} \qquad \text{using } \int_0^\infty \theta^{a-1} e^{-b\theta}\, d\theta = \frac{\Gamma(a)}{b^a} \\
&\propto \left\{1 + \frac{C(\mu - B)^2}{2H}\right\}^{-\frac{2G+1}{2}}
\end{aligned}$$
Therefore $\mu \mid x \sim t_{2G}\left(B, \frac{H}{GC}\right)$.

9 Some distribution theory

Generalised t distribution: $Y \sim t_a(b, c)$. Density is
$$f(y \mid a, b, c) = \frac{\Gamma\left(\frac{a+1}{2}\right)}{\sqrt{ac\pi}\,\Gamma\left(\frac{a}{2}\right)} \left\{1 + \frac{(y - b)^2}{ac}\right\}^{-\frac{a+1}{2}}, \qquad y \in \mathbb{R}$$
Parameters: $a > 0$, $b \in \mathbb{R}$, $c > 0$.
- Generalisation of the standard t-distribution since $(Y - b)/\sqrt{c} \sim t_a$
- $E(Y) = \text{Mode}(Y) = b$ and $Var(Y) = \frac{ac}{a - 2}$ if $a > 2$
- $t_a(0, 1) \equiv t_a$
- $\lim_{a \to \infty} t_a(b, c) = N(b, c)$

10 Inverse Chi distribution: $Y \sim \text{Inv-Chi}(a, b)$

Density is
$$f(y \mid a, b) = \frac{2b^a}{\Gamma(a)}\, y^{-2a - 1} e^{-b/y^2}, \qquad y > 0$$
Parameters: $a > 0$, $b > 0$.
$$E(Y) = \frac{\sqrt{b}\,\Gamma(a - 1/2)}{\Gamma(a)}, \qquad Var(Y) = \frac{b}{a - 1} - E(Y)^2 \ \text{ if } a > 1$$
The name of the distribution comes from the fact that $1/Y^2 \sim Ga(a, b) \equiv \chi^2_{2a}/(2b)$.

11 Summary of the posterior distribution

Posterior: $(\mu, \tau)^T \mid x \sim NGa(B, C, G, H)$

Marginal distributions:
$$\mu \mid x \sim t_{2G}\left(B, \frac{H}{GC}\right), \qquad \tau \mid x \sim Ga(G, H), \qquad \sigma \mid x \sim \text{Inv-Chi}(G, H)$$

12 An Example

The 18th century physicist Henry Cavendish made 23 experimental determinations of the earth's density. These data (in g/cm^3) have sufficient statistics $n = 23$, $\bar{x} = 5.4848$, $s = \ldots$

Data model: $X_i \mid \mu, \tau \overset{indep}{\sim} N(\mu, 1/\tau)$, $i = 1, 2, \ldots, 23$

Prior: $(\mu, \tau)^T \sim NGa(b = 5.41,\ c = 0.25,\ g = 2.5,\ h = 0.1)$

Posterior: $(\mu, \tau)^T \mid x \sim NGa(B = 5.4840,\ C = 23.25,\ G = 14,\ H = \ldots)$
$$\mu \mid x \sim t_{28}(5.4840, \ldots), \qquad \tau \mid x \sim Ga(14, \ldots), \qquad \sigma \mid x \sim \text{Inv-Chi}(14, \ldots)$$
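As a check, the posterior above can be reproduced with the nga_update sketch from earlier. The sample standard deviation is not legible in the source, so the value s = 0.19 below is an assumed placeholder, not the tutorial's.

```python
# Cavendish example with an assumed placeholder s = 0.19 (the value
# used in the tutorial is not recoverable from the source).
B, C, G, H = nga_update(b=5.41, c=0.25, g=2.5, h=0.1,
                        n=23, xbar=5.4848, s2=0.19 ** 2)
print(B, C, G, H)  # B = 5.4840, C = 23.25, G = 14.0; H depends on s
```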

13 Comparison of priors and posteriors

(Wikipedia: µ = 5.515 g/cm^3)

[Figure: prior and posterior densities for µ, τ and σ]

14 Comparison of prior and posteriors

[Figure: contours of the posterior density in the (µ, τ) plane]

15 Confidence intervals and regions

Point estimates
- Could use posterior mean, mode, ...
- Not really worth having without some idea of uncertainty

Interval estimates (univariate parameters)
- Confidence intervals, credible intervals, Bayesian confidence intervals
- Highest density intervals (HDI), equi-tailed intervals
- Symmetric posteriors: HDI = equi-tailed interval
- Here $\mu \mid x \sim t_{2G}\{B, H/(GC)\}$ is symmetric, so its HDI is easy
- $\tau \mid x \sim Ga(G, H)$ is skewed: HDI non-trivial but equi-tailed interval is easy; ditto for $\sigma \mid x \sim \text{Inv-Chi}(G, H)$

16 Results from this data analysis ...

95% confidence intervals:

          Prior           Posterior
µ         (4.38, 6.44)    (5.40, 5.56)
τ         (1.48, 55.96)   (14.02, 42.25)
τ (HDI)   (4.16, 64.16)   (15.07, 43.76)
σ         (0.11, 0.42)    (0.15, 0.26)
σ (HDI)   (0.12, 0.49)    (0.15, 0.26)
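A sketch of how the equi-tailed intervals can be computed, assuming scipy is available and using the (B, C, G, H) values from the snippet above; the τ interval is transformed monotonically to give the σ interval.

```python
# Equi-tailed 95% intervals for mu, tau and sigma (sketch; values
# depend on the placeholder H from the previous snippet).
from scipy import stats

conf = 0.95
# mu | x ~ t_{2G}(B, H/(GC)), a location-scale t
mu_int = stats.t.interval(conf, 2 * G, loc=B, scale=(H / (G * C)) ** 0.5)
# tau | x ~ Ga(G, H), with H a rate (scipy uses scale = 1/rate)
tau_int = stats.gamma.interval(conf, G, scale=1 / H)
# sigma = tau^(-1/2) is monotone decreasing in tau, so the endpoints swap
sigma_int = (tau_int[1] ** -0.5, tau_int[0] ** -0.5)
```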

17 Confidence regions

So far we have looked at univariate HDIs. It can also be useful to look at (joint) confidence regions.

Question: What is the 100(1 − α)% HDI region for $(\mu, \tau)^T$?

Hint:
$$\log \pi(\mu, \tau \mid x) = -\frac{\tau}{2}\left\{C(\mu - B)^2 + 2H\right\} + \left(G - \frac{1}{2}\right) \log \tau + \text{const}$$

18 Hint: $\log \pi(\mu, \tau \mid x) = -\frac{\tau}{2}\left\{C(\mu - B)^2 + 2H\right\} + \left(G - \frac{1}{2}\right) \log \tau + \text{const}$

The 100(1 − α)% HDI region for $(\mu, \tau)^T$ is
$$\begin{aligned}
\left\{(\mu, \tau)^T : \pi(\mu, \tau \mid x) > k_\alpha\right\}
&= \left\{(\mu, \tau)^T : -\log \pi(\mu, \tau \mid x) < k'_\alpha\right\} \\
&= \left\{(\mu, \tau)^T : \frac{\tau}{2}\left[C(\mu - B)^2 + 2H\right] - \left(G - \frac{1}{2}\right) \log \tau < k'_\alpha\right\}
\end{aligned}$$
How to determine $k'_\alpha$?

19 Need the posterior distribution function $F(\cdot)$ of
$$Y(\mu, \tau) = \frac{\tau}{2}\left\{C(\mu - B)^2 + 2H\right\} - \left(G - \frac{1}{2}\right) \log \tau$$
and take $k'_\alpha$ s.t. $F(k'_\alpha) = 1 - \alpha$.

Not a standard distribution, so build up F via simulation.
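A sketch of that simulation, assuming numpy and the (B, C, G, H) values from earlier: draw (µ, τ) from the NGa posterior, evaluate Y(µ, τ), and take the empirical 1 − α quantile as the cutoff.

```python
# Build the distribution function of Y(mu, tau) by simulation.
import numpy as np

rng = np.random.default_rng(1)
N = 100_000
tau = rng.gamma(shape=G, scale=1 / H, size=N)       # tau | x ~ Ga(G, H)
mu = rng.normal(loc=B, scale=1 / np.sqrt(C * tau))  # mu | tau, x ~ N(B, 1/(C tau))
Y = tau / 2 * (C * (mu - B) ** 2 + 2 * H) - (G - 0.5) * np.log(tau)
k_alpha = np.quantile(Y, 0.95)                      # cutoff for a 95% region
```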

20 Cavendish example

[Figure: 95%, 90% and 80% prior (dashed) and posterior (solid) confidence regions for (µ, τ)ᵀ; 95% (outer), 80% (inner)]

21 Focusing on central part of plot ...

[Figure: 95%, 90% and 80% prior (dashed) and posterior (solid) confidence regions for (µ, τ)ᵀ; 95% (outer), 80% (inner)]

22 Predictive distribution

The predictive density of a new observation y is
$$f(y \mid x) = \int\!\!\int f(y \mid \mu, \tau)\, \pi(\mu, \tau \mid x)\, d\mu\, d\tau$$
As this is a conjugate analysis, we can determine the predictive density using Candidate's formula:
$$\pi(\theta \mid x, y) = \frac{\pi(\theta) f(x, y \mid \theta)}{f(x, y)} = \frac{\pi(\theta) f(x \mid \theta) f(y \mid \theta)}{f(x) f(y \mid x)} = \frac{\pi(\theta \mid x)\, f(y \mid \theta)}{f(y \mid x)}$$
since X and Y are independent given θ, and so
$$f(y \mid x) = \frac{f(y \mid \theta)\, \pi(\theta \mid x)}{\pi(\theta \mid x, y)}$$
But, for this model, there is a more straightforward way ...

23 
$Y \mid \mu, \tau \sim N(\mu, \frac{1}{\tau})$ and $(\mu, \tau)^T \mid x \sim NGa(B, C, G, H)$

Write $Y = \mu + \varepsilon$ with $\varepsilon \mid \tau \sim N(0, \frac{1}{\tau})$, $\mu \mid x, \tau \sim N(B, \frac{1}{C\tau})$, $\tau \mid x \sim Ga(G, H)$. Then
$$Y \mid x, \tau \sim N\left(B, \frac{1}{\tau} + \frac{1}{C\tau}\right) \equiv N\left(B, \frac{C + 1}{C\tau}\right), \qquad \tau \mid x \sim Ga(G, H)$$
We have already seen that $\mu \mid \tau, x \sim N(B, \frac{1}{C\tau})$ and $\tau \mid x \sim Ga(G, H)$ give $\mu \mid x \sim t_{2G}(B, \frac{H}{GC})$. By the same argument,
$$Y \mid x \sim t_{2G}\left(B, \frac{H(C + 1)}{GC}\right)$$
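The predictive can be handled either in closed form or by composition sampling; a sketch assuming scipy/numpy and the posterior values used earlier:

```python
# Predictive: closed-form t density, and the equivalent two-stage sampler.
from scipy import stats
import numpy as np

pred = stats.t(df=2 * G, loc=B, scale=np.sqrt(H * (C + 1) / (G * C)))

rng = np.random.default_rng(2)
tau = rng.gamma(shape=G, scale=1 / H, size=100_000)      # tau | x
y = rng.normal(loc=B, scale=np.sqrt((C + 1) / (C * tau)))  # y | x, tau
# A histogram of y should match pred.pdf over its support.
```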

24 Predictive distribution of summary statistics

Future random sample $y_1, y_2, \ldots, y_m$. Sufficient statistics: sample mean $\bar{Y}_m$ and (biased) sample variance $V_m = \sum_{i=1}^m (Y_i - \bar{Y}_m)^2 / m$.

Questions:
1. What is the predictive distribution of $\bar{Y}_m$? Hint: argue as on the previous slide, where writing $Y = \mu + \varepsilon$ gave $Y \mid x \sim t_{2G}\{B, H(C+1)/(GC)\}$.
2. What is the predictive distribution of $V_m$?

25 Predictive distribution of $\bar{Y}_m$

$\bar{Y}_m \mid \mu, \tau \sim N(\mu, \frac{1}{m\tau})$ and $(\mu, \tau)^T \mid x \sim NGa(B, C, G, H)$

Write $\bar{Y}_m = \mu + \bar{\varepsilon}$ with $\bar{\varepsilon} \mid \tau \sim N(0, \frac{1}{m\tau})$, $\mu \mid x, \tau \sim N(B, \frac{1}{C\tau})$, $\tau \mid x \sim Ga(G, H)$. Then
$$\bar{Y}_m \mid x, \tau \sim N\left(B, \frac{1}{m\tau} + \frac{1}{C\tau}\right) \equiv N\left(B, \frac{C + m}{Cm\tau}\right), \qquad \tau \mid x \sim Ga(G, H)$$
$$\Rightarrow \quad \bar{Y}_m \mid x \sim t_{2G}\left(B, \frac{H(C + m)}{GCm}\right)$$
Note that $\bar{Y}_m \mid x \overset{D}{\to} \mu \mid x$ as $m \to \infty$.

26 Predictive distribution of $V_m = \sum_{i=1}^m (Y_i - \bar{Y}_m)^2 / m$

In normal random samples $(m - 1)S_u^2/\sigma^2 \sim \chi^2_{m-1}$, i.e. $m V_m \tau \sim \chi^2_{m-1}$, i.e. $V_m \mid \tau \sim Ga\left(\frac{m-1}{2}, \frac{m\tau}{2}\right)$.

The predictive density of $V_m$ is
$$\begin{aligned}
f(v \mid x) &= \int f(v \mid \tau)\, \pi(\tau \mid x)\, d\tau \\
&= \frac{\left(\frac{m}{2}\right)^{\frac{m-1}{2}} H^G\, v^{\frac{m-1}{2} - 1}}{\Gamma\left(\frac{m-1}{2}\right)\Gamma(G)} \int_0^\infty \tau^{\frac{m-1}{2} + G - 1} e^{-(mv/2 + H)\tau}\, d\tau \\
&= \frac{1}{B\left(\frac{m-1}{2}, G\right)} \left(\frac{m}{2H}\right)^{\frac{m-1}{2}} v^{\frac{m-1}{2} - 1} \left(1 + \frac{mv}{2H}\right)^{-\left(\frac{m-1}{2} + G\right)}
\end{aligned}$$
$$\Rightarrow \quad V_m \mid x \sim \frac{(m - 1)H}{mG}\, F_{m-1, 2G}$$

27 What happens as $m \to \infty$?

Actually look at the predictive distribution of $1/V_m$ as $m \to \infty$. As
$$F_{\nu_1, \nu_2} \equiv \frac{\chi^2_{\nu_1}/\nu_1}{\chi^2_{\nu_2}/\nu_2} \equiv \frac{Ga(\nu_1/2, \nu_1/2)}{Ga(\nu_2/2, \nu_2/2)}$$
we have
$$V_m \mid x \sim \frac{(m - 1)H}{mG}\, F_{m-1, 2G} \quad \Rightarrow \quad \frac{1}{V_m}\,\Big|\, x \sim \frac{mG}{(m - 1)H} \cdot \frac{Ga(G, G)}{Ga\{(m - 1)/2, (m - 1)/2\}}$$
Now $Ga\{(m - 1)/2, (m - 1)/2\}$ has mean 1 and variance $2/(m - 1)$. So as $m \to \infty$,
$$\frac{1}{V_m}\,\Big|\, x \overset{D}{\to} \frac{G}{H}\, Ga(G, G) \equiv Ga(G, H) \equiv \tau \mid x$$

28 Summary

Data: normal random sample $X_i \mid \mu, \tau \overset{indep}{\sim} N(\mu, 1/\tau)$, $i = 1, 2, \ldots, n$
Prior: $(\mu, \tau)^T \sim NGa(b, c, g, h)$ is conjugate
Posterior: $(\mu, \tau)^T \mid x \sim NGa(B, C, G, H)$
Marginal posteriors: $\mu \mid x \sim t_{2G}\{B, H/(GC)\}$, $\tau \mid x \sim Ga(G, H)$, $\sigma \mid x \sim \text{Inv-Chi}(G, H)$
Marginal HDIs or equi-tailed CIs fairly easy to calculate
Joint HDI regions a little trickier (but solved by simulation)
Predictive: $Y \mid x \sim t_{2G}\{B, H(C + 1)/(GC)\}$

29 Inference in a normal linear model

Introduction. Data $(y_i, x_{i1}, \ldots, x_{ip})$, $i = 1, 2, \ldots, n$. Multiple linear regression model:
$$Y_i = \sum_{j=1}^p \beta_j x_{ij} + \varepsilon_i, \qquad \varepsilon_i \mid \tau \overset{indep}{\sim} N(0, 1/\tau)$$
In matrix notation, $Y = X\beta + \varepsilon$, where $Y = (Y_1, \ldots, Y_n)^T$, $\beta = (\beta_1, \ldots, \beta_p)^T$, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T$ and X is the $n \times p$ matrix with $(i, j)$ element $x_{ij}$.

$\varepsilon_i \overset{indep}{\sim} N(0, 1/\tau)$ is equivalent to $\varepsilon \sim N_n(0, \tau^{-1} I_n)$. Therefore
$$Y \mid X, \beta, \tau \sim N_n(X\beta, \tau^{-1} I_n)$$

30 Conjugate analysis

Data: $Y \mid X, \beta, \tau \sim N_n(X\beta, \tau^{-1} I_n)$. Likelihood function:
$$f(y \mid X, \beta, \tau) = (2\pi)^{-n/2}\, \tau^{n/2} \exp\left\{-\frac{\tau}{2}\left[ns^2 + (\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta})\right]\right\}$$
where $\hat{\beta} = (X^T X)^{-1} X^T y$ is the least squares (or max lik) estimate of β and $s^2 = (y - X\hat{\beta})^T (y - X\hat{\beta})/n$ is the residual mean square.

Conjugate prior: take $\pi(\beta, \tau) = \pi(\beta \mid \tau)\,\pi(\tau)$ with $\beta \mid \tau \sim N_p\{b, (c\tau)^{-1}\}$ and $\tau \sim Ga(g, h)$, where c is a $p \times p$ matrix. Write $(\beta, \tau)^T \sim N_p Ga(b, c, g, h)$. The prior density is, for $\beta \in \mathbb{R}^p$, $\tau > 0$,
$$\pi(\beta, \tau) \propto \tau^{g + \frac{p}{2} - 1} \exp\left\{-\frac{\tau}{2}\left[(\beta - b)^T c\, (\beta - b) + 2h\right]\right\}$$

31 Question

What's the posterior distribution for $(\beta, \tau)^T$?

Hint:
$$(\beta - b)^T c\, (\beta - b) + (\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta}) = (\beta - B)^T (c + X^T X)(\beta - B) - B^T (c + X^T X) B + b^T c\, b + \hat{\beta}^T X^T X \hat{\beta}$$
where $B = (c + X^T X)^{-1}(cb + X^T X \hat{\beta})$.

32 Hint:
$$(\beta - b)^T c\, (\beta - b) + (\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta}) = (\beta - B)^T (c + X^T X)(\beta - B) - B^T (c + X^T X) B + b^T c\, b + \hat{\beta}^T X^T X \hat{\beta}$$

Using Bayes' Theorem, the posterior density is, for $\beta \in \mathbb{R}^p$, $\tau > 0$,
$$\begin{aligned}
\pi(\beta, \tau \mid D) &\propto \pi(\beta, \tau)\, f(y \mid X, \beta, \tau) \\
&\propto \tau^{g + \frac{p}{2} - 1} \exp\left\{-\frac{\tau}{2}\left[(\beta - b)^T c\, (\beta - b) + 2h\right]\right\} \times \tau^{\frac{n}{2}} \exp\left\{-\frac{\tau}{2}\left[ns^2 + (\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta})\right]\right\} \\
&\propto \tau^{g + \frac{n}{2} + \frac{p}{2} - 1} \exp\left\{-\frac{\tau}{2}\left[(\beta - B)^T (c + X^T X)(\beta - B) + 2h + ns^2 - B^T (c + X^T X) B + b^T c\, b + \hat{\beta}^T X^T X \hat{\beta}\right]\right\} \\
&\propto \tau^{G + \frac{p}{2} - 1} \exp\left\{-\frac{\tau}{2}\left[(\beta - B)^T C\, (\beta - B) + 2H\right]\right\}
\end{aligned}$$

33 
$$\pi(\beta, \tau \mid D) \propto \tau^{G + \frac{p}{2} - 1} \exp\left\{-\frac{\tau}{2}\left[(\beta - B)^T C\, (\beta - B) + 2H\right]\right\}$$
where
$$B = (c + X^T X)^{-1}(cb + X^T X \hat{\beta}), \quad C = c + X^T X, \quad G = g + \frac{n}{2}, \quad H = h + \frac{1}{2}\left\{ns^2 - B^T (c + X^T X) B + b^T c\, b + \hat{\beta}^T X^T X \hat{\beta}\right\}$$
Therefore $(\beta, \tau)^T \mid D \sim N_p Ga(B, C, G, H)$.
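The matrix-form update is again mechanical to code; a sketch assuming numpy, with c the $p \times p$ prior matrix:

```python
# Sketch of the N_p Ga conjugate update for the normal linear model.
import numpy as np

def npga_update(b, c, g, h, y, X):
    n, p = X.shape
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)   # least squares estimate
    resid = y - X @ beta_hat
    ns2 = resid @ resid                        # n * s^2
    C = c + XtX
    B = np.linalg.solve(C, c @ b + XtX @ beta_hat)
    G = g + n / 2
    H = h + (ns2 - B @ C @ B + b @ c @ b + beta_hat @ XtX @ beta_hat) / 2
    return B, C, G, H
```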

34 Posterior analysis

Posterior: $(\beta, \tau)^T \mid D \sim N_p Ga(B, C, G, H)$. Clearly $\tau \mid D \sim Ga(G, H)$.

Question: What's the marginal posterior for β?

Hints:
1. Posterior density: $\pi(\beta, \tau \mid D) \propto \tau^{G + \frac{p}{2} - 1} \exp\left\{-\frac{\tau}{2}\left[(\beta - B)^T C (\beta - B) + 2H\right]\right\}$
2. $\int_0^\infty \theta^{a-1} e^{-b\theta}\, d\theta = \Gamma(a)/b^a$
3. If $X \sim t_a(b, c)$ then it has density $f(x \mid a, b, c) \propto \left\{1 + \frac{(x - b)^T c^{-1} (x - b)}{a}\right\}^{-\frac{a+p}{2}}$, $x \in \mathbb{R}^p$

35 Hints: $\int_0^\infty \theta^{a-1} e^{-b\theta}\, d\theta = \Gamma(a)/b^a$ and $f(x \mid a, b, c) \propto \left\{1 + \frac{(x - b)^T c^{-1} (x - b)}{a}\right\}^{-\frac{a+p}{2}}$

The posterior density for β is, for $\beta \in \mathbb{R}^p$,
$$\begin{aligned}
\pi(\beta \mid D) &= \int_0^\infty \pi(\beta, \tau \mid D)\, d\tau \\
&\propto \int_0^\infty \tau^{G + \frac{p}{2} - 1} \exp\left\{-\frac{\tau}{2}\left[(\beta - B)^T C (\beta - B) + 2H\right]\right\} d\tau \\
&= \frac{\Gamma\left(G + \frac{p}{2}\right)}{\left[\{(\beta - B)^T C (\beta - B) + 2H\}/2\right]^{G + \frac{p}{2}}} \qquad \text{using } \int_0^\infty \theta^{a-1} e^{-b\theta}\, d\theta = \frac{\Gamma(a)}{b^a} \\
&\propto \left\{1 + \frac{(\beta - B)^T C (\beta - B)}{2H}\right\}^{-\left(G + \frac{p}{2}\right)}
\end{aligned}$$
Therefore $\beta \mid D \sim t_{2G}\left(B, \frac{H}{G} C^{-1}\right)$.

36 Some more distribution theory

p-dimensional t distribution: $X \sim t_a(b, c)$. Density for $x \in \mathbb{R}^p$:
$$f(x \mid a, b, c) = \frac{\Gamma\left(\frac{a+p}{2}\right)}{|c|^{1/2} (a\pi)^{p/2}\, \Gamma\left(\frac{a}{2}\right)} \left\{1 + \frac{(x - b)^T c^{-1} (x - b)}{a}\right\}^{-\frac{a+p}{2}}$$
Parameters: $a > 0$, $b = (b_i) \in \mathbb{R}^p$, $c = (c_{ij})$ a symmetric positive definite matrix. Generalisation of the univariate $t_a(b, c)$ distribution.

37 Properties of $X \sim t_a(b, c)$

1. $E(X) = \text{Mode}(X) = b$ and $Var(X) = \frac{ac}{a - 2}$ for $a > 2$. Univariate: $E(X_i) = b_i$, $Var(X_i) = \frac{a c_{ii}}{a - 2}$ for $a > 2$, $Corr(X_i, X_j) = c_{ij}/\sqrt{c_{ii} c_{jj}}$
2. If A is $q \times p$ and d is a $q \times 1$ vector: $AX + d \sim t_a(Ab + d, A c A^T)$
3. Univariate marginals: $X_i \sim t_a(b_i, c_{ii})$
4. The contours of its density are ellipsoids centred at $x = b$: $\{x : (x - b)^T c^{-1} (x - b) = k\}$
5. $(X - b)^T c^{-1} (X - b)/p \sim F_{p,a}$

38 Summary of the posterior distribution

Posterior: $(\beta, \tau)^T \mid D \sim N_p Ga(B, C, G, H)$

Marginal distributions:
$$\beta \mid D \sim t_{2G}\left(B, \frac{H}{G} C^{-1}\right), \qquad \beta_i \mid D \sim t_{2G}\left(B_i, \frac{H}{G} (C^{-1})_{ii}\right), \quad i = 1, \ldots, p$$
$$\tau \mid D \sim Ga(G, H), \qquad \sigma \mid D \sim \text{Inv-Chi}(G, H)$$

39 Confidence intervals and regions

Regression parameters: $\beta_i \mid D \sim t_{2G}\left(B_i, \frac{H}{G}(C^{-1})_{ii}\right)$. The 100(1 − α)% HDI for $\beta_i$ is just an equi-tailed interval:
$$\left(B_i - t_{2G;\alpha/2} \sqrt{\frac{H (C^{-1})_{ii}}{G}},\ \ B_i + t_{2G;\alpha/2} \sqrt{\frac{H (C^{-1})_{ii}}{G}}\right)$$
where $t_{2G;\alpha/2}$ is the upper α/2 point of the $t_{2G}$ distribution.

Noise parameters: $\tau \mid D \sim Ga(G, H)$, $\sigma \mid D \sim \text{Inv-Chi}(G, H)$. HDIs for τ or σ non-trivial; equi-tailed intervals easy.
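These intervals are a couple of lines with scipy, assuming (B, C, G, H) come from the npga_update sketch above:

```python
# Equi-tailed (= HDI) 95% intervals for each beta_i.
from scipy import stats
import numpy as np

Cinv = np.linalg.inv(C)
half = stats.t.ppf(0.975, df=2 * G) * np.sqrt(H * np.diag(Cinv) / G)
intervals = np.column_stack([B - half, B + half])  # row i: interval for beta_i
```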

40 Questions

What is the form of an HDI-type confidence region for the regression parameters?
Hint: $\pi(\beta \mid D) \propto \left\{1 + \frac{(\beta - B)^T C (\beta - B)}{2H}\right\}^{-\left(G + \frac{p}{2}\right)}$, $\beta \in \mathbb{R}^p$

Determine the 100(1 − α)% HDI-type confidence region.
Hints:
1. $\beta \mid D \sim t_{2G}\left(B, \frac{H}{G} C^{-1}\right)$
2. Property 5 of the p-dim t-distribution: if $X \sim t_a(b, c)$ then $(X - b)^T c^{-1} (X - b)/p \sim F_{p,a}$

41 What is the form of an HDI-type confidence region for β?

Hint: $\pi(\beta \mid D) \propto \left\{1 + \frac{(\beta - B)^T C (\beta - B)}{2H}\right\}^{-\left(G + \frac{p}{2}\right)}$

HDI: $\{\beta : \pi(\beta \mid D) > u\} = \{\beta : (\beta - B)^T C (\beta - B) < k\}$

Determine the 100(1 − α)% HDI-type confidence region for β. As $X \sim t_a(b, c)$ implies $(X - b)^T c^{-1}(X - b)/p \sim F_{p,a}$,
$$(\beta - B)^T \left(\frac{H}{G} C^{-1}\right)^{-1} (\beta - B) \Big/ p \sim F_{p,2G} \quad \Leftrightarrow \quad (\beta - B)^T C (\beta - B) \sim \frac{pH}{G}\, F_{p,2G}$$
so the 100(1 − α)% confidence region is
$$\left\{\beta : (\beta - B)^T C (\beta - B) < \frac{pH}{G}\, F_{p,2G;1-\alpha}\right\}$$
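A sketch of the resulting membership test, assuming scipy and the posterior quantities above:

```python
# Is beta inside the 100(1 - alpha)% HDI-type region?
from scipy import stats

def in_region(beta, B, C, G, H, p, alpha=0.05):
    quad = (beta - B) @ C @ (beta - B)
    return quad < p * H / G * stats.f.ppf(1 - alpha, p, 2 * G)
```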

42 Predictive distributions

Want to predict the response Y when the covariate is x. Model: $Y = \beta^T x + \varepsilon$, where $\varepsilon \mid \tau \sim N(0, 1/\tau)$.

From the posterior, $\beta \mid \tau, D \sim N_p\{B, (C\tau)^{-1}\}$, so $\beta^T x \mid \tau, D \sim N\{B^T x, x^T C^{-1} x/\tau\}$ and hence
$$Y \mid x, \tau, D \sim N\{B^T x, (x^T C^{-1} x + 1)/\tau\}, \qquad \tau \mid D \sim Ga(G, H)$$
We have already seen that $\mu \mid \tau, x \sim N(B, \frac{1}{C\tau})$ and $\tau \mid x \sim Ga(G, H)$ give $\mu \mid x \sim t_{2G}(B, \frac{H}{GC})$. By the same argument,
$$Y \mid x, D \sim t_{2G}\{B^T x, H(x^T C^{-1} x + 1)/G\}$$
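A sketch of the predictive at a new covariate vector x, again assuming scipy/numpy:

```python
# Predictive Y | x, D as a frozen location-scale t distribution.
from scipy import stats
import numpy as np

def predictive(x, B, C, G, H):
    scale2 = H * (x @ np.linalg.solve(C, x) + 1) / G   # H (x' C^{-1} x + 1) / G
    return stats.t(df=2 * G, loc=B @ x, scale=np.sqrt(scale2))
```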

43 Summary

Data: $Y \mid X, \beta, \tau \sim N_n(X\beta, \tau^{-1} I_n)$
Prior: $(\beta, \tau)^T \sim N_p Ga(b, c, g, h)$ is conjugate
Posterior: $(\beta, \tau)^T \mid D \sim N_p Ga(B, C, G, H)$
Marginal posteriors: $\beta \mid D \sim t_{2G}(B, HC^{-1}/G)$, $\beta_i \mid D \sim t_{2G}(B_i, H(C^{-1})_{ii}/G)$, $\tau \mid D \sim Ga(G, H)$, $\sigma \mid D \sim \text{Inv-Chi}(G, H)$
Univariate HDIs or equi-tailed CIs fairly easy to calculate
HDI regions for β are interiors of ellipsoids
Predictive: $Y \mid x, D \sim t_{2G}\{B^T x, H(x^T C^{-1} x + 1)/G\}$

44 Linear regression example

Malcolm wants to be able to predict the height (Y) of a student from their shoe size (X). He thinks that a simple linear regression $Y = \alpha + \beta x + \varepsilon$ explains the relationship between the variables.

Prior elicitation: Malcolm decides to set up his prior using information about his measurements and those of his wife.
- His shoe size is 11 and he is 74 inches tall. Therefore he gives prior E(α + 11β) = 74 inches.
- He realises that he is probably not exactly the average height for size 11 shoe-wearers: he feels he is unlikely to be more than six inches from this conditional mean, and so gives a prior SD(α + 11β) = 3 inches.
- His wife takes a size 5 shoe and is 64 inches tall, so he takes prior E(α + 5β) = 64 inches and SD(α + 5β) = 3 inches.
- He decides that his beliefs about α + 11β and α + 5β are independent.
- Finally, he believes that the measurement error precision τ has mean 0.5 and variance 0.125.

45 Question

We need to know the parameters of his conjugate prior distribution $(\alpha, \beta, \tau)^T \sim N_2 Ga(b, c, g, h)$.

1. How might you calculate their values from
   - E(α + 11β) = 74 inches, SD(α + 11β) = 3 inches
   - E(α + 5β) = 64 inches, SD(α + 5β) = 3 inches
   - α + 11β and α + 5β are independent
   - τ has mean 0.5 and variance 0.125?
2. Roughly what size is Corr(α, β)?

46 
$$\tau \sim Ga(g, h) \quad \Rightarrow \quad E(\tau) = g/h, \qquad Var(\tau) = g/h^2$$
Therefore, as Malcolm requires E(τ) = 0.5 and Var(τ) = 0.125,
$$g = \frac{E(\tau)^2}{Var(\tau)} = 2 \qquad \text{and} \qquad h = \frac{E(\tau)}{Var(\tau)} = 4$$

47 
$$E(\alpha + 11\beta) = 74, \quad Var(\alpha + 11\beta) = 9, \quad E(\alpha + 5\beta) = 64, \quad Var(\alpha + 5\beta) = 9, \quad Cov(\alpha + 11\beta, \alpha + 5\beta) = 0$$
Expanding:
$$E(\alpha) + 11 E(\beta) = 74, \qquad E(\alpha) + 5 E(\beta) = 64$$
$$Var(\alpha) + 121\, Var(\beta) + 22\, Cov(\alpha, \beta) = 9$$
$$Var(\alpha) + 25\, Var(\beta) + 10\, Cov(\alpha, \beta) = 9$$
$$Var(\alpha) + 55\, Var(\beta) + 16\, Cov(\alpha, \beta) = 0$$
Solving:
$$\begin{pmatrix} E(\alpha) \\ E(\beta) \end{pmatrix} = \begin{pmatrix} 167/3 \\ 5/3 \end{pmatrix} = \begin{pmatrix} 55.67 \\ 1.67 \end{pmatrix}, \qquad \begin{pmatrix} Var(\alpha) & Cov(\alpha, \beta) \\ Cov(\alpha, \beta) & Var(\beta) \end{pmatrix} = \begin{pmatrix} 36.5 & -4 \\ -4 & 0.5 \end{pmatrix}$$
$$Corr(\alpha, \beta) = \frac{-4}{\sqrt{36.5 \times 0.5}} = -0.94, \quad \text{i.e. a very strong negative correlation}$$

48 
$$(\alpha, \beta)^T \sim t_{2g}\left(b, \frac{h}{g} c^{-1}\right) \quad \Rightarrow \quad E\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = b, \qquad Var\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \frac{2g}{2g - 2} \cdot \frac{h}{g}\, c^{-1} = \frac{h}{g - 1}\, c^{-1}$$
Therefore
$$b = \begin{pmatrix} 167/3 \\ 5/3 \end{pmatrix}, \qquad c^{-1} = \frac{g - 1}{h}\, Var\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} 9.125 & -1 \\ -1 & 0.125 \end{pmatrix} \quad \Rightarrow \quad c = \begin{pmatrix} 0.889 & 7.111 \\ 7.111 & 64.889 \end{pmatrix}$$
Malcolm's conjugate prior:
$$(\alpha, \beta, \tau)^T \sim N_2 Ga\left\{b = \begin{pmatrix} 167/3 \\ 5/3 \end{pmatrix},\ c = \begin{pmatrix} 0.889 & 7.111 \\ 7.111 & 64.889 \end{pmatrix},\ g = 2,\ h = 4\right\}$$
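Malcolm's elicitation reduces to solving two small linear systems; a numpy sketch of the whole calculation:

```python
# From elicited moments of alpha + 11*beta and alpha + 5*beta to (b, c, g, h).
import numpy as np

A = np.array([[1.0, 11.0],
              [1.0, 5.0]])                        # (alpha + 11 beta, alpha + 5 beta)
mean_ab = np.linalg.solve(A, [74.0, 64.0])        # E(alpha, beta) = (55.67, 1.67)
V = np.diag([9.0, 9.0])                           # independent, SD 3 each
var_ab = np.linalg.solve(A, V) @ np.linalg.inv(A.T)  # Var matrix of (alpha, beta)

g, h = 2.0, 4.0                                   # from E(tau) = 0.5, Var(tau) = 0.125
b = mean_ab
c = np.linalg.inv((g - 1) / h * var_ab)           # since Var = h c^{-1} / (g - 1)
```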

49 Malcolm's data

[Figure: scatterplot of height against shoe size]

50 Data calculations

$$\hat{\beta} = \begin{pmatrix} \hat{\alpha} \\ \hat{\beta} \end{pmatrix} = (X^T X)^{-1} X^T y = \begin{pmatrix} \ldots \\ \ldots \end{pmatrix}, \qquad ns^2 = \ldots, \qquad X^T X = \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix} = \begin{pmatrix} 52 & \ldots \\ \ldots & \ldots \end{pmatrix}$$

Malcolm's posterior:
$$(\alpha, \beta, \tau)^T \mid D \sim N_2 Ga\left\{B = \begin{pmatrix} \ldots \\ \ldots \end{pmatrix},\ C = \begin{pmatrix} \ldots & \ldots \\ \ldots & \ldots \end{pmatrix},\ G = 28,\ H = \ldots\right\}$$

51 Marginal prior and posterior distributions

[Figure: prior and posterior densities for α, β, τ and σ]

52 95% confidence intervals

α: (54.42, 56.98)
β: (1.45, 1.75)
τ: (0.55, 1.18); HDI (0.57, 1.20)
σ: (0.90, 1.31); HDI (0.91, 1.32)

Confidence regions: here p = 2, and so the 100(1 − α)% confidence region for $\beta = (\alpha, \beta)^T$ is
$$\left\{\beta : (\beta - B)^T C (\beta - B) < \frac{2H}{G}\, F_{2,2G;1-\alpha}\right\}$$

53 Prior and posterior confidence regions

[Figure: 95%, 90% and 80% prior (dashed) and posterior (solid) confidence regions for β = (α, β)ᵀ; 95% (outer), 80% (inner)]

54 Focusing on central part of plot ...

[Figure: 95%, 90% and 80% prior (dashed) and posterior (solid) confidence regions for β = (α, β)ᵀ; 95% (outer), 80% (inner)]

55 One-way ANOVA example

Charlotte is a vet studying the effect of three new diets (Diets 1, 2 and 3) on blood coagulation times (in seconds) in a set of animals.

Model the response $Y_{ij}$ of animal j on diet i as
$$Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \overset{indep}{\sim} N(0, 1/\tau)$$
This can be rewritten as a normal linear model. We need Charlotte's conjugate prior
$$(\mu_1, \mu_2, \mu_3, \tau)^T \sim N_3 Ga(b, c, g, h)$$

56 Charlotte's prior knowledge

In her previous experience, coagulation times for this sort of animal have a range of around 12 seconds.
- Taking this as a (rough) 95% interval suggests that $4\, SD(Y_{ij}) \approx 12$, i.e. $\tau \approx 1/9$
- She sets the prior mean $E(\tau) = g/h = 1/9$
- She thinks it unlikely that σ will be greater than 6 seconds and so requires $Pr(\tau < 1/36) \approx 0.05$
- She decides on $g = 24/9$ and $h = 24$, as this gives $Pr(\tau < 1/36) = 0.053$

57 Charlotte's prior knowledge

She has no specialist information about any of the diets, so she decides to treat the diets as exchangeable:
$$b = b\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \qquad \text{and} \qquad c = k\begin{pmatrix} 1 & r & r \\ r & 1 & r \\ r & r & 1 \end{pmatrix}$$
for some choice of $b \in \mathbb{R}$, $k > 0$ and $r \in (-1/2, 1)$. She decides that
- $E(\mu_i) = b = 60$ seconds
- $Corr(\mu_i, \mu_j) = -r/(1 + r) = 2/3 \ \Rightarrow\ r = -0.4$
- $SD(\mu_i) \approx 2.5$ seconds $\Rightarrow\ k = 4.94$

58 Charlotte's conjugate prior

$$(\mu_1, \mu_2, \mu_3, \tau)^T \sim N_3 Ga\left\{b = \begin{pmatrix} 60 \\ 60 \\ 60 \end{pmatrix},\ c = 4.94\begin{pmatrix} 1 & -0.4 & -0.4 \\ -0.4 & 1 & -0.4 \\ -0.4 & -0.4 & 1 \end{pmatrix},\ g = 24/9,\ h = 24\right\}$$

59 The data

Diet 1: 62 60 63 59
Diet 2: 63 67 71 64 65 66
Diet 3: 56 62 60 61 63 64 63 59

In fact these data come from Linear Models with R by J.J. Faraway (2004).

60 Rewrite the ANOVA model as a normal linear model

Take
$$X = \begin{pmatrix} 1_4 & 0 & 0 \\ 0 & 1_6 & 0 \\ 0 & 0 & 1_8 \end{pmatrix} \qquad \text{and} \qquad \beta = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}$$
where $1_m$ denotes an m-vector of ones, so that X has one indicator column per diet.

61 Data calculations

$$ns^2 = 98, \qquad X^T X = \text{diag}(n_1, n_2, n_3) = \text{diag}(4, 6, 8), \qquad \hat{\beta} = \begin{pmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \\ \hat{\mu}_3 \end{pmatrix} = \begin{pmatrix} 61 \\ 66 \\ 61 \end{pmatrix}$$

Charlotte's posterior:
$$(\mu_1, \mu_2, \mu_3, \tau)^T \mid D \sim N_3 Ga\left\{B = \ldots,\ C = c + \text{diag}(4, 6, 8),\ G = \frac{24}{9} + 9 = \frac{105}{9},\ H = \ldots\right\}$$
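Charlotte's posterior can be reproduced with the npga_update sketch from earlier, assuming the coagulation data reconstructed above:

```python
# ANOVA as a normal linear model: indicator design plus npga_update.
import numpy as np

y = np.array([62, 60, 63, 59,                          # Diet 1
              63, 67, 71, 64, 65, 66,                  # Diet 2
              56, 62, 60, 61, 63, 64, 63, 59], float)  # Diet 3
X = np.zeros((18, 3))
X[:4, 0] = 1.0
X[4:10, 1] = 1.0
X[10:, 2] = 1.0

r, k = -0.4, 4.94
c = k * ((1 - r) * np.eye(3) + r * np.ones((3, 3)))    # exchangeable prior matrix
b = np.full(3, 60.0)
B, C, G, H = npga_update(b, c, g=24 / 9, h=24.0, y=y, X=X)
```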

62 Posterior for treatment means

Posterior distribution: $\mu \mid D \sim t_{2G}\left(B, \frac{H}{G} C^{-1}\right)$
Posterior mean: $E(\mu \mid D) = \ldots$
Posterior variance matrix: $Var(\mu \mid D) = \ldots$
Posterior correlation matrix: off-diagonal correlations $\approx 0.211$

63 Prior and posterior distributions of treatment means

[Figure: prior and posterior densities for µ₁, µ₂ and µ₃]

64 Prior and posterior distributions for within-treatment group variation

[Figure: prior and posterior densities for τ and σ]

65 95% confidence intervals

µ₁: (59.41, 63.82)
µ₂: (61.86, 65.84)
µ₃: (59.63, 63.28)
τ: (0.050, 0.170); HDI (0.055, 0.177)
σ: (2.291, 4.130); HDI (2.377, 4.270)

66 Alternative parameterisation of the model

Standard model: $Y_{ij} = \mu_i + \varepsilon_{ij}$ with $\beta = (\mu_1, \mu_2, \mu_3)^T$

Alternative parametrisation: $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ with $\beta = (\mu, \alpha_2, \alpha_3)^T$, taking $\alpha_1 \equiv 0$ for identifiability, so that $\mu = \mu_1$, $\alpha_2 = \mu_2 - \mu_1$ and $\alpha_3 = \mu_3 - \mu_1$

67 Question

What is the (joint) posterior distribution of the treatment mean differences $\alpha = (\alpha_2, \alpha_3)^T$, where $\alpha_2 = \mu_2 - \mu_1$ and $\alpha_3 = \mu_3 - \mu_1$?

Hints:
1. Posterior distribution: $\mu \mid D \sim t_{2G}\left(B, \frac{H}{G} C^{-1}\right)$
2. If $X \sim t_a(b, c)$, A is a $q \times p$ matrix and d is a $q \times 1$ vector, then $AX + d \sim t_a(Ab + d, A c A^T)$

68 Hints: as on the previous slide.

We have
$$\begin{pmatrix} \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}, \qquad \text{i.e. } \alpha = A\mu$$
$$\mu \mid D \sim t_{2G}\left(B, \frac{H}{G} C^{-1}\right) \quad \Rightarrow \quad \alpha \mid D \sim t_{2G}\left(AB, \frac{H}{G}\, A C^{-1} A^T\right)$$
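A sketch of this transformation in numpy, assuming the posterior (B, C, G, H) from the ANOVA snippet above:

```python
# Posterior for alpha = A mu via the linear transformation property.
import numpy as np

A = np.array([[-1.0, 1.0, 0.0],
              [-1.0, 0.0, 1.0]])
Cinv = np.linalg.inv(C)
loc_alpha = A @ B                        # location of the t_{2G} distribution
scale_alpha = H / G * A @ Cinv @ A.T     # scale matrix of the t_{2G} distribution
```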

69 Prior and posterior distributions for mean treatment differences

[Figure: prior and posterior densities for α₂ and α₃]

95% HDIs
α₂: (-0.373, 4.827)
α₃: (-2.697, 2.365)

70 
[Figure: 95%, 90% and 80% prior (dashed) and posterior (solid) confidence regions for α = (α₂, α₃)ᵀ; 95% (outer), 80% (inner)]

71 Focusing on the central part of the plot ...

[Figure: 95%, 90% and 80% prior (dashed) and posterior (solid) confidence regions for α = (α₂, α₃)ᵀ; 95% (outer), 80% (inner)]

72 Thanks for listening
