
1 A tutorial on Bayesian inference for the normal linear model

Richard Boys
Statistics Research Group, Newcastle University, UK

2 Motivation

Much of Bayesian inference nowadays analyses complex hierarchical models using computer-intensive methods: MCMC, pMCMC, ABC, HMC, ...

Not that long ago, most analyses used
- a conjugate analysis of, say, the normal linear model
- a non-conjugate analysis fitted via techniques such as Gaussian quadrature or Laplace's method to evaluate the required integrals

This tutorial gives an overview of the basics underpinning an analysis of data assuming a normal linear model and a conjugate prior.

3 Normal random sample

Example: The 18th century physicist Henry Cavendish made 23 experimental determinations of the earth's density. These data (in g/cm^3) have sufficient statistics $n = 23$, $\bar{x} = 5.4848$, $s = \ldots$

[Figure: normal Q-Q plot of the data (sample quantiles against theoretical quantiles)]

4 Conjugate analysis

Data: $X_i \mid \mu, \tau \sim N(\mu, 1/\tau)$, $i = 1, 2, \ldots, n$ (indep)

Likelihood function:
$$\pi(x \mid \mu, \tau) = \left(\frac{\tau}{2\pi}\right)^{n/2} \exp\left[-\frac{n\tau}{2}\left\{s^2 + (\bar{x} - \mu)^2\right\}\right]$$

Conjugate prior: take
$$\pi(\mu, \tau) = \pi(\mu \mid \tau)\,\pi(\tau), \qquad \mu \mid \tau \sim N\left(b, \frac{1}{c\tau}\right), \quad \tau \sim Ga(g, h)$$
Write $(\mu, \tau)^T \sim NGa(b, c, g, h)$. The prior density is
$$\pi(\mu, \tau) \propto \tau^{g - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[c(\mu - b)^2 + 2h\right]\right\}, \qquad \mu \in \mathbb{R},\ \tau > 0$$

5 Question

What's the posterior distribution for $(\mu, \tau)^T$?

Hint:
$$c(\mu - b)^2 + n(\bar{x} - \mu)^2 = (c + n)\left\{\mu - \left(\frac{cb + n\bar{x}}{c + n}\right)\right\}^2 + \frac{nc(\bar{x} - b)^2}{c + n}$$

6 Hint:
$$c(\mu - b)^2 + n(\bar{x} - \mu)^2 = (c + n)\left\{\mu - \left(\frac{cb + n\bar{x}}{c + n}\right)\right\}^2 + \frac{nc(\bar{x} - b)^2}{c + n}$$

Using Bayes' Theorem, the posterior density is, for $\mu \in \mathbb{R}$, $\tau > 0$,
$$\begin{aligned}
\pi(\mu, \tau \mid x) &\propto \pi(\mu, \tau)\,\pi(x \mid \mu, \tau) \\
&\propto \tau^{g + \frac{n}{2} - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[c(\mu - b)^2 + n(\bar{x} - \mu)^2 + 2h + ns^2\right]\right\} \\
&\propto \tau^{g + \frac{n}{2} - \frac{1}{2}} \exp\left(-\frac{\tau}{2}\left[(c + n)\left\{\mu - \left(\frac{cb + n\bar{x}}{c + n}\right)\right\}^2 + \frac{nc(\bar{x} - b)^2}{c + n} + 2h + ns^2\right]\right) \\
&\propto \tau^{G - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[C(\mu - B)^2 + 2H\right]\right\}
\end{aligned}$$
where
$$B = \frac{cb + n\bar{x}}{c + n}, \quad C = c + n, \quad G = g + \frac{n}{2}, \quad H = h + \frac{cn(\bar{x} - b)^2}{2(c + n)} + \frac{ns^2}{2}$$
Therefore $(\mu, \tau)^T \mid x \sim NGa(B, C, G, H)$.
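The update above is mechanical to code. Below is a minimal sketch (not from the slides) of the conjugate update as a Python function; the argument names mirror the notation $b, c, g, h$ and the sufficient statistics $n$, $\bar{x}$, $s^2$.

```python
# A minimal sketch of the NGa conjugate update derived above.
def nga_update(b, c, g, h, n, xbar, s2):
    """Update an NGa(b, c, g, h) prior given a normal random sample
    with size n, mean xbar and biased variance s2 = sum((x - xbar)^2)/n."""
    B = (c * b + n * xbar) / (c + n)
    C = c + n
    G = g + n / 2
    H = h + c * n * (xbar - b) ** 2 / (2 * (c + n)) + n * s2 / 2
    return B, C, G, H
```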

7 Posterior analysis

Posterior: $(\mu, \tau)^T \mid x \sim NGa(B, C, G, H)$. Clearly $\tau \mid x \sim Ga(G, H)$.

Question: What's the marginal posterior for µ?

Hints:
1. Posterior density: $\pi(\mu, \tau \mid x) \propto \tau^{G - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[C(\mu - B)^2 + 2H\right]\right\}$
2. $\int_0^\infty \theta^{a-1} e^{-b\theta}\, d\theta = \Gamma(a)/b^a$
3. If $Y \sim t_a(b, c)$ then it has density $f(y \mid a, b, c) \propto \left\{1 + \frac{(y - b)^2}{ac}\right\}^{-\frac{a+1}{2}}$, $y \in \mathbb{R}$

8 Hints: $\int_0^\infty \theta^{a-1} e^{-b\theta}\, d\theta = \Gamma(a)/b^a$ and $f(y \mid a, b, c) \propto \left\{1 + \frac{(y - b)^2}{ac}\right\}^{-\frac{a+1}{2}}$

The (marginal) posterior density for µ is, for $\mu \in \mathbb{R}$,
$$\begin{aligned}
\pi(\mu \mid x) &= \int_0^\infty \pi(\mu, \tau \mid x)\, d\tau \\
&\propto \int_0^\infty \tau^{G - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[C(\mu - B)^2 + 2H\right]\right\} d\tau \\
&= \frac{\Gamma\left(G + \frac{1}{2}\right)}{\left[\{C(\mu - B)^2 + 2H\}/2\right]^{G + \frac{1}{2}}} \qquad \text{using } \int_0^\infty \theta^{a-1} e^{-b\theta}\, d\theta = \frac{\Gamma(a)}{b^a} \\
&\propto \left\{1 + \frac{C(\mu - B)^2}{2H}\right\}^{-\frac{2G+1}{2}}
\end{aligned}$$
Therefore $\mu \mid x \sim t_{2G}\left(B, \frac{H}{GC}\right)$.

9 Some distribution theory

Generalised t distribution: $Y \sim t_a(b, c)$. Density is
$$f(y \mid a, b, c) = \frac{\Gamma\left(\frac{a+1}{2}\right)}{\sqrt{ac\pi}\,\Gamma\left(\frac{a}{2}\right)} \left\{1 + \frac{(y - b)^2}{ac}\right\}^{-\frac{a+1}{2}}, \qquad y \in \mathbb{R}$$
Parameters: $a > 0$, $b \in \mathbb{R}$, $c > 0$.
- Generalisation of the standard t-distribution since $(Y - b)/\sqrt{c} \sim t_a$
- $E(Y) = \text{Mode}(Y) = b$ and $Var(Y) = \frac{ac}{a - 2}$ if $a > 2$
- $t_a(0, 1) \equiv t_a$
- $\lim_{a \to \infty} t_a(b, c) = N(b, c)$

10 Inverse Chi distribution: $Y \sim \text{Inv-Chi}(a, b)$

Density is
$$f(y \mid a, b) = \frac{2b^a}{\Gamma(a)}\, y^{-2a - 1} e^{-b/y^2}, \qquad y > 0$$
Parameters: $a > 0$, $b > 0$.
$$E(Y) = \frac{\sqrt{b}\,\Gamma(a - 1/2)}{\Gamma(a)}, \qquad Var(Y) = \frac{b}{a - 1} - E(Y)^2 \ \text{ if } a > 1$$
The name of the distribution comes from the fact that $1/Y^2 \sim Ga(a, b) \equiv \chi^2_{2a}/(2b)$.

11 Summary of the posterior distribution

Posterior: $(\mu, \tau)^T \mid x \sim NGa(B, C, G, H)$

Marginal distributions:
$$\mu \mid x \sim t_{2G}\left(B, \frac{H}{GC}\right), \qquad \tau \mid x \sim Ga(G, H), \qquad \sigma \mid x \sim \text{Inv-Chi}(G, H)$$

12 An Example

The 18th century physicist Henry Cavendish made 23 experimental determinations of the earth's density. These data (in g/cm^3) have sufficient statistics $n = 23$, $\bar{x} = 5.4848$, $s = \ldots$

Data model: $X_i \mid \mu, \tau \overset{indep}{\sim} N(\mu, 1/\tau)$, $i = 1, 2, \ldots, 23$

Prior: $(\mu, \tau)^T \sim NGa(b = 5.41,\ c = 0.25,\ g = 2.5,\ h = 0.1)$

Posterior: $(\mu, \tau)^T \mid x \sim NGa(B = 5.4840,\ C = 23.25,\ G = 14,\ H = \ldots)$
$$\mu \mid x \sim t_{28}(5.4840, \ldots), \qquad \tau \mid x \sim Ga(14, \ldots), \qquad \sigma \mid x \sim \text{Inv-Chi}(14, \ldots)$$
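As a check, the posterior above can be reproduced with the nga_update sketch from earlier. The sample standard deviation is not legible in the source, so the value s = 0.19 below is an assumed placeholder, not the tutorial's.

```python
# Cavendish example with an assumed placeholder s = 0.19 (the value
# used in the tutorial is not recoverable from the source).
B, C, G, H = nga_update(b=5.41, c=0.25, g=2.5, h=0.1,
                        n=23, xbar=5.4848, s2=0.19 ** 2)
print(B, C, G, H)  # B = 5.4840, C = 23.25, G = 14.0; H depends on s
```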

13 Comparison of priors and posteriors

(Wikipedia: µ = 5.515 g/cm^3)

[Figure: prior and posterior densities for µ, τ and σ]

14 Comparison of prior and posteriors

[Figure: contours of the posterior density in the (µ, τ) plane]

15 Confidence intervals and regions

Point estimates
- Could use posterior mean, mode, ...
- Not really worth having without some idea of uncertainty

Interval estimates (univariate parameters)
- Confidence intervals, credible intervals, Bayesian confidence intervals
- Highest density intervals (HDI), equi-tailed intervals
- Symmetric posteriors: HDI = equi-tailed interval
- Here $\mu \mid x \sim t_{2G}\{B, H/(GC)\}$ is symmetric, so its HDI is easy
- $\tau \mid x \sim Ga(G, H)$ is skewed: HDI non-trivial but equi-tailed interval is easy; ditto for $\sigma \mid x \sim \text{Inv-Chi}(G, H)$

16 Results from this data analysis ...

95% confidence intervals:

          Prior           Posterior
µ         (4.38, 6.44)    (5.40, 5.56)
τ         (1.48, 55.96)   (14.02, 42.25)
τ (HDI)   (4.16, 64.16)   (15.07, 43.76)
σ         (0.11, 0.42)    (0.15, 0.26)
σ (HDI)   (0.12, 0.49)    (0.15, 0.26)
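A sketch of how the equi-tailed intervals can be computed, assuming scipy is available and using the (B, C, G, H) values from the snippet above; the τ interval is transformed monotonically to give the σ interval.

```python
# Equi-tailed 95% intervals for mu, tau and sigma (sketch; values
# depend on the placeholder H from the previous snippet).
from scipy import stats

conf = 0.95
# mu | x ~ t_{2G}(B, H/(GC)), a location-scale t
mu_int = stats.t.interval(conf, 2 * G, loc=B, scale=(H / (G * C)) ** 0.5)
# tau | x ~ Ga(G, H), with H a rate (scipy uses scale = 1/rate)
tau_int = stats.gamma.interval(conf, G, scale=1 / H)
# sigma = tau^(-1/2) is monotone decreasing in tau, so the endpoints swap
sigma_int = (tau_int[1] ** -0.5, tau_int[0] ** -0.5)
```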

17 Confidence regions

So far we have looked at univariate HDIs. It can also be useful to look at (joint) confidence regions.

Question: What is the 100(1 − α)% HDI region for $(\mu, \tau)^T$?

Hint:
$$\log \pi(\mu, \tau \mid x) = -\frac{\tau}{2}\left\{C(\mu - B)^2 + 2H\right\} + \left(G - \frac{1}{2}\right) \log \tau + \text{const}$$

18 Hint: $\log \pi(\mu, \tau \mid x) = -\frac{\tau}{2}\left\{C(\mu - B)^2 + 2H\right\} + \left(G - \frac{1}{2}\right) \log \tau + \text{const}$

The 100(1 − α)% HDI region for $(\mu, \tau)^T$ is
$$\begin{aligned}
\left\{(\mu, \tau)^T : \pi(\mu, \tau \mid x) > k_\alpha\right\}
&= \left\{(\mu, \tau)^T : -\log \pi(\mu, \tau \mid x) < k'_\alpha\right\} \\
&= \left\{(\mu, \tau)^T : \frac{\tau}{2}\left[C(\mu - B)^2 + 2H\right] - \left(G - \frac{1}{2}\right) \log \tau < k'_\alpha\right\}
\end{aligned}$$
How to determine $k'_\alpha$?

19 Need the posterior distribution function $F(\cdot)$ of
$$Y(\mu, \tau) = \frac{\tau}{2}\left\{C(\mu - B)^2 + 2H\right\} - \left(G - \frac{1}{2}\right) \log \tau$$
and take $k'_\alpha$ s.t. $F(k'_\alpha) = 1 - \alpha$.

Not a standard distribution, so build up F via simulation.
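A sketch of that simulation, assuming numpy and the (B, C, G, H) values from earlier: draw (µ, τ) from the NGa posterior, evaluate Y(µ, τ), and take the empirical 1 − α quantile as the cutoff.

```python
# Build the distribution function of Y(mu, tau) by simulation.
import numpy as np

rng = np.random.default_rng(1)
N = 100_000
tau = rng.gamma(shape=G, scale=1 / H, size=N)       # tau | x ~ Ga(G, H)
mu = rng.normal(loc=B, scale=1 / np.sqrt(C * tau))  # mu | tau, x ~ N(B, 1/(C tau))
Y = tau / 2 * (C * (mu - B) ** 2 + 2 * H) - (G - 0.5) * np.log(tau)
k_alpha = np.quantile(Y, 0.95)                      # cutoff for a 95% region
```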

20 Cavendish example

[Figure: 95%, 90% and 80% prior (dashed) and posterior (solid) confidence regions for (µ, τ)ᵀ; 95% (outer), 80% (inner)]

21 Focusing on central part of plot ...

[Figure: 95%, 90% and 80% prior (dashed) and posterior (solid) confidence regions for (µ, τ)ᵀ; 95% (outer), 80% (inner)]

22 Predictive distribution

The predictive density of a new observation y is
$$f(y \mid x) = \int\!\!\int f(y \mid \mu, \tau)\, \pi(\mu, \tau \mid x)\, d\mu\, d\tau$$
As this is a conjugate analysis, we can determine the predictive density using Candidate's formula:
$$\pi(\theta \mid x, y) = \frac{\pi(\theta) f(x, y \mid \theta)}{f(x, y)} = \frac{\pi(\theta) f(x \mid \theta) f(y \mid \theta)}{f(x) f(y \mid x)} = \frac{\pi(\theta \mid x)\, f(y \mid \theta)}{f(y \mid x)}$$
since X and Y are independent given θ, and so
$$f(y \mid x) = \frac{f(y \mid \theta)\, \pi(\theta \mid x)}{\pi(\theta \mid x, y)}$$
But, for this model, there is a more straightforward way ...

23 
$Y \mid \mu, \tau \sim N(\mu, \frac{1}{\tau})$ and $(\mu, \tau)^T \mid x \sim NGa(B, C, G, H)$

Write $Y = \mu + \varepsilon$ with $\varepsilon \mid \tau \sim N(0, \frac{1}{\tau})$, $\mu \mid x, \tau \sim N(B, \frac{1}{C\tau})$, $\tau \mid x \sim Ga(G, H)$. Then
$$Y \mid x, \tau \sim N\left(B, \frac{1}{\tau} + \frac{1}{C\tau}\right) \equiv N\left(B, \frac{C + 1}{C\tau}\right), \qquad \tau \mid x \sim Ga(G, H)$$
We have already seen that $\mu \mid \tau, x \sim N(B, \frac{1}{C\tau})$ and $\tau \mid x \sim Ga(G, H)$ give $\mu \mid x \sim t_{2G}(B, \frac{H}{GC})$. By the same argument,
$$Y \mid x \sim t_{2G}\left(B, \frac{H(C + 1)}{GC}\right)$$
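The predictive can be handled either in closed form or by composition sampling; a sketch assuming scipy/numpy and the posterior values used earlier:

```python
# Predictive: closed-form t density, and the equivalent two-stage sampler.
from scipy import stats
import numpy as np

pred = stats.t(df=2 * G, loc=B, scale=np.sqrt(H * (C + 1) / (G * C)))

rng = np.random.default_rng(2)
tau = rng.gamma(shape=G, scale=1 / H, size=100_000)      # tau | x
y = rng.normal(loc=B, scale=np.sqrt((C + 1) / (C * tau)))  # y | x, tau
# A histogram of y should match pred.pdf over its support.
```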

24 Predictive distribution of summary statistics

Future random sample $y_1, y_2, \ldots, y_m$. Sufficient statistics: sample mean $\bar{Y}_m$ and (biased) sample variance $V_m = \sum_{i=1}^m (Y_i - \bar{Y}_m)^2 / m$.

Questions:
1. What is the predictive distribution of $\bar{Y}_m$? Hint: argue as on the previous slide, where writing $Y = \mu + \varepsilon$ gave $Y \mid x \sim t_{2G}\{B, H(C+1)/(GC)\}$.
2. What is the predictive distribution of $V_m$?

25 Predictive distribution of $\bar{Y}_m$

$\bar{Y}_m \mid \mu, \tau \sim N(\mu, \frac{1}{m\tau})$ and $(\mu, \tau)^T \mid x \sim NGa(B, C, G, H)$

Write $\bar{Y}_m = \mu + \bar{\varepsilon}$ with $\bar{\varepsilon} \mid \tau \sim N(0, \frac{1}{m\tau})$, $\mu \mid x, \tau \sim N(B, \frac{1}{C\tau})$, $\tau \mid x \sim Ga(G, H)$. Then
$$\bar{Y}_m \mid x, \tau \sim N\left(B, \frac{1}{m\tau} + \frac{1}{C\tau}\right) \equiv N\left(B, \frac{C + m}{Cm\tau}\right), \qquad \tau \mid x \sim Ga(G, H)$$
$$\Rightarrow \quad \bar{Y}_m \mid x \sim t_{2G}\left(B, \frac{H(C + m)}{GCm}\right)$$
Note that $\bar{Y}_m \mid x \overset{D}{\to} \mu \mid x$ as $m \to \infty$.

26 Predictive distribution of $V_m = \sum_{i=1}^m (Y_i - \bar{Y}_m)^2 / m$

In normal random samples $(m - 1)S_u^2/\sigma^2 \sim \chi^2_{m-1}$, i.e. $m V_m \tau \sim \chi^2_{m-1}$, i.e. $V_m \mid \tau \sim Ga\left(\frac{m-1}{2}, \frac{m\tau}{2}\right)$.

The predictive density of $V_m$ is
$$\begin{aligned}
f(v \mid x) &= \int f(v \mid \tau)\, \pi(\tau \mid x)\, d\tau \\
&= \frac{\left(\frac{m}{2}\right)^{\frac{m-1}{2}} H^G\, v^{\frac{m-1}{2} - 1}}{\Gamma\left(\frac{m-1}{2}\right)\Gamma(G)} \int_0^\infty \tau^{\frac{m-1}{2} + G - 1} e^{-(mv/2 + H)\tau}\, d\tau \\
&= \frac{1}{B\left(\frac{m-1}{2}, G\right)} \left(\frac{m}{2H}\right)^{\frac{m-1}{2}} v^{\frac{m-1}{2} - 1} \left(1 + \frac{mv}{2H}\right)^{-\left(\frac{m-1}{2} + G\right)}
\end{aligned}$$
$$\Rightarrow \quad V_m \mid x \sim \frac{(m - 1)H}{mG}\, F_{m-1, 2G}$$

27 What happens as $m \to \infty$?

Actually look at the predictive distribution of $1/V_m$ as $m \to \infty$. As
$$F_{\nu_1, \nu_2} \equiv \frac{\chi^2_{\nu_1}/\nu_1}{\chi^2_{\nu_2}/\nu_2} \equiv \frac{Ga(\nu_1/2, \nu_1/2)}{Ga(\nu_2/2, \nu_2/2)}$$
we have
$$V_m \mid x \sim \frac{(m - 1)H}{mG}\, F_{m-1, 2G} \quad \Rightarrow \quad \frac{1}{V_m}\,\Big|\, x \sim \frac{mG}{(m - 1)H} \cdot \frac{Ga(G, G)}{Ga\{(m - 1)/2, (m - 1)/2\}}$$
Now $Ga\{(m - 1)/2, (m - 1)/2\}$ has mean 1 and variance $2/(m - 1)$. So as $m \to \infty$,
$$\frac{1}{V_m}\,\Big|\, x \overset{D}{\to} \frac{G}{H}\, Ga(G, G) \equiv Ga(G, H) \equiv \tau \mid x$$

28 Summary

Data: normal random sample $X_i \mid \mu, \tau \overset{indep}{\sim} N(\mu, 1/\tau)$, $i = 1, 2, \ldots, n$
Prior: $(\mu, \tau)^T \sim NGa(b, c, g, h)$ is conjugate
Posterior: $(\mu, \tau)^T \mid x \sim NGa(B, C, G, H)$
Marginal posteriors: $\mu \mid x \sim t_{2G}\{B, H/(GC)\}$, $\tau \mid x \sim Ga(G, H)$, $\sigma \mid x \sim \text{Inv-Chi}(G, H)$
Marginal HDIs or equi-tailed CIs fairly easy to calculate
Joint HDI regions a little trickier (but solved by simulation)
Predictive: $Y \mid x \sim t_{2G}\{B, H(C + 1)/(GC)\}$

29 Inference in a normal linear model

Introduction. Data $(y_i, x_{i1}, \ldots, x_{ip})$, $i = 1, 2, \ldots, n$. Multiple linear regression model:
$$Y_i = \sum_{j=1}^p \beta_j x_{ij} + \varepsilon_i, \qquad \varepsilon_i \mid \tau \overset{indep}{\sim} N(0, 1/\tau)$$
In matrix notation, $Y = X\beta + \varepsilon$, where $Y = (Y_1, \ldots, Y_n)^T$, $\beta = (\beta_1, \ldots, \beta_p)^T$, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T$ and X is the $n \times p$ matrix with $(i, j)$ element $x_{ij}$.

$\varepsilon_i \overset{indep}{\sim} N(0, 1/\tau)$ is equivalent to $\varepsilon \sim N_n(0, \tau^{-1} I_n)$. Therefore
$$Y \mid X, \beta, \tau \sim N_n(X\beta, \tau^{-1} I_n)$$

30 Conjugate analysis

Data: $Y \mid X, \beta, \tau \sim N_n(X\beta, \tau^{-1} I_n)$. Likelihood function:
$$f(y \mid X, \beta, \tau) = (2\pi)^{-n/2}\, \tau^{n/2} \exp\left\{-\frac{\tau}{2}\left[ns^2 + (\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta})\right]\right\}$$
where $\hat{\beta} = (X^T X)^{-1} X^T y$ is the least squares (or max lik) estimate of β and $s^2 = (y - X\hat{\beta})^T (y - X\hat{\beta})/n$ is the residual mean square.

Conjugate prior: take $\pi(\beta, \tau) = \pi(\beta \mid \tau)\,\pi(\tau)$ with $\beta \mid \tau \sim N_p\{b, (c\tau)^{-1}\}$ and $\tau \sim Ga(g, h)$, where c is a $p \times p$ matrix. Write $(\beta, \tau)^T \sim N_p Ga(b, c, g, h)$. The prior density is, for $\beta \in \mathbb{R}^p$, $\tau > 0$,
$$\pi(\beta, \tau) \propto \tau^{g + \frac{p}{2} - 1} \exp\left\{-\frac{\tau}{2}\left[(\beta - b)^T c\, (\beta - b) + 2h\right]\right\}$$

31 Question

What's the posterior distribution for $(\beta, \tau)^T$?

Hint:
$$(\beta - b)^T c\, (\beta - b) + (\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta}) = (\beta - B)^T (c + X^T X)(\beta - B) - B^T (c + X^T X) B + b^T c\, b + \hat{\beta}^T X^T X \hat{\beta}$$
where $B = (c + X^T X)^{-1}(cb + X^T X \hat{\beta})$.

32 Hint:
$$(\beta - b)^T c\, (\beta - b) + (\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta}) = (\beta - B)^T (c + X^T X)(\beta - B) - B^T (c + X^T X) B + b^T c\, b + \hat{\beta}^T X^T X \hat{\beta}$$

Using Bayes' Theorem, the posterior density is, for $\beta \in \mathbb{R}^p$, $\tau > 0$,
$$\begin{aligned}
\pi(\beta, \tau \mid D) &\propto \pi(\beta, \tau)\, f(y \mid X, \beta, \tau) \\
&\propto \tau^{g + \frac{p}{2} - 1} \exp\left\{-\frac{\tau}{2}\left[(\beta - b)^T c\, (\beta - b) + 2h\right]\right\} \times \tau^{\frac{n}{2}} \exp\left\{-\frac{\tau}{2}\left[ns^2 + (\beta - \hat{\beta})^T X^T X (\beta - \hat{\beta})\right]\right\} \\
&\propto \tau^{g + \frac{n}{2} + \frac{p}{2} - 1} \exp\left\{-\frac{\tau}{2}\left[(\beta - B)^T (c + X^T X)(\beta - B) + 2h + ns^2 - B^T (c + X^T X) B + b^T c\, b + \hat{\beta}^T X^T X \hat{\beta}\right]\right\} \\
&\propto \tau^{G + \frac{p}{2} - 1} \exp\left\{-\frac{\tau}{2}\left[(\beta - B)^T C\, (\beta - B) + 2H\right]\right\}
\end{aligned}$$

33 
$$\pi(\beta, \tau \mid D) \propto \tau^{G + \frac{p}{2} - 1} \exp\left\{-\frac{\tau}{2}\left[(\beta - B)^T C\, (\beta - B) + 2H\right]\right\}$$
where
$$B = (c + X^T X)^{-1}(cb + X^T X \hat{\beta}), \quad C = c + X^T X, \quad G = g + \frac{n}{2}, \quad H = h + \frac{1}{2}\left\{ns^2 - B^T (c + X^T X) B + b^T c\, b + \hat{\beta}^T X^T X \hat{\beta}\right\}$$
Therefore $(\beta, \tau)^T \mid D \sim N_p Ga(B, C, G, H)$.
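The matrix-form update is again mechanical to code; a sketch assuming numpy, with c the $p \times p$ prior matrix:

```python
# Sketch of the N_p Ga conjugate update for the normal linear model.
import numpy as np

def npga_update(b, c, g, h, y, X):
    n, p = X.shape
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)   # least squares estimate
    resid = y - X @ beta_hat
    ns2 = resid @ resid                        # n * s^2
    C = c + XtX
    B = np.linalg.solve(C, c @ b + XtX @ beta_hat)
    G = g + n / 2
    H = h + (ns2 - B @ C @ B + b @ c @ b + beta_hat @ XtX @ beta_hat) / 2
    return B, C, G, H
```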

34 Posterior analysis

Posterior: $(\beta, \tau)^T \mid D \sim N_p Ga(B, C, G, H)$. Clearly $\tau \mid D \sim Ga(G, H)$.

Question: What's the marginal posterior for β?

Hints:
1. Posterior density: $\pi(\beta, \tau \mid D) \propto \tau^{G + \frac{p}{2} - 1} \exp\left\{-\frac{\tau}{2}\left[(\beta - B)^T C (\beta - B) + 2H\right]\right\}$
2. $\int_0^\infty \theta^{a-1} e^{-b\theta}\, d\theta = \Gamma(a)/b^a$
3. If $X \sim t_a(b, c)$ then it has density $f(x \mid a, b, c) \propto \left\{1 + \frac{(x - b)^T c^{-1} (x - b)}{a}\right\}^{-\frac{a+p}{2}}$, $x \in \mathbb{R}^p$

35 Hints: $\int_0^\infty \theta^{a-1} e^{-b\theta}\, d\theta = \Gamma(a)/b^a$ and $f(x \mid a, b, c) \propto \left\{1 + \frac{(x - b)^T c^{-1} (x - b)}{a}\right\}^{-\frac{a+p}{2}}$

The posterior density for β is, for $\beta \in \mathbb{R}^p$,
$$\begin{aligned}
\pi(\beta \mid D) &= \int_0^\infty \pi(\beta, \tau \mid D)\, d\tau \\
&\propto \int_0^\infty \tau^{G + \frac{p}{2} - 1} \exp\left\{-\frac{\tau}{2}\left[(\beta - B)^T C (\beta - B) + 2H\right]\right\} d\tau \\
&= \frac{\Gamma\left(G + \frac{p}{2}\right)}{\left[\{(\beta - B)^T C (\beta - B) + 2H\}/2\right]^{G + \frac{p}{2}}} \qquad \text{using } \int_0^\infty \theta^{a-1} e^{-b\theta}\, d\theta = \frac{\Gamma(a)}{b^a} \\
&\propto \left\{1 + \frac{(\beta - B)^T C (\beta - B)}{2H}\right\}^{-\left(G + \frac{p}{2}\right)}
\end{aligned}$$
Therefore $\beta \mid D \sim t_{2G}\left(B, \frac{H}{G} C^{-1}\right)$.

36 Some more distribution theory

p-dimensional t distribution: $X \sim t_a(b, c)$. Density for $x \in \mathbb{R}^p$:
$$f(x \mid a, b, c) = \frac{\Gamma\left(\frac{a+p}{2}\right)}{|c|^{1/2} (a\pi)^{p/2}\, \Gamma\left(\frac{a}{2}\right)} \left\{1 + \frac{(x - b)^T c^{-1} (x - b)}{a}\right\}^{-\frac{a+p}{2}}$$
Parameters: $a > 0$, $b = (b_i) \in \mathbb{R}^p$, $c = (c_{ij})$ a symmetric positive definite matrix. Generalisation of the univariate $t_a(b, c)$ distribution.

37 Properties of $X \sim t_a(b, c)$

1. $E(X) = \text{Mode}(X) = b$ and $Var(X) = \frac{ac}{a - 2}$ for $a > 2$. Univariate: $E(X_i) = b_i$, $Var(X_i) = \frac{a c_{ii}}{a - 2}$ for $a > 2$, $Corr(X_i, X_j) = c_{ij}/\sqrt{c_{ii} c_{jj}}$
2. If A is $q \times p$ and d is a $q \times 1$ vector: $AX + d \sim t_a(Ab + d, A c A^T)$
3. Univariate marginals: $X_i \sim t_a(b_i, c_{ii})$
4. The contours of its density are ellipsoids centred at $x = b$: $\{x : (x - b)^T c^{-1} (x - b) = k\}$
5. $(X - b)^T c^{-1} (X - b)/p \sim F_{p,a}$

38 Summary of the posterior distribution

Posterior: $(\beta, \tau)^T \mid D \sim N_p Ga(B, C, G, H)$

Marginal distributions:
$$\beta \mid D \sim t_{2G}\left(B, \frac{H}{G} C^{-1}\right), \qquad \beta_i \mid D \sim t_{2G}\left(B_i, \frac{H}{G} (C^{-1})_{ii}\right), \quad i = 1, \ldots, p$$
$$\tau \mid D \sim Ga(G, H), \qquad \sigma \mid D \sim \text{Inv-Chi}(G, H)$$

39 Confidence intervals and regions

Regression parameters: $\beta_i \mid D \sim t_{2G}\left(B_i, \frac{H}{G}(C^{-1})_{ii}\right)$. The 100(1 − α)% HDI for $\beta_i$ is just an equi-tailed interval:
$$\left(B_i - t_{2G;\alpha/2} \sqrt{\frac{H (C^{-1})_{ii}}{G}},\ \ B_i + t_{2G;\alpha/2} \sqrt{\frac{H (C^{-1})_{ii}}{G}}\right)$$
where $t_{2G;\alpha/2}$ is the upper α/2 point of the $t_{2G}$ distribution.

Noise parameters: $\tau \mid D \sim Ga(G, H)$, $\sigma \mid D \sim \text{Inv-Chi}(G, H)$. HDIs for τ or σ non-trivial; equi-tailed intervals easy.
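These intervals are a couple of lines with scipy, assuming (B, C, G, H) come from the npga_update sketch above:

```python
# Equi-tailed (= HDI) 95% intervals for each beta_i.
from scipy import stats
import numpy as np

Cinv = np.linalg.inv(C)
half = stats.t.ppf(0.975, df=2 * G) * np.sqrt(H * np.diag(Cinv) / G)
intervals = np.column_stack([B - half, B + half])  # row i: interval for beta_i
```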

40 Questions

What is the form of an HDI-type confidence region for the regression parameters?
Hint: $\pi(\beta \mid D) \propto \left\{1 + \frac{(\beta - B)^T C (\beta - B)}{2H}\right\}^{-\left(G + \frac{p}{2}\right)}$, $\beta \in \mathbb{R}^p$

Determine the 100(1 − α)% HDI-type confidence region.
Hints:
1. $\beta \mid D \sim t_{2G}\left(B, \frac{H}{G} C^{-1}\right)$
2. Property 5 of the p-dim t-distribution: if $X \sim t_a(b, c)$ then $(X - b)^T c^{-1} (X - b)/p \sim F_{p,a}$

41 What is the form of an HDI-type confidence region for β?

Hint: $\pi(\beta \mid D) \propto \left\{1 + \frac{(\beta - B)^T C (\beta - B)}{2H}\right\}^{-\left(G + \frac{p}{2}\right)}$

HDI: $\{\beta : \pi(\beta \mid D) > u\} = \{\beta : (\beta - B)^T C (\beta - B) < k\}$

Determine the 100(1 − α)% HDI-type confidence region for β. As $X \sim t_a(b, c)$ implies $(X - b)^T c^{-1}(X - b)/p \sim F_{p,a}$,
$$(\beta - B)^T \left(\frac{H}{G} C^{-1}\right)^{-1} (\beta - B) \Big/ p \sim F_{p,2G} \quad \Leftrightarrow \quad (\beta - B)^T C (\beta - B) \sim \frac{pH}{G}\, F_{p,2G}$$
so the 100(1 − α)% confidence region is
$$\left\{\beta : (\beta - B)^T C (\beta - B) < \frac{pH}{G}\, F_{p,2G;1-\alpha}\right\}$$
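A sketch of the resulting membership test, assuming scipy and the posterior quantities above:

```python
# Is beta inside the 100(1 - alpha)% HDI-type region?
from scipy import stats

def in_region(beta, B, C, G, H, p, alpha=0.05):
    quad = (beta - B) @ C @ (beta - B)
    return quad < p * H / G * stats.f.ppf(1 - alpha, p, 2 * G)
```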

42 Predictive distributions

Want to predict the response Y when the covariate is x. Model: $Y = \beta^T x + \varepsilon$, where $\varepsilon \mid \tau \sim N(0, 1/\tau)$.

From the posterior, $\beta \mid \tau, D \sim N_p\{B, (C\tau)^{-1}\}$, so $\beta^T x \mid \tau, D \sim N\{B^T x, x^T C^{-1} x/\tau\}$ and hence
$$Y \mid x, \tau, D \sim N\{B^T x, (x^T C^{-1} x + 1)/\tau\}, \qquad \tau \mid D \sim Ga(G, H)$$
We have already seen that $\mu \mid \tau, x \sim N(B, \frac{1}{C\tau})$ and $\tau \mid x \sim Ga(G, H)$ give $\mu \mid x \sim t_{2G}(B, \frac{H}{GC})$. By the same argument,
$$Y \mid x, D \sim t_{2G}\{B^T x, H(x^T C^{-1} x + 1)/G\}$$
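A sketch of the predictive at a new covariate vector x, again assuming scipy/numpy:

```python
# Predictive Y | x, D as a frozen location-scale t distribution.
from scipy import stats
import numpy as np

def predictive(x, B, C, G, H):
    scale2 = H * (x @ np.linalg.solve(C, x) + 1) / G   # H (x' C^{-1} x + 1) / G
    return stats.t(df=2 * G, loc=B @ x, scale=np.sqrt(scale2))
```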

43 Summary

Data: $Y \mid X, \beta, \tau \sim N_n(X\beta, \tau^{-1} I_n)$
Prior: $(\beta, \tau)^T \sim N_p Ga(b, c, g, h)$ is conjugate
Posterior: $(\beta, \tau)^T \mid D \sim N_p Ga(B, C, G, H)$
Marginal posteriors: $\beta \mid D \sim t_{2G}(B, HC^{-1}/G)$, $\beta_i \mid D \sim t_{2G}(B_i, H(C^{-1})_{ii}/G)$, $\tau \mid D \sim Ga(G, H)$, $\sigma \mid D \sim \text{Inv-Chi}(G, H)$
Univariate HDIs or equi-tailed CIs fairly easy to calculate
HDI regions for β are interiors of ellipsoids
Predictive: $Y \mid x, D \sim t_{2G}\{B^T x, H(x^T C^{-1} x + 1)/G\}$

44 Linear regression example

Malcolm wants to be able to predict the height (Y) of a student from their shoe size (X). He thinks that a simple linear regression $Y = \alpha + \beta x + \varepsilon$ explains the relationship between the variables.

Prior elicitation: Malcolm decides to set up his prior using information about his measurements and those of his wife.
- His shoe size is 11 and he is 74 inches tall. Therefore he gives prior E(α + 11β) = 74 inches.
- He realises that he is probably not exactly the average height for size 11 shoe-wearers: he feels he is unlikely to be more than six inches from this conditional mean, and so gives a prior SD(α + 11β) = 3 inches.
- His wife takes a size 5 shoe and is 64 inches tall, so he takes prior E(α + 5β) = 64 inches and SD(α + 5β) = 3 inches.
- He decides that his beliefs about α + 11β and α + 5β are independent.
- Finally, he believes that the measurement error precision τ has mean 0.5 and variance 0.125.

45 Question

We need to know the parameters of his conjugate prior distribution $(\alpha, \beta, \tau)^T \sim N_2 Ga(b, c, g, h)$.

1. How might you calculate their values from
   - E(α + 11β) = 74 inches, SD(α + 11β) = 3 inches
   - E(α + 5β) = 64 inches, SD(α + 5β) = 3 inches
   - α + 11β and α + 5β are independent
   - τ has mean 0.5 and variance 0.125?
2. Roughly what size is Corr(α, β)?

46 
$$\tau \sim Ga(g, h) \quad \Rightarrow \quad E(\tau) = g/h, \qquad Var(\tau) = g/h^2$$
Therefore, as Malcolm requires E(τ) = 0.5 and Var(τ) = 0.125,
$$g = \frac{E(\tau)^2}{Var(\tau)} = 2 \qquad \text{and} \qquad h = \frac{E(\tau)}{Var(\tau)} = 4$$

47 
$$E(\alpha + 11\beta) = 74, \quad Var(\alpha + 11\beta) = 9, \quad E(\alpha + 5\beta) = 64, \quad Var(\alpha + 5\beta) = 9, \quad Cov(\alpha + 11\beta, \alpha + 5\beta) = 0$$
Expanding:
$$E(\alpha) + 11 E(\beta) = 74, \qquad E(\alpha) + 5 E(\beta) = 64$$
$$Var(\alpha) + 121\, Var(\beta) + 22\, Cov(\alpha, \beta) = 9$$
$$Var(\alpha) + 25\, Var(\beta) + 10\, Cov(\alpha, \beta) = 9$$
$$Var(\alpha) + 55\, Var(\beta) + 16\, Cov(\alpha, \beta) = 0$$
Solving:
$$\begin{pmatrix} E(\alpha) \\ E(\beta) \end{pmatrix} = \begin{pmatrix} 167/3 \\ 5/3 \end{pmatrix} = \begin{pmatrix} 55.67 \\ 1.67 \end{pmatrix}, \qquad \begin{pmatrix} Var(\alpha) & Cov(\alpha, \beta) \\ Cov(\alpha, \beta) & Var(\beta) \end{pmatrix} = \begin{pmatrix} 36.5 & -4 \\ -4 & 0.5 \end{pmatrix}$$
$$Corr(\alpha, \beta) = \frac{-4}{\sqrt{36.5 \times 0.5}} = -0.94, \quad \text{i.e. a very strong negative correlation}$$

48 
$$(\alpha, \beta)^T \sim t_{2g}\left(b, \frac{h}{g} c^{-1}\right) \quad \Rightarrow \quad E\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = b, \qquad Var\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \frac{2g}{2g - 2} \cdot \frac{h}{g}\, c^{-1} = \frac{h}{g - 1}\, c^{-1}$$
Therefore
$$b = \begin{pmatrix} 167/3 \\ 5/3 \end{pmatrix}, \qquad c^{-1} = \frac{g - 1}{h}\, Var\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} 9.125 & -1 \\ -1 & 0.125 \end{pmatrix} \quad \Rightarrow \quad c = \begin{pmatrix} 0.889 & 7.111 \\ 7.111 & 64.889 \end{pmatrix}$$
Malcolm's conjugate prior:
$$(\alpha, \beta, \tau)^T \sim N_2 Ga\left\{b = \begin{pmatrix} 167/3 \\ 5/3 \end{pmatrix},\ c = \begin{pmatrix} 0.889 & 7.111 \\ 7.111 & 64.889 \end{pmatrix},\ g = 2,\ h = 4\right\}$$
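Malcolm's elicitation reduces to solving two small linear systems; a numpy sketch of the whole calculation:

```python
# From elicited moments of alpha + 11*beta and alpha + 5*beta to (b, c, g, h).
import numpy as np

A = np.array([[1.0, 11.0],
              [1.0, 5.0]])                        # (alpha + 11 beta, alpha + 5 beta)
mean_ab = np.linalg.solve(A, [74.0, 64.0])        # E(alpha, beta) = (55.67, 1.67)
V = np.diag([9.0, 9.0])                           # independent, SD 3 each
var_ab = np.linalg.solve(A, V) @ np.linalg.inv(A.T)  # Var matrix of (alpha, beta)

g, h = 2.0, 4.0                                   # from E(tau) = 0.5, Var(tau) = 0.125
b = mean_ab
c = np.linalg.inv((g - 1) / h * var_ab)           # since Var = h c^{-1} / (g - 1)
```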

49 Malcolm's data

[Figure: scatterplot of height against shoe size]

50 Data calculations

$$\hat{\beta} = \begin{pmatrix} \hat{\alpha} \\ \hat{\beta} \end{pmatrix} = (X^T X)^{-1} X^T y = \begin{pmatrix} \ldots \\ \ldots \end{pmatrix}, \qquad ns^2 = \ldots, \qquad X^T X = \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix} = \begin{pmatrix} 52 & \ldots \\ \ldots & \ldots \end{pmatrix}$$

Malcolm's posterior:
$$(\alpha, \beta, \tau)^T \mid D \sim N_2 Ga\left\{B = \begin{pmatrix} \ldots \\ \ldots \end{pmatrix},\ C = \begin{pmatrix} \ldots & \ldots \\ \ldots & \ldots \end{pmatrix},\ G = 28,\ H = \ldots\right\}$$

51 Marginal prior and posterior distributions

[Figure: prior and posterior densities for α, β, τ and σ]

52 95% confidence intervals

α: (54.42, 56.98)
β: (1.45, 1.75)
τ: (0.55, 1.18); HDI (0.57, 1.20)
σ: (0.90, 1.31); HDI (0.91, 1.32)

Confidence regions: here p = 2, and so the 100(1 − α)% confidence region for $\beta = (\alpha, \beta)^T$ is
$$\left\{\beta : (\beta - B)^T C (\beta - B) < \frac{2H}{G}\, F_{2,2G;1-\alpha}\right\}$$

53 Prior and posterior confidence regions

[Figure: 95%, 90% and 80% prior (dashed) and posterior (solid) confidence regions for β = (α, β)ᵀ; 95% (outer), 80% (inner)]

54 Focusing on central part of plot ...

[Figure: 95%, 90% and 80% prior (dashed) and posterior (solid) confidence regions for β = (α, β)ᵀ; 95% (outer), 80% (inner)]

55 One-way ANOVA example

Charlotte is a vet studying the effect of three new diets (Diets 1, 2 and 3) on blood coagulation times (in seconds) in a set of animals.

Model the response $Y_{ij}$ of animal j on diet i as
$$Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \overset{indep}{\sim} N(0, 1/\tau)$$
This can be rewritten as a normal linear model. We need Charlotte's conjugate prior
$$(\mu_1, \mu_2, \mu_3, \tau)^T \sim N_3 Ga(b, c, g, h)$$

56 Charlotte's prior knowledge

In her previous experience, coagulation times for this sort of animal have a range of around 12 seconds.
- Taking this as a (rough) 95% interval suggests that $4\, SD(Y_{ij}) \approx 12$, i.e. $\tau \approx 1/9$
- She sets the prior mean $E(\tau) = g/h = 1/9$
- She thinks it unlikely that σ will be greater than 6 seconds and so requires $Pr(\tau < 1/36) \approx 0.05$
- She decides on $g = 24/9$ and $h = 24$, as this gives $Pr(\tau < 1/36) = 0.053$

57 Charlotte's prior knowledge

She has no specialist information about any of the diets, so she decides to treat the diets as exchangeable:
$$b = b\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \qquad \text{and} \qquad c = k\begin{pmatrix} 1 & r & r \\ r & 1 & r \\ r & r & 1 \end{pmatrix}$$
for some choice of $b \in \mathbb{R}$, $k > 0$ and $r \in (-1/2, 1)$. She decides that
- $E(\mu_i) = b = 60$ seconds
- $Corr(\mu_i, \mu_j) = -r/(1 + r) = 2/3 \ \Rightarrow\ r = -0.4$
- $SD(\mu_i) \approx 2.5$ seconds $\Rightarrow\ k = 4.94$

58 Charlotte's conjugate prior

$$(\mu_1, \mu_2, \mu_3, \tau)^T \sim N_3 Ga\left\{b = \begin{pmatrix} 60 \\ 60 \\ 60 \end{pmatrix},\ c = 4.94\begin{pmatrix} 1 & -0.4 & -0.4 \\ -0.4 & 1 & -0.4 \\ -0.4 & -0.4 & 1 \end{pmatrix},\ g = 24/9,\ h = 24\right\}$$

59 The data

Diet 1: 62 60 63 59
Diet 2: 63 67 71 64 65 66
Diet 3: 56 62 60 61 63 64 63 59

In fact these data come from Linear Models with R by J.J. Faraway (2004).

60 Rewrite the ANOVA model as a normal linear model

Take
$$X = \begin{pmatrix} 1_4 & 0 & 0 \\ 0 & 1_6 & 0 \\ 0 & 0 & 1_8 \end{pmatrix} \qquad \text{and} \qquad \beta = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}$$
where $1_m$ denotes an m-vector of ones, so that X has one indicator column per diet.

61 Data calculations

$$ns^2 = 98, \qquad X^T X = \text{diag}(n_1, n_2, n_3) = \text{diag}(4, 6, 8), \qquad \hat{\beta} = \begin{pmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \\ \hat{\mu}_3 \end{pmatrix} = \begin{pmatrix} 61 \\ 66 \\ 61 \end{pmatrix}$$

Charlotte's posterior:
$$(\mu_1, \mu_2, \mu_3, \tau)^T \mid D \sim N_3 Ga\left\{B = \ldots,\ C = c + \text{diag}(4, 6, 8),\ G = \frac{24}{9} + 9 = \frac{105}{9},\ H = \ldots\right\}$$
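Charlotte's posterior can be reproduced with the npga_update sketch from earlier, assuming the coagulation data reconstructed above:

```python
# ANOVA as a normal linear model: indicator design plus npga_update.
import numpy as np

y = np.array([62, 60, 63, 59,                          # Diet 1
              63, 67, 71, 64, 65, 66,                  # Diet 2
              56, 62, 60, 61, 63, 64, 63, 59], float)  # Diet 3
X = np.zeros((18, 3))
X[:4, 0] = 1.0
X[4:10, 1] = 1.0
X[10:, 2] = 1.0

r, k = -0.4, 4.94
c = k * ((1 - r) * np.eye(3) + r * np.ones((3, 3)))    # exchangeable prior matrix
b = np.full(3, 60.0)
B, C, G, H = npga_update(b, c, g=24 / 9, h=24.0, y=y, X=X)
```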

62 Posterior for treatment means

Posterior distribution: $\mu \mid D \sim t_{2G}\left(B, \frac{H}{G} C^{-1}\right)$
Posterior mean: $E(\mu \mid D) = \ldots$
Posterior variance matrix: $Var(\mu \mid D) = \ldots$
Posterior correlation matrix: off-diagonal correlations $\approx 0.211$

63 Prior and posterior distributions of treatment means

[Figure: prior and posterior densities for µ₁, µ₂ and µ₃]

64 Prior and posterior distributions for within-treatment group variation

[Figure: prior and posterior densities for τ and σ]

65 95% confidence intervals

µ₁: (59.41, 63.82)
µ₂: (61.86, 65.84)
µ₃: (59.63, 63.28)
τ: (0.050, 0.170); HDI (0.055, 0.177)
σ: (2.291, 4.130); HDI (2.377, 4.270)

66 Alternative parameterisation of the model

Standard model: $Y_{ij} = \mu_i + \varepsilon_{ij}$ with $\beta = (\mu_1, \mu_2, \mu_3)^T$

Alternative parametrisation: $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ with $\beta = (\mu, \alpha_2, \alpha_3)^T$, taking $\alpha_1 \equiv 0$ for identifiability, so that $\mu = \mu_1$, $\alpha_2 = \mu_2 - \mu_1$ and $\alpha_3 = \mu_3 - \mu_1$

67 Question

What is the (joint) posterior distribution of the treatment mean differences $\alpha = (\alpha_2, \alpha_3)^T$, where $\alpha_2 = \mu_2 - \mu_1$ and $\alpha_3 = \mu_3 - \mu_1$?

Hints:
1. Posterior distribution: $\mu \mid D \sim t_{2G}\left(B, \frac{H}{G} C^{-1}\right)$
2. If $X \sim t_a(b, c)$, A is a $q \times p$ matrix and d is a $q \times 1$ vector, then $AX + d \sim t_a(Ab + d, A c A^T)$

68 Hints: as on the previous slide.

We have
$$\begin{pmatrix} \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}, \qquad \text{i.e. } \alpha = A\mu$$
$$\mu \mid D \sim t_{2G}\left(B, \frac{H}{G} C^{-1}\right) \quad \Rightarrow \quad \alpha \mid D \sim t_{2G}\left(AB, \frac{H}{G}\, A C^{-1} A^T\right)$$
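A sketch of this transformation in numpy, assuming the posterior (B, C, G, H) from the ANOVA snippet above:

```python
# Posterior for alpha = A mu via the linear transformation property.
import numpy as np

A = np.array([[-1.0, 1.0, 0.0],
              [-1.0, 0.0, 1.0]])
Cinv = np.linalg.inv(C)
loc_alpha = A @ B                        # location of the t_{2G} distribution
scale_alpha = H / G * A @ Cinv @ A.T     # scale matrix of the t_{2G} distribution
```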

69 Prior and posterior distributions for mean treatment differences

[Figure: prior and posterior densities for α₂ and α₃]

95% HDIs
α₂: (-0.373, 4.827)
α₃: (-2.697, 2.365)

70 
[Figure: 95%, 90% and 80% prior (dashed) and posterior (solid) confidence regions for α = (α₂, α₃)ᵀ; 95% (outer), 80% (inner)]

71 Focusing on the central part of the plot ...

[Figure: 95%, 90% and 80% prior (dashed) and posterior (solid) confidence regions for α = (α₂, α₃)ᵀ; 95% (outer), 80% (inner)]

72 Thanks for listening
