Multivariate Bayesian Linear Regression MLAI Lecture 11

1 Multivariate Bayesian Linear Regression MLAI Lecture 11 Neil D. Lawrence Department of Computer Science Sheffield University 21st October 2012

2 Outline Univariate Bayesian Linear Regression Multivariate Bayesian Linear Regression

3 Prior Distribution Bayesian inference requires a prior on the parameters. The prior represents your belief, before you see the data, about the likely value of the parameters. For linear regression, consider a Gaussian prior on the intercept: $c \sim \mathcal{N}(0, \alpha_1)$

4 Gaussian Noise $p(c) = \mathcal{N}(c \mid 0, \alpha_1)$ Figure: A Gaussian prior combines with a Gaussian likelihood for a Gaussian posterior.

5 Gaussian Noise $p(t \mid m, c, x, \sigma^2) = \mathcal{N}(t \mid mx + c, \sigma^2)$

6 Gaussian Noise $p(c \mid t, m, x, \sigma^2) = \mathcal{N}\left(c \,\middle|\, \frac{t - mx}{1 + \sigma^2/\alpha_1},\ \left(\sigma^{-2} + \alpha_1^{-1}\right)^{-1}\right)$

7 Stages to Derivation of the Posterior Multiply the likelihood by the prior: both are exponentiated quadratics, so the product is always also an exponentiated quadratic, because $\exp(a^2)\exp(b^2) = \exp(a^2 + b^2)$. Complete the square to get the resulting density in the form of a Gaussian. Recognise the mean and (co)variance of the Gaussian. This is the estimate of the posterior.

8 Main Trick
$$p(c) = \frac{1}{\sqrt{2\pi\alpha_1}} \exp\left(-\frac{1}{2\alpha_1} c^2\right)$$
$$p(\mathbf{t} \mid \mathbf{x}, c, m, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{N} (t_i - m x_i - c)^2\right)$$

9 Main Trick
$$p(c \mid \mathbf{t}, \mathbf{x}, m, \sigma^2) = \frac{p(\mathbf{t} \mid \mathbf{x}, c, m, \sigma^2)\, p(c)}{p(\mathbf{t} \mid \mathbf{x}, m, \sigma^2)}$$

10 Main Trick
$$p(c \mid \mathbf{t}, \mathbf{x}, m, \sigma^2) = \frac{p(\mathbf{t} \mid \mathbf{x}, c, m, \sigma^2)\, p(c)}{\int p(\mathbf{t} \mid \mathbf{x}, c, m, \sigma^2)\, p(c)\, \mathrm{d}c}$$

11 Main Trick
$$p(c \mid \mathbf{t}, \mathbf{x}, m, \sigma^2) \propto p(\mathbf{t} \mid \mathbf{x}, c, m, \sigma^2)\, p(c)$$

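12
$$\log p(c \mid \mathbf{t}, \mathbf{x}, m, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{i=1}^{N} (t_i - c - m x_i)^2 - \frac{1}{2\alpha_1} c^2 + \text{const}$$
$$= -\frac{1}{2\sigma^2} \sum_{i=1}^{N} (t_i - m x_i)^2 - \left(\frac{N}{2\sigma^2} + \frac{1}{2\alpha_1}\right) c^2 + c\, \frac{\sum_{i=1}^{N} (t_i - m x_i)}{\sigma^2} + \text{const},$$
complete the square of the quadratic form to obtain
$$\log p(c \mid \mathbf{t}, \mathbf{x}, m, \sigma^2) = -\frac{1}{2\tau^2} (c - \mu)^2 + \text{const},$$
where $\tau^2 = \left(N\sigma^{-2} + \alpha_1^{-1}\right)^{-1}$ and $\mu = \frac{\tau^2}{\sigma^2} \sum_{i=1}^{N} (t_i - m x_i)$.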
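
The completed square can be checked numerically. The following is a minimal sketch, assuming synthetic data and hypothetical values for $m$, $\sigma^2$ and $\alpha_1$ (none of these numbers come from the lecture). It computes $\mu$ and $\tau^2$ from the closed form and compares them with a brute-force normalisation of prior times likelihood on a grid of $c$ values.

```python
import numpy as np

def intercept_posterior(t, x, m, sigma2, alpha1):
    """Closed-form posterior mean and variance of the intercept c (slope m held fixed)."""
    n = len(t)
    tau2 = 1.0 / (n / sigma2 + 1.0 / alpha1)   # tau^2 = (N sigma^-2 + alpha_1^-1)^-1
    mu = tau2 / sigma2 * np.sum(t - m * x)      # mu = tau^2 sigma^-2 sum_i (t_i - m x_i)
    return mu, tau2

# Synthetic data: all values below are assumptions for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=20)
m_true, c_true, sigma2, alpha1 = 2.0, 0.5, 0.1, 1.0
t = m_true * x + c_true + rng.normal(0.0, np.sqrt(sigma2), size=x.shape)

mu, tau2 = intercept_posterior(t, x, m_true, sigma2, alpha1)

# Brute-force check: evaluate log prior + log likelihood on a grid and normalise.
c_grid = np.linspace(-3.0, 3.0, 20001)
residuals = t[None, :] - m_true * x[None, :] - c_grid[:, None]
log_post = -0.5 / sigma2 * (residuals ** 2).sum(axis=1) - 0.5 / alpha1 * c_grid ** 2
weights = np.exp(log_post - log_post.max())
weights /= weights.sum()
grid_mean = np.sum(weights * c_grid)
grid_var = np.sum(weights * (c_grid - grid_mean) ** 2)

print(mu, grid_mean)
print(tau2, grid_var)
```

The grid estimate only approximates the integral in the denominator of Bayes' rule, so agreement is expected up to discretisation error rather than exactly.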

13 The Joint Density We really want to know the joint posterior density over the parameters $c$ and $m$. We could now integrate out over $m$, but it's easier to consider the multivariate case.

14 Two Dimensional Gaussian Consider height, $h$, and weight, $w$. Could sample height from a distribution: $\mathcal{N}(1.7,\ )$. And similarly weight: $\mathcal{N}(75, 36)$.

15 Height and Weight Models Gaussian distributions for height and weight.

16 Sample height and weight one after the other and plot against each other.
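
A minimal sketch of this sampling procedure, assuming NumPy. The weight distribution $\mathcal{N}(75, 36)$ is taken from the slide; the height variance is not legible in the source, so `height_var` below is a placeholder assumption. Because the two variables are drawn separately, the empirical correlation between them should be close to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 5000

height_mean = 1.7
height_var = 0.0225    # ASSUMED placeholder: the variance is missing in the source
weight_mean = 75.0
weight_var = 36.0      # from the slide: N(75, 36)

# Sample height and weight one after the other, independently.
h = rng.normal(height_mean, np.sqrt(height_var), size=n_samples)
w = rng.normal(weight_mean, np.sqrt(weight_var), size=n_samples)

# Stack into (height, weight) pairs, ready to plot against each other.
samples = np.column_stack([h, w])

# Independent sampling implies the empirical correlation is near zero.
emp_corr = np.corrcoef(h, w)[0, 1]
print(samples[:3])
print(emp_corr)
```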

39 Independence Assumption This assumes height and weight are independent: $p(h, w) = p(h)p(w)$. In reality they are dependent: body mass index $= \frac{w}{h^2}$.

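63 Independent Gaussians
$$p(w, h) = p(w)p(h)$$

64 Independent Gaussians
$$p(w, h) = \frac{1}{\sqrt{2\pi\sigma_1^2}\sqrt{2\pi\sigma_2^2}} \exp\left(-\frac{1}{2}\left(\frac{(w - \mu_1)^2}{\sigma_1^2} + \frac{(h - \mu_2)^2}{\sigma_2^2}\right)\right)$$

65 Independent Gaussians
$$p(w, h) = \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2}} \exp\left(-\frac{1}{2}\left(\begin{bmatrix} w \\ h \end{bmatrix} - \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}\right)^\top \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}^{-1} \left(\begin{bmatrix} w \\ h \end{bmatrix} - \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}\right)\right)$$

66 Independent Gaussians
$$p(\mathbf{t}) = \frac{1}{\left|2\pi\mathbf{D}\right|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{t} - \boldsymbol{\mu})^\top \mathbf{D}^{-1} (\mathbf{t} - \boldsymbol{\mu})\right)$$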
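
The equivalence between the product of two univariate Gaussians and the matrix form with a diagonal covariance $\mathbf{D}$ can be confirmed at a single test point. This is a sketch only: the helper names are hypothetical, and the height variance and the test point are assumed values.

```python
import numpy as np

def univariate_pdf(y, mean, var):
    """Density of a univariate Gaussian N(y | mean, var)."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def multivariate_pdf(t, mu, cov):
    """Density of a multivariate Gaussian N(t | mu, cov)."""
    d = t - mu
    norm = np.sqrt(np.linalg.det(2.0 * np.pi * cov))   # |2 pi cov|^(1/2)
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm

mu = np.array([75.0, 1.7])        # [mu_1 (weight), mu_2 (height)]
var = np.array([36.0, 0.0225])    # [sigma_1^2, sigma_2^2]; the height variance is ASSUMED
D = np.diag(var)                  # diagonal covariance: no correlation

t = np.array([80.0, 1.8])         # arbitrary test point (weight, height)

product = univariate_pdf(t[0], mu[0], var[0]) * univariate_pdf(t[1], mu[1], var[1])
joint = multivariate_pdf(t, mu, D)
print(product, joint)
```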

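67 Correlated Gaussian Form a correlated Gaussian from the original by rotating the data space using matrix $\mathbf{R}$.
$$p(\mathbf{t}) = \frac{1}{\left|2\pi\mathbf{D}\right|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{t} - \boldsymbol{\mu})^\top \mathbf{D}^{-1} (\mathbf{t} - \boldsymbol{\mu})\right)$$

68 Correlated Gaussian
$$p(\mathbf{t}) = \frac{1}{\left|2\pi\mathbf{D}\right|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{R}^\top\mathbf{t} - \mathbf{R}^\top\boldsymbol{\mu})^\top \mathbf{D}^{-1} (\mathbf{R}^\top\mathbf{t} - \mathbf{R}^\top\boldsymbol{\mu})\right)$$

69 Correlated Gaussian
$$p(\mathbf{t}) = \frac{1}{\left|2\pi\mathbf{D}\right|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{t} - \boldsymbol{\mu})^\top \mathbf{R}\mathbf{D}^{-1}\mathbf{R}^\top (\mathbf{t} - \boldsymbol{\mu})\right)$$
this gives an inverse covariance matrix: $\mathbf{C}^{-1} = \mathbf{R}\mathbf{D}^{-1}\mathbf{R}^\top$

70 Correlated Gaussian
$$p(\mathbf{t}) = \frac{1}{\left|2\pi\mathbf{C}\right|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{t} - \boldsymbol{\mu})^\top \mathbf{C}^{-1} (\mathbf{t} - \boldsymbol{\mu})\right)$$
this gives a covariance matrix: $\mathbf{C} = \mathbf{R}\mathbf{D}\mathbf{R}^\top$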
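
The rotation construction can be sketched in a few lines of NumPy. The rotation angle, the diagonal variances in $\mathbf{D}$ and the mean are assumed values chosen for illustration. The sketch draws independent samples with covariance $\mathbf{D}$, rotates them with $\mathbf{R}$, and compares the empirical covariance with $\mathbf{C} = \mathbf{R}\mathbf{D}\mathbf{R}^\top$.

```python
import numpy as np

theta = np.pi / 4                       # ASSUMED rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
D = np.diag([4.0, 0.25])                # ASSUMED axis-aligned variances
mu = np.array([1.0, -1.0])              # ASSUMED mean

# Covariance and inverse covariance of the rotated Gaussian.
C = R @ D @ R.T
C_inv = R @ np.diag(1.0 / np.diag(D)) @ R.T

# Draw independent samples with covariance D, then rotate each one: t = R z + mu.
rng = np.random.default_rng(2)
z = rng.normal(size=(100000, 2)) * np.sqrt(np.diag(D))
samples = z @ R.T + mu

emp_cov = np.cov(samples, rowvar=False)
print(C)
print(emp_cov)
```

Because $\mathbf{R}$ is orthogonal, the determinant is unchanged by the rotation, which is why the normaliser can be written with either $\mathbf{D}$ or $\mathbf{C}$.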

71 Outline Univariate Bayesian Linear Regression Multivariate Bayesian Linear Regression

72 Multivariate Regression Likelihood Recall the multivariate regression likelihood:
$$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{N} \left(t_i - \mathbf{w}^\top\mathbf{x}_{i,:}\right)^2\right)$$
Now use a multivariate Gaussian prior:
$$p(\mathbf{w}) = \frac{1}{(2\pi\alpha)^{p/2}} \exp\left(-\frac{1}{2\alpha}\mathbf{w}^\top\mathbf{w}\right)$$

74 Posterior Density Once again we want to know the posterior:
$$p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}) \propto p(\mathbf{t} \mid \mathbf{X}, \mathbf{w})\, p(\mathbf{w})$$
And we can compute it by completing the square.
$$\log p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}) = -\frac{1}{2\sigma^2} \sum_{i=1}^{N} t_i^2 + \frac{1}{\sigma^2} \sum_{i=1}^{N} t_i \mathbf{x}_{i,:}^\top \mathbf{w} - \frac{1}{2\sigma^2} \sum_{i=1}^{N} \mathbf{w}^\top \mathbf{x}_{i,:} \mathbf{x}_{i,:}^\top \mathbf{w} - \frac{1}{2\alpha} \mathbf{w}^\top\mathbf{w} + \text{const}.$$
$$p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}) = \mathcal{N}(\mathbf{w} \mid \boldsymbol{\mu}_w, \mathbf{C}_w)$$
$$\mathbf{C}_w = \left(\sigma^{-2}\mathbf{X}^\top\mathbf{X} + \alpha^{-1}\mathbf{I}\right)^{-1} \quad \text{and} \quad \boldsymbol{\mu}_w = \mathbf{C}_w \sigma^{-2} \mathbf{X}^\top \mathbf{t}$$
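
A minimal sketch of the posterior computation, assuming NumPy and synthetic data (the design matrix, true weights, $\sigma^2$ and $\alpha$ are all made-up values, and `weight_posterior` is a hypothetical helper name). As a sanity check, the gradient of the log posterior is evaluated at $\boldsymbol{\mu}_w$, where it should vanish because the mean of a Gaussian is also its mode.

```python
import numpy as np

def weight_posterior(X, t, sigma2, alpha):
    """Posterior mean and covariance of w under a N(0, alpha I) prior."""
    p = X.shape[1]
    C_w_inv = X.T @ X / sigma2 + np.eye(p) / alpha   # sigma^-2 X^T X + alpha^-1 I
    C_w = np.linalg.inv(C_w_inv)
    mu_w = C_w @ X.T @ t / sigma2                    # C_w sigma^-2 X^T t
    return mu_w, C_w

# Synthetic regression problem: all values are assumptions for illustration.
rng = np.random.default_rng(3)
N, p = 50, 3
X = rng.normal(size=(N, p))
w_true = np.array([1.0, -2.0, 0.5])
sigma2, alpha = 0.05, 1.0
t = X @ w_true + rng.normal(0.0, np.sqrt(sigma2), size=N)

mu_w, C_w = weight_posterior(X, t, sigma2, alpha)

# Gradient of log p(w | t, X) with respect to w, evaluated at the posterior mean.
grad = X.T @ (t - X @ mu_w) / sigma2 - mu_w / alpha
print(mu_w)
print(grad)
```

For larger problems, solving the linear system with `np.linalg.solve` is usually preferable to forming the inverse explicitly; the inverse is formed here only because the covariance itself is wanted.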

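76 Bayesian vs Maximum Likelihood Note the similarity between the posterior mean
$$\boldsymbol{\mu}_w = \left(\sigma^{-2}\mathbf{X}^\top\mathbf{X} + \alpha^{-1}\mathbf{I}\right)^{-1} \sigma^{-2} \mathbf{X}^\top \mathbf{t}$$
and the maximum likelihood solution
$$\hat{\mathbf{w}} = \left(\mathbf{X}^\top\mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{t}$$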
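
The similarity can be made concrete with a small numerical sketch, again on assumed synthetic data. With a very broad prior (large $\alpha$) the term $\alpha^{-1}\mathbf{I}$ becomes negligible and the posterior mean approaches the maximum likelihood solution; with a tight prior (small $\alpha$) the posterior mean is shrunk towards zero.

```python
import numpy as np

def posterior_mean(X, t, sigma2, alpha):
    """Posterior mean (sigma^-2 X^T X + alpha^-1 I)^-1 sigma^-2 X^T t."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X / sigma2 + np.eye(p) / alpha, X.T @ t / sigma2)

# Synthetic data: all values are assumptions for illustration.
rng = np.random.default_rng(4)
X = rng.normal(size=(30, 2))
t = X @ np.array([1.5, -0.7]) + rng.normal(0.0, 0.3, size=30)
sigma2 = 0.09

# Maximum likelihood solution (X^T X)^-1 X^T t.
w_ml = np.linalg.solve(X.T @ X, X.T @ t)

mu_tight = posterior_mean(X, t, sigma2, alpha=1e-2)   # strong prior: shrinks towards 0
mu_broad = posterior_mean(X, t, sigma2, alpha=1e8)    # weak prior: approaches w_ml

print(w_ml)
print(mu_tight)
print(mu_broad)
```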

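77 Marginal Likelihood is Computed as Normalizer
$$p(\mathbf{w} \mid \mathbf{t}, \mathbf{X})\, p(\mathbf{t} \mid \mathbf{X}) = p(\mathbf{t} \mid \mathbf{w}, \mathbf{X})\, p(\mathbf{w})$$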

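78 Marginal Likelihood Can compute the marginal likelihood as:
$$p(\mathbf{t} \mid \mathbf{X}, \alpha, \sigma) = \mathcal{N}\left(\mathbf{t} \mid \mathbf{0},\ \alpha\mathbf{X}\mathbf{X}^\top + \sigma^2\mathbf{I}\right)$$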
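
The closed form can be cross-checked against Bayes' rule, since the marginal likelihood is the normaliser: for any value of $\mathbf{w}$, $\log p(\mathbf{t} \mid \mathbf{X}) = \log p(\mathbf{t} \mid \mathbf{w}, \mathbf{X}) + \log p(\mathbf{w}) - \log p(\mathbf{w} \mid \mathbf{t}, \mathbf{X})$. The sketch below assumes synthetic data and an arbitrary test value of $\mathbf{w}$; `log_gaussian` is a hypothetical helper, not part of any library.

```python
import numpy as np

def log_gaussian(y, mean, cov):
    """Log density of a multivariate Gaussian N(y | mean, cov)."""
    d = y - mean
    _, logdet = np.linalg.slogdet(2.0 * np.pi * cov)   # log |2 pi cov|
    return -0.5 * logdet - 0.5 * d @ np.linalg.solve(cov, d)

# Synthetic data: all values are assumptions for illustration.
rng = np.random.default_rng(5)
N, p = 15, 3
X = rng.normal(size=(N, p))
sigma2, alpha = 0.2, 1.5
t = X @ rng.normal(0.0, np.sqrt(alpha), size=p) + rng.normal(0.0, np.sqrt(sigma2), size=N)

# Closed form: t ~ N(0, alpha X X^T + sigma^2 I).
log_marginal = log_gaussian(t, np.zeros(N), alpha * X @ X.T + sigma2 * np.eye(N))

# Cross-check via Bayes' rule at an arbitrary w.
C_w = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / alpha)
mu_w = C_w @ X.T @ t / sigma2
w = np.array([0.3, -0.1, 0.8])
log_lik = log_gaussian(t, X @ w, sigma2 * np.eye(N))
log_prior = log_gaussian(w, np.zeros(p), alpha * np.eye(p))
log_post = log_gaussian(w, mu_w, C_w)
log_marginal_check = log_lik + log_prior - log_post

print(log_marginal, log_marginal_check)
```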

79 Reading Section 2.3 of Bishop up to top of pg 85 (multivariate Gaussians). Section 3.3 of Bishop up to 159 (pg ).

80 References I C. M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag, [Google Books].
