Introduction to Gaussian Processes


1 Introduction to Gaussian Processes Neil D. Lawrence GPSS 10th June 2013

2 Book Rasmussen and Williams (2006)

3 Outline The Gaussian Density Covariance from Basis Functions Basis Function Representations Constructing Covariance GP Limitations Conclusions

5 The Gaussian Density
Perhaps the most common probability density. The Gaussian density:
p(y | µ, σ²) = (1/√(2πσ²)) exp(−(y − µ)²/(2σ²)) = N(y | µ, σ²)

6 Gaussian Density
[Figure: the Gaussian PDF p(h | µ, σ²) over height h/m, with µ = 1.7 and the mean shown as a red line.]
It could represent the heights of a population of students.

7 Gaussian Density
N(y | µ, σ²) = (1/√(2πσ²)) exp(−(y − µ)²/(2σ²))
σ² is the variance of the density and µ is the mean.
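
As a quick numerical sketch of this density (not part of the slides), the hand-written formula can be checked against scipy.stats.norm; the values µ = 1.7 and σ² = 0.0225 are assumed here purely for illustration, echoing the height example.

```python
import numpy as np
from scipy.stats import norm

def gaussian_density(y, mu, sigma2):
    """Evaluate N(y | mu, sigma2) exactly as written on the slide."""
    return np.exp(-(y - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

mu, sigma2 = 1.7, 0.0225          # assumed example values (height in metres)
y = np.linspace(1.4, 2.0, 7)

print(gaussian_density(y, mu, sigma2))
print(norm.pdf(y, loc=mu, scale=np.sqrt(sigma2)))   # same numbers from SciPy
```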

8–11 Two Important Gaussian Properties: Sum of Gaussians
The sum of Gaussian variables is also Gaussian. If y_i ~ N(µ_i, σ_i²), then the sum is distributed as
Σ_{i=1}^n y_i ~ N(Σ_{i=1}^n µ_i, Σ_{i=1}^n σ_i²).
(Aside: as n increases, the sum of non-Gaussian, finite-variance variables also tends to a Gaussian [central limit theorem].)

12–14 Two Important Gaussian Properties: Scaling a Gaussian
Scaling a Gaussian leads to a Gaussian. If y ~ N(µ, σ²), then the scaled variable is distributed as
wy ~ N(wµ, w²σ²).
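
A minimal Monte Carlo sketch of these two properties (the particular means, variances and scale factor are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

mus = np.array([1.0, -2.0, 0.5])        # assumed example means
sigma2s = np.array([0.5, 1.0, 2.0])     # assumed example variances
w = 3.0                                 # assumed scale factor

samples = rng.normal(mus, np.sqrt(sigma2s), size=(100_000, 3))

s = samples.sum(axis=1)                 # sum of Gaussian variables
print(s.mean(), mus.sum())              # empirical mean ~ sum of means
print(s.var(), sigma2s.sum())           # empirical variance ~ sum of variances

z = w * samples[:, 0]                   # scaled Gaussian variable
print(z.mean(), w * mus[0])             # ~ w mu
print(z.var(), w ** 2 * sigma2s[0])     # ~ w^2 sigma^2
```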

15 Linear Function
[Figure: data points and a best-fit line; y, height in m, against x, weight in kg.]
A linear regression between height and weight.

16 Regression Examples
Predict a real value, y_i, given some inputs, x_i.
Predict quality of meat given spectral measurements (Tecator data).
Radiocarbon dating, the C14 calibration curve: predict age given quantity of C14 isotope.
Predict quality of different Go or Backgammon moves given expert rated training data.

17 y = mx + c

18–24 [Figures: plots of the straight line y = mx + c against x; slides 19–21 annotate the intercept c and the gradient m.]

25 y = mx + c
point 1: x = 1, y = 3 gives 3 = m + c
point 2: x = 3, y = 1 gives 1 = 3m + c
point 3: x = 2, y = … gives … = 2m + c

26 y = mx + c + ε
point 1: x = 1, y = 3 gives 3 = m + c + ε_1
point 2: x = 3, y = 1 gives 1 = 3m + c + ε_2
point 3: x = 2, y = … gives … = 2m + c + ε_3

27 Underdetermined System
What about two unknowns and one observation? y_1 = mx_1 + c

28–36 Underdetermined System
Can compute m given c: m = (y_1 − c)/x_1.
[Figures: slides 28–36 repeat this construction for a sequence of different intercept values c, each giving the corresponding gradient m and a different candidate line through the single data point.]

37 Underdetermined System
Can compute m given c. Assume c ~ N(0, 4); we then find a distribution of solutions for the line.
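
A small sketch of this idea, using point 1 from the earlier slides, (x_1, y_1) = (1, 3), as the single observation and drawing the intercept from c ~ N(0, 4):

```python
import numpy as np

rng = np.random.default_rng(1)

x1, y1 = 1.0, 3.0                       # the single observation (point 1 above)

c = rng.normal(0.0, 2.0, size=5)        # c ~ N(0, 4): standard deviation 2
m = (y1 - c) / x1                       # solve for the gradient given each c

for ci, mi in zip(c, m):
    print(f"c = {ci:6.2f}  =>  m = {mi:6.2f}")
```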

38 Probability for Under- and Overdetermined
To deal with the overdetermined system we introduced a probability distribution for the noise variable, ε_i.
For the underdetermined system we introduced a probability distribution for the parameter, c.
This is known as a Bayesian treatment.

39–40 Multivariate Prior Distributions
For general Bayesian inference we need multivariate priors. E.g. for multivariate linear regression,
y_i = Σ_j w_j x_{i,j} + ε_i = w^T x_{i,:} + ε_i
(where we've dropped c for convenience), we need a prior over w.
This motivates a multivariate Gaussian density.
We will use the multivariate Gaussian to put a prior directly on the function (a Gaussian process).

41 Prior Distribution
Bayesian inference requires a prior on the parameters.
The prior represents your belief, before you see the data, about the likely value of the parameters.
For linear regression, consider a Gaussian prior on the intercept:
c ~ N(0, α^{-1})

42–44 Gaussian Noise
p(c) = N(c | 0, α^{-1})
p(y | m, c, x, σ²) = N(y | mx + c, σ²)
p(c | y, m, x, σ²) = N(c | (y − mx)/(1 + σ²α), (σ^{-2} + α)^{-1})
Figure: a Gaussian prior combines with a Gaussian likelihood to give a Gaussian posterior.
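
A sketch that plugs numbers into the posterior written above; the observation, slope and the values of σ² and α are assumed for illustration:

```python
import numpy as np

x, y, m = 1.0, 3.0, 1.0        # assumed single observation and known slope
sigma2 = 0.5                   # assumed noise variance
alpha = 1.0                    # prior precision: c ~ N(0, 1/alpha)

# Posterior mean and variance for c, as on the slide.
post_mean = (y - m * x) / (1.0 + sigma2 * alpha)
post_var = 1.0 / (1.0 / sigma2 + alpha)
print(post_mean, post_var)
```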

45 Stages to Derivation of the Posterior
Multiply likelihood by prior: they are exponentiated quadratics, so the answer is always also an exponentiated quadratic, because exp(a²) exp(b²) = exp(a² + b²).
Complete the square to get the resulting density in the form of a Gaussian.
Recognise the mean and (co)variance of the Gaussian. This is the estimate of the posterior.

46–48 Multivariate Regression Likelihood
Noise corrupted data point: y_i = w^T x_{i,:} + ε_i
Multivariate regression likelihood:
p(y | X, w) = (2πσ²)^{-n/2} exp(−(1/(2σ²)) Σ_{i=1}^n (y_i − w^T x_{i,:})²)
Now use a multivariate Gaussian prior:
p(w) = (2πα)^{-p/2} exp(−(1/(2α)) w^T w)
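
Multiplying this likelihood and prior and completing the square gives a Gaussian posterior over w (the posterior formula itself is not written out on the slide; this is the standard result). A minimal sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)

n, p = 50, 3
sigma2, alpha = 0.1, 1.0                 # assumed noise variance and prior variance

X = rng.normal(size=(n, p))
w_true = rng.normal(size=p)
y = X @ w_true + rng.normal(0.0, np.sqrt(sigma2), size=n)

# Completing the square: cov = (X^T X / sigma^2 + I / alpha)^-1, mean = cov X^T y / sigma^2.
post_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / alpha)
post_mean = post_cov @ X.T @ y / sigma2

print(w_true)
print(post_mean)                         # close to w_true with this much data
print(np.diag(post_cov))                 # posterior variances
```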

49 Two Dimensional Gaussian
Consider height, h/m, and weight, w/kg.
Could sample height from a distribution: p(h) = N(1.7, …)
And similarly weight: p(w) = N(75, 36)

50 Height and Weight Models
[Figure: Gaussian distributions p(h) over h/m and p(w) over w/kg.]

51–73 Sampling Two Dimensional Variables
[Figures: a sequence of slides plotting samples of height and weight from the joint distribution alongside the marginal distributions p(h) and p(w), adding one sample at a time.]

74 Independence Assumption
This assumes height and weight are independent: p(h, w) = p(h)p(w).
In reality they are dependent (body mass index = w/h²).

75–97 Sampling Two Dimensional Variables
[Figures: the sampling exercise is repeated, plotting samples of height and weight from the joint distribution alongside the marginal distributions p(h) and p(w), one sample added per slide.]

98 Independent Gaussians p(w, h) = p(w)p(h)

99 Independent Gaussians
p(w, h) = (1/√(2πσ_1²)) (1/√(2πσ_2²)) exp(−(1/2) [(w − µ_1)²/σ_1² + (h − µ_2)²/σ_2²])

100 Independent Gaussians
p(w, h) = (1/(2π√(σ_1² σ_2²))) exp(−(1/2) ([w; h] − [µ_1; µ_2])^T [σ_1² 0; 0 σ_2²]^{-1} ([w; h] − [µ_1; µ_2]))

101 Independent Gaussians
p(y) = (1/(2π |D|^{1/2})) exp(−(1/2) (y − µ)^T D^{-1} (y − µ))

102 Correlated Gaussian
Form correlated from original by rotating the data space using matrix R.
p(y) = (1/(2π |D|^{1/2})) exp(−(1/2) (y − µ)^T D^{-1} (y − µ))

103 Correlated Gaussian
Form correlated from original by rotating the data space using matrix R.
p(y) = (1/(2π |D|^{1/2})) exp(−(1/2) (R^T y − R^T µ)^T D^{-1} (R^T y − R^T µ))

104 Correlated Gaussian
Form correlated from original by rotating the data space using matrix R.
p(y) = (1/(2π |D|^{1/2})) exp(−(1/2) (y − µ)^T R D^{-1} R^T (y − µ))
This gives a covariance matrix: C^{-1} = R D^{-1} R^T

105 Correlated Gaussian
Form correlated from original by rotating the data space using matrix R.
p(y) = (1/(2π |C|^{1/2})) exp(−(1/2) (y − µ)^T C^{-1} (y − µ))
This gives a covariance matrix: C = R D R^T
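
A short sketch of this construction: rotate a diagonal covariance D with a rotation matrix R (the 2-D rotation angle and the entries of D are assumed example values) and check the resulting covariance and its inverse:

```python
import numpy as np

rng = np.random.default_rng(3)

D = np.diag([1.0, 0.1])                          # diagonal covariance (assumed values)
theta = np.pi / 4                                # assumed rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

C = R @ D @ R.T                                  # covariance of the correlated Gaussian

samples = rng.multivariate_normal(np.zeros(2), C, size=100_000)
print(C)
print(np.cov(samples.T))                         # empirical covariance, close to C

# The inverse covariance matches R D^-1 R^T, as on the slide.
print(np.allclose(np.linalg.inv(C), R @ np.linalg.inv(D) @ R.T))
```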

106–110 Recall Univariate Gaussian Properties
1. Sum of Gaussian variables is also Gaussian: if y_i ~ N(µ_i, σ_i²), then Σ_{i=1}^n y_i ~ N(Σ_{i=1}^n µ_i, Σ_{i=1}^n σ_i²).
2. Scaling a Gaussian leads to a Gaussian: if y ~ N(µ, σ²), then wy ~ N(wµ, w²σ²).

111–113 Multivariate Consequence
If x ~ N(µ, Σ) and y = Wx, then y ~ N(Wµ, WΣW^T).
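
A quick Monte Carlo check of this consequence (µ, Σ and W are arbitrary assumed values):

```python
import numpy as np

rng = np.random.default_rng(4)

mu = np.array([1.0, -1.0, 0.5])
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 0.1 * np.eye(3)      # an arbitrary valid covariance
W = rng.normal(size=(2, 3))            # an arbitrary linear map

x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ W.T                            # y = W x applied to every sample

print(y.mean(axis=0), W @ mu)          # empirical mean ~ W mu
print(np.cov(y.T))                     # empirical covariance ~ W Sigma W^T
print(W @ Sigma @ W.T)
```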

114 Sampling a Function
Multi-variate Gaussians
We will consider a Gaussian with a particular structure of covariance matrix.
Generate a single sample from this 25 dimensional Gaussian distribution, f = [f_1, f_2, …, f_25]^T.
We will plot these points against their index.

115–122 Gaussian Distribution Sample
Figure: a sample from a 25 dimensional Gaussian distribution. (a) the 25 dimensional correlated random variable, with the values f_i plotted against the index i; (b) a colormap showing the correlations between dimensions (the final slide highlights the correlation between f_1 and f_2).
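
A sketch of how such a sample can be generated; the covariance structure used here is assumed to be the exponentiated quadratic kernel introduced later in the talk, with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(5)

x = np.linspace(-1.0, 1.0, 25)                   # 25 input locations (assumed)
alpha, lengthscale = 1.0, 0.3                    # assumed kernel parameters
K = alpha * np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * lengthscale ** 2))

# One sample f = [f_1, ..., f_25] from N(0, K); jitter keeps the Cholesky stable.
L = np.linalg.cholesky(K + 1e-8 * np.eye(25))
f = L @ rng.normal(size=25)

for i, fi in enumerate(f, start=1):
    print(i, round(fi, 3))                       # values plotted against their index
```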

123–126 Prediction of f_2 from f_1
[Figure: a single contour of the Gaussian density representing the joint distribution p(f_1, f_2).]
We observe that f_1 = 0.313.
Conditional density: p(f_2 | f_1 = 0.313).

127 Prediction with Correlated Gaussians
Prediction of f_2 from f_1 requires the conditional density.
The conditional density is also Gaussian:
p(f_2 | f_1) = N(f_2 | (k_{1,2}/k_{1,1}) f_1, k_{2,2} − k_{1,2}²/k_{1,1})
where the covariance of the joint density is given by
K = [k_{1,1} k_{1,2}; k_{2,1} k_{2,2}]
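
A tiny sketch of this conditional, using the observed value 0.313 from the slides and an assumed 2×2 covariance:

```python
import numpy as np

k11, k12, k22 = 1.0, 0.9, 1.0      # assumed joint covariance entries
f1 = 0.313                         # observed value from the slides

cond_mean = (k12 / k11) * f1
cond_var = k22 - k12 ** 2 / k11
print(cond_mean, cond_var)
```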

128–131 Prediction of f_5 from f_1
[Figure: a single contour of the Gaussian density representing the joint distribution p(f_1, f_5).]
We observe that f_1 = 0.313.
Conditional density: p(f_5 | f_1 = 0.313).

132–133 Prediction with Correlated Gaussians
Prediction of f_* from f requires the multivariate conditional density.
The multivariate conditional density is also Gaussian:
p(f_* | f) = N(f_* | µ, Σ)
µ = K_{*,f} K_{f,f}^{-1} f
Σ = K_{*,*} − K_{*,f} K_{f,f}^{-1} K_{f,*}
Here the covariance of the joint density is given by
K = [K_{f,f} K_{f,*}; K_{*,f} K_{*,*}]
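
A minimal sketch of these prediction equations for noise-free Gaussian process regression; the kernel is the exponentiated quadratic introduced on the next slide, and the training data and parameter values are assumed for illustration:

```python
import numpy as np

def eq_kernel(X1, X2, alpha=1.0, lengthscale=0.3):
    """Exponentiated quadratic covariance between two sets of 1-D inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return alpha * np.exp(-d2 / (2 * lengthscale ** 2))

x_train = np.array([-1.0, -0.3, 0.4, 1.0])       # assumed training inputs
f_train = np.array([0.5, 1.0, -0.4, 0.2])        # assumed (noise-free) observations
x_test = np.linspace(-1.5, 1.5, 7)               # test inputs

K_ff = eq_kernel(x_train, x_train) + 1e-8 * np.eye(len(x_train))   # jitter
K_sf = eq_kernel(x_test, x_train)
K_ss = eq_kernel(x_test, x_test)

# Conditional mean and covariance, as on the slide.
mu = K_sf @ np.linalg.solve(K_ff, f_train)
Sigma = K_ss - K_sf @ np.linalg.solve(K_ff, K_sf.T)

print(mu)
print(np.sqrt(np.diag(Sigma)))                   # predictive standard deviations
```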

134–135 Covariance Functions
Where did this covariance matrix come from?
Exponentiated Quadratic Kernel Function (RBF, Squared Exponential, Gaussian):
k(x, x′) = α exp(−‖x − x′‖_2²/(2l²))
The covariance matrix is built using the inputs to the function, x.
For the example above it was based on Euclidean distance.
The covariance function is also known as a kernel.

136–151 Covariance Functions
Where did this covariance matrix come from?
k(x_i, x_j) = α exp(−‖x_i − x_j‖²/(2l²))
[These slides step through the entry-by-entry computation of the covariance matrix for the inputs x_1 = 3.0, x_2 = 1.20 and x_3 = 1.40 with l = 2.00 and α = 1.00, evaluating each k_{i,j} in turn and filling in the corresponding element of the matrix.]
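
The matrices on these slides can be reproduced directly from the kernel; a sketch using the inputs and parameters as transcribed (the slide-by-slide numeric entries were lost in transcription):

```python
import numpy as np

def eq_cov(x, alpha, lengthscale):
    """Covariance matrix with k(x_i, x_j) = alpha * exp(-(x_i - x_j)^2 / (2 l^2))."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return alpha * np.exp(-d2 / (2 * lengthscale ** 2))

x = np.array([3.0, 1.20, 1.40])                        # inputs as transcribed
print(np.round(eq_cov(x, alpha=1.00, lengthscale=2.00), 3))

# The same three inputs with the longer lengthscale and larger variance used later.
print(np.round(eq_cov(x, alpha=4.00, lengthscale=5.00), 3))
```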

152–178 Covariance Functions
[These slides repeat the entry-by-entry construction of the covariance matrix for four inputs, x_1 = 3, x_2 = 1.2, x_3 = 1.4 and x_4 = 2.0, with l = 2.0 and α = 1.0.]

179–194 Covariance Functions
[These slides repeat the construction for x_1 = 3.0, x_2 = 1.20 and x_3 = 1.40 with a longer lengthscale, l = 5.00, and a larger variance, α = 4.00.]

195 Outline The Gaussian Density Covariance from Basis Functions Basis Function Representations Constructing Covariance GP Limitations Conclusions

196 Basis Function Form
Radial basis functions commonly have the form
φ_k(x_i) = exp(−‖x_i − µ_k‖²/(2l²)).
The basis function maps the data into a feature space in which a linear sum is a non-linear function of the input.
Figure: a set of radial basis functions with width l = 2 and location parameters µ = [−4, 0, 4]^T.

197 Basis Function Representations
Represent a function by a linear sum over a basis,
f(x_{i,:}; w) = Σ_{k=1}^m w_k φ_k(x_{i,:}),   (1)
Here: m basis functions, φ_k(·) is the kth basis function, and w = [w_1, …, w_m]^T.
For the standard linear model: φ_k(x_{i,:}) = x_{i,k}.

198 Random Functions
Functions derived using
f(x) = Σ_{k=1}^m w_k φ_k(x),
where each w_k is sampled from a Gaussian density, w_k ~ N(0, α).
Figure: functions sampled using the basis set from figure 3. Each line is a separate sample, generated by a weighted sum of the basis set. The weights, w, are sampled from a Gaussian density with variance α = 1.
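
A sketch of this sampling procedure with the basis described above (centres at −4, 0 and 4, width l = 2, weight variance α = 1):

```python
import numpy as np

rng = np.random.default_rng(7)

def rbf_basis(x, centres, width):
    """phi_k(x) = exp(-(x - mu_k)^2 / (2 l^2)) evaluated for every centre."""
    return np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * width ** 2))

x = np.linspace(-8.0, 8.0, 200)
Phi = rbf_basis(x, centres=np.array([-4.0, 0.0, 4.0]), width=2.0)

alpha = 1.0
for _ in range(3):                      # three sampled functions
    w = rng.normal(0.0, np.sqrt(alpha), size=Phi.shape[1])
    f = Phi @ w                         # f(x) = sum_k w_k phi_k(x)
    print(np.round(f[::50], 3))         # a few values of each sample
```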

199–204 Direct Construction of Covariance Matrix
Use matrix notation to write the function,
f(x_i; w) = Σ_{k=1}^m w_k φ_k(x_i);
computed at the training data this gives a vector f = Φw.
w and f are only related by an inner product.
Φ ∈ R^{n×p} is a design matrix.
Φ is fixed and non-stochastic for a given training set.
f is Gaussian distributed.

205–208 Expectations
We have f = Φw. We use ⟨·⟩ to denote expectations under prior distributions.
The prior mean of w was zero, giving ⟨f⟩ = Φ⟨w⟩ = 0.
The prior covariance of f is K = ⟨ff^T⟩ − ⟨f⟩⟨f⟩^T.
⟨ff^T⟩ = Φ⟨ww^T⟩Φ^T, giving K = γΦΦ^T (with ⟨ww^T⟩ = γI under the prior).
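
A numerical sketch of this identity: sampling w and forming Φw gives (up to sampling error) the same covariance as K = γΦΦ^T. The basis, inputs and γ are assumed example values.

```python
import numpy as np

rng = np.random.default_rng(8)

def rbf_basis(x, centres, width):
    return np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * width ** 2))

x = np.linspace(-8.0, 8.0, 50)
Phi = rbf_basis(x, centres=np.array([-4.0, 0.0, 4.0]), width=2.0)
gamma = 1.0

K = gamma * Phi @ Phi.T                               # covariance implied by the basis

W = rng.normal(0.0, np.sqrt(gamma), size=(Phi.shape[1], 50_000))
F = Phi @ W                                           # many sampled function vectors f = Phi w
print(np.abs(np.cov(F) - K).max())                    # small: empirical covariance ~ gamma Phi Phi^T
```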

209–212 Covariance between Two Points
The prior covariance between two points x_i and x_j is
k(x_i, x_j) = γ φ_:(x_i)^T φ_:(x_j),
or in sum notation
k(x_i, x_j) = γ Σ_{l=1}^m φ_l(x_i) φ_l(x_j).
For the radial basis used this gives
k(x_i, x_j) = γ Σ_{k=1}^m exp(−(‖x_i − µ_k‖² + ‖x_j − µ_k‖²)/(2l²)).

213–214 Covariance Functions: RBF Basis Functions
k(x, x′) = α φ(x)^T φ(x′),  φ_i(x) = exp(−‖x − µ_i‖²/(2l²))
[Figures: the covariance function obtained from a small set of RBF basis functions, shown for two different sets of centre locations µ.]

215–217 Selecting Number and Location of Basis
Need to choose
1. the location of the centres,
2. the number of basis functions.
Consider uniform spacing over a region:
k(x_i, x_j) = γ Σ_{k=1}^m exp(−(x_i² + x_j² − 2µ_k(x_i + x_j) + 2µ_k²)/(2l²)),
restricting the analysis to a 1-D input, x.

218–219 Uniform Basis Functions
Set each centre location to µ_k = a + Δµ·(k − 1).
Specify the basis functions in terms of their indices,
k(x_i, x_j) = γ Δµ Σ_{k=0}^{m−1} exp(−(x_i² + x_j² − 2(a + Δµ·k)(x_i + x_j) + 2(a + Δµ·k)²)/(2l²)).

220–222 Infinite Basis Functions
Take µ_0 = a and µ_{m−1} = b, so b = a + Δµ·(m − 1).
Take the limit as Δµ → 0, so m → ∞:
k(x_i, x_j) = γ ∫_a^b exp(−(x_i² + x_j²)/(2l²)) exp(−(2(µ − (x_i + x_j)/2)² − (x_i + x_j)²/2)/(2l²)) dµ,
where we have used kΔµ → µ.

223–225 Result
Performing the integration leads to
k(x_i, x_j) = γ (√(πl²)/2) exp(−(x_i − x_j)²/(4l²)) [erf((b − (x_i + x_j)/2)/l) − erf((a − (x_i + x_j)/2)/l)].
Now take the limit as a → −∞ and b → ∞:
k(x_i, x_j) = α exp(−(x_i − x_j)²/(4l²)),
where α = γ√(πl²).
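
A numerical check of this limit (with illustrative values of a, b, l and γ): a dense uniform grid of RBF centres, the erf expression, and the exponentiated quadratic limit all agree closely.

```python
import numpy as np
from scipy.special import erf

a, b, m = -20.0, 20.0, 4001          # assumed integration range and grid size
mu = np.linspace(a, b, m)
dmu = mu[1] - mu[0]
l, gamma = 1.0, 1.0                  # assumed lengthscale and weight variance

def basis_cov(xi, xj):
    """gamma * dmu * sum_k phi_k(xi) phi_k(xj) for a dense uniform RBF basis."""
    phi_i = np.exp(-(xi - mu) ** 2 / (2 * l ** 2))
    phi_j = np.exp(-(xj - mu) ** 2 / (2 * l ** 2))
    return gamma * dmu * np.sum(phi_i * phi_j)

def erf_cov(xi, xj):
    """The closed-form result of integrating over the centres from a to b."""
    pref = gamma * np.sqrt(np.pi * l ** 2) / 2 * np.exp(-(xi - xj) ** 2 / (4 * l ** 2))
    mid = 0.5 * (xi + xj)
    return pref * (erf((b - mid) / l) - erf((a - mid) / l))

def eq_cov(xi, xj):
    """The a -> -inf, b -> inf limit: alpha * exp(-(xi - xj)^2 / (4 l^2))."""
    return gamma * np.sqrt(np.pi * l ** 2) * np.exp(-(xi - xj) ** 2 / (4 * l ** 2))

for xi, xj in [(0.0, 0.5), (-1.0, 2.0), (3.0, 3.0)]:
    print(basis_cov(xi, xj), erf_cov(xi, xj), eq_cov(xi, xj))
```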

226–228 Infinite Feature Space
An RBF model with infinitely many basis functions is a Gaussian process.
The covariance function is the exponentiated quadratic,
k(x_i, x_j) = α exp(−(x_i − x_j)²/(4l²)), where α = γ√(πl²).
Note: the functional forms of the covariance function and of the basis functions are similar here. This is a special case; in general they are very different.
Similar results can be obtained for multi-dimensional input models (Williams, 1998; Neal, 1996).

229 References I
G. Della Gatta, M. Bansal, A. Ambesi-Impiombato, D. Antonini, C. Missero, and D. di Bernardo. Direct targets of the trp63 transcription factor revealed by a combination of gene expression profiling and reverse engineering. Genome Research, 18(6), Jun 2008. [URL]. [DOI].
A. A. Kalaitzis and N. D. Lawrence. A simple approach to ranking differentially expressed gene expression time courses through Gaussian process regression. BMC Bioinformatics, 12(180), 2011. [DOI].
R. M. Neal. Bayesian Learning for Neural Networks. Springer, 1996. Lecture Notes in Statistics 118.
J. Oakley and A. O'Hagan. Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika, 89(4), 2002.
C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006. [Google Books].
C. K. I. Williams. Computation with infinite neural networks. Neural Computation, 10(5), 1998.


More information

MTTTS16 Learning from Multiple Sources

MTTTS16 Learning from Multiple Sources MTTTS16 Learning from Multiple Sources 5 ECTS credits Autumn 2018, University of Tampere Lecturer: Jaakko Peltonen Lecture 6: Multitask learning with kernel methods and nonparametric models On this lecture:

More information

Probabilistic Graphical Models Lecture 20: Gaussian Processes

Probabilistic Graphical Models Lecture 20: Gaussian Processes Probabilistic Graphical Models Lecture 20: Gaussian Processes Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 30, 2015 1 / 53 What is Machine Learning? Machine learning algorithms

More information

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY

More information

Introduction to Gaussian Processes

Introduction to Gaussian Processes Introduction to Gaussian Processes Iain Murray School of Informatics, University of Edinburgh The problem Learn scalar function of vector values f(x).5.5 f(x) y i.5.2.4.6.8 x f 5 5.5 x x 2.5 We have (possibly

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Kernel Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1 / 21

More information

How to build an automatic statistician

How to build an automatic statistician How to build an automatic statistician James Robert Lloyd 1, David Duvenaud 1, Roger Grosse 2, Joshua Tenenbaum 2, Zoubin Ghahramani 1 1: Department of Engineering, University of Cambridge, UK 2: Massachusetts

More information

Probabilistic & Bayesian deep learning. Andreas Damianou

Probabilistic & Bayesian deep learning. Andreas Damianou Probabilistic & Bayesian deep learning Andreas Damianou Amazon Research Cambridge, UK Talk at University of Sheffield, 19 March 2019 In this talk Not in this talk: CRFs, Boltzmann machines,... In this

More information

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

PILCO: A Model-Based and Data-Efficient Approach to Policy Search PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol

More information

Gaussian Processes for Machine Learning

Gaussian Processes for Machine Learning Gaussian Processes for Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics Tübingen, Germany carl@tuebingen.mpg.de Carlos III, Madrid, May 2006 The actual science of

More information

Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting)

Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting) Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting) Professor: Aude Billard Assistants: Nadia Figueroa, Ilaria Lauzana and Brice Platerrier E-mails: aude.billard@epfl.ch,

More information

Reliability Monitoring Using Log Gaussian Process Regression

Reliability Monitoring Using Log Gaussian Process Regression COPYRIGHT 013, M. Modarres Reliability Monitoring Using Log Gaussian Process Regression Martin Wayne Mohammad Modarres PSA 013 Center for Risk and Reliability University of Maryland Department of Mechanical

More information

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

More information

GWAS IV: Bayesian linear (variance component) models

GWAS IV: Bayesian linear (variance component) models GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian

More information

Relevance Vector Machines

Relevance Vector Machines LUT February 21, 2011 Support Vector Machines Model / Regression Marginal Likelihood Regression Relevance vector machines Exercise Support Vector Machines The relevance vector machine (RVM) is a bayesian

More information

Least Squares Regression

Least Squares Regression CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the

More information

DEPARTMENT OF COMPUTER SCIENCE AUTUMN SEMESTER MACHINE LEARNING AND ADAPTIVE INTELLIGENCE

DEPARTMENT OF COMPUTER SCIENCE AUTUMN SEMESTER MACHINE LEARNING AND ADAPTIVE INTELLIGENCE Data Provided: None DEPARTMENT OF COMPUTER SCIENCE AUTUMN SEMESTER 204 205 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE hour Please note that the rubric of this paper is made different from many other papers.

More information

State Space Gaussian Processes with Non-Gaussian Likelihoods

State Space Gaussian Processes with Non-Gaussian Likelihoods State Space Gaussian Processes with Non-Gaussian Likelihoods Hannes Nickisch 1 Arno Solin 2 Alexander Grigorievskiy 2,3 1 Philips Research, 2 Aalto University, 3 Silo.AI ICML2018 July 13, 2018 Outline

More information

Nonparmeteric Bayes & Gaussian Processes. Baback Moghaddam Machine Learning Group

Nonparmeteric Bayes & Gaussian Processes. Baback Moghaddam Machine Learning Group Nonparmeteric Bayes & Gaussian Processes Baback Moghaddam baback@jpl.nasa.gov Machine Learning Group Outline Bayesian Inference Hierarchical Models Model Selection Parametric vs. Nonparametric Gaussian

More information

Linear Regression and Discrimination

Linear Regression and Discrimination Linear Regression and Discrimination Kernel-based Learning Methods Christian Igel Institut für Neuroinformatik Ruhr-Universität Bochum, Germany http://www.neuroinformatik.rub.de July 16, 2009 Christian

More information

Kernels for Automatic Pattern Discovery and Extrapolation

Kernels for Automatic Pattern Discovery and Extrapolation Kernels for Automatic Pattern Discovery and Extrapolation Andrew Gordon Wilson agw38@cam.ac.uk mlg.eng.cam.ac.uk/andrew University of Cambridge Joint work with Ryan Adams (Harvard) 1 / 21 Pattern Recognition

More information

CS-E3210 Machine Learning: Basic Principles

CS-E3210 Machine Learning: Basic Principles CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction

More information

Statistical Techniques in Robotics (16-831, F12) Lecture#20 (Monday November 12) Gaussian Processes

Statistical Techniques in Robotics (16-831, F12) Lecture#20 (Monday November 12) Gaussian Processes Statistical Techniques in Robotics (6-83, F) Lecture# (Monday November ) Gaussian Processes Lecturer: Drew Bagnell Scribe: Venkatraman Narayanan Applications of Gaussian Processes (a) Inverse Kinematics

More information

Linear regression example Simple linear regression: f(x) = ϕ(x)t w w ~ N(0, ) The mean and covariance are given by E[f(x)] = ϕ(x)e[w] = 0.

Linear regression example Simple linear regression: f(x) = ϕ(x)t w w ~ N(0, ) The mean and covariance are given by E[f(x)] = ϕ(x)e[w] = 0. Gaussian Processes Gaussian Process Stochastic process: basically, a set of random variables. may be infinite. usually related in some way. Gaussian process: each variable has a Gaussian distribution every

More information

Probabilistic Models for Learning Data Representations. Andreas Damianou

Probabilistic Models for Learning Data Representations. Andreas Damianou Probabilistic Models for Learning Data Representations Andreas Damianou Department of Computer Science, University of Sheffield, UK IBM Research, Nairobi, Kenya, 23/06/2015 Sheffield SITraN Outline Part

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan Background: Global Optimization and Gaussian Processes The Geometry of Gaussian Processes and the Chaining Trick Algorithm

More information

Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)

Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012) Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation

More information

Probabilistic numerics for deep learning

Probabilistic numerics for deep learning Presenter: Shijia Wang Department of Engineering Science, University of Oxford rning (RLSS) Summer School, Montreal 2017 Outline 1 Introduction Probabilistic Numerics 2 Components Probabilistic modeling

More information

SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA. Sistemi di Elaborazione dell Informazione. Regressione. Ruggero Donida Labati

SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA. Sistemi di Elaborazione dell Informazione. Regressione. Ruggero Donida Labati SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA Sistemi di Elaborazione dell Informazione Regressione Ruggero Donida Labati Dipartimento di Informatica via Bramante 65, 26013 Crema (CR), Italy http://homes.di.unimi.it/donida

More information

Talk on Bayesian Optimization

Talk on Bayesian Optimization Talk on Bayesian Optimization Jungtaek Kim (jtkim@postech.ac.kr) Machine Learning Group, Department of Computer Science and Engineering, POSTECH, 77-Cheongam-ro, Nam-gu, Pohang-si 37673, Gyungsangbuk-do,

More information

The Multivariate Gaussian Distribution [DRAFT]

The Multivariate Gaussian Distribution [DRAFT] The Multivariate Gaussian Distribution DRAFT David S. Rosenberg Abstract This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs,

More information

Bayesian Deep Learning

Bayesian Deep Learning Bayesian Deep Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp June 06, 2018 Mohammad Emtiyaz Khan 2018 1 What will you learn? Why is Bayesian inference

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Gaussian process for nonstationary time series prediction

Gaussian process for nonstationary time series prediction Computational Statistics & Data Analysis 47 (2004) 705 712 www.elsevier.com/locate/csda Gaussian process for nonstationary time series prediction Soane Brahim-Belhouari, Amine Bermak EEE Department, Hong

More information

Linear Models for Regression

Linear Models for Regression Linear Models for Regression Machine Learning Torsten Möller Möller/Mori 1 Reading Chapter 3 of Pattern Recognition and Machine Learning by Bishop Chapter 3+5+6+7 of The Elements of Statistical Learning

More information

Relevance Vector Machines for Earthquake Response Spectra

Relevance Vector Machines for Earthquake Response Spectra 2012 2011 American American Transactions Transactions on on Engineering Engineering & Applied Applied Sciences Sciences. American Transactions on Engineering & Applied Sciences http://tuengr.com/ateas

More information

Nonparametric Regression With Gaussian Processes

Nonparametric Regression With Gaussian Processes Nonparametric Regression With Gaussian Processes From Chap. 45, Information Theory, Inference and Learning Algorithms, D. J. C. McKay Presented by Micha Elsner Nonparametric Regression With Gaussian Processes

More information

Dropout as a Bayesian Approximation: Insights and Applications

Dropout as a Bayesian Approximation: Insights and Applications Dropout as a Bayesian Approximation: Insights and Applications Yarin Gal and Zoubin Ghahramani Discussion by: Chunyuan Li Jan. 15, 2016 1 / 16 Main idea In the framework of variational inference, the authors

More information

BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS

BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS Oloyede I. Department of Statistics, University of Ilorin, Ilorin, Nigeria Corresponding Author: Oloyede I.,

More information