Introduction to Gaussian Processes


1 Introduction to Gaussian Processes Neil D. Lawrence GPSS 10th June 2013

2 Book Rasmussen and Williams (2006)

3 Outline The Gaussian Density Covariance from Basis Functions Basis Function Representations Constructing Covariance GP Limitations Conclusions

5 The Gaussian Density
Perhaps the most common probability density. The Gaussian density:
p(y | µ, σ²) = (1/√(2πσ²)) exp(−(y − µ)²/(2σ²)) = N(y | µ, σ²)

6 Gaussian Density
[Figure: the Gaussian PDF p(h | µ, σ²) over height h/m, with µ = 1.7 and the mean shown as a red line.]
It could represent the heights of a population of students.

7 Gaussian Density
N(y | µ, σ²) = (1/√(2πσ²)) exp(−(y − µ)²/(2σ²))
σ² is the variance of the density and µ is the mean.
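
As a quick numerical sketch of this density (not part of the slides), the hand-written formula can be checked against scipy.stats.norm; the values µ = 1.7 and σ² = 0.0225 are assumed here purely for illustration, echoing the height example.

```python
import numpy as np
from scipy.stats import norm

def gaussian_density(y, mu, sigma2):
    """Evaluate N(y | mu, sigma2) exactly as written on the slide."""
    return np.exp(-(y - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

mu, sigma2 = 1.7, 0.0225          # assumed example values (height in metres)
y = np.linspace(1.4, 2.0, 7)

print(gaussian_density(y, mu, sigma2))
print(norm.pdf(y, loc=mu, scale=np.sqrt(sigma2)))   # same numbers from SciPy
```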

8–11 Two Important Gaussian Properties: Sum of Gaussians
The sum of Gaussian variables is also Gaussian. If y_i ~ N(µ_i, σ_i²), then the sum is distributed as
Σ_{i=1}^n y_i ~ N(Σ_{i=1}^n µ_i, Σ_{i=1}^n σ_i²).
(Aside: as n increases, the sum of non-Gaussian, finite-variance variables also tends to a Gaussian [central limit theorem].)

12–14 Two Important Gaussian Properties: Scaling a Gaussian
Scaling a Gaussian leads to a Gaussian. If y ~ N(µ, σ²), then the scaled variable is distributed as
wy ~ N(wµ, w²σ²).
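
A minimal Monte Carlo sketch of these two properties (the particular means, variances and scale factor are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

mus = np.array([1.0, -2.0, 0.5])        # assumed example means
sigma2s = np.array([0.5, 1.0, 2.0])     # assumed example variances
w = 3.0                                 # assumed scale factor

samples = rng.normal(mus, np.sqrt(sigma2s), size=(100_000, 3))

s = samples.sum(axis=1)                 # sum of Gaussian variables
print(s.mean(), mus.sum())              # empirical mean ~ sum of means
print(s.var(), sigma2s.sum())           # empirical variance ~ sum of variances

z = w * samples[:, 0]                   # scaled Gaussian variable
print(z.mean(), w * mus[0])             # ~ w mu
print(z.var(), w ** 2 * sigma2s[0])     # ~ w^2 sigma^2
```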

15 Linear Function
[Figure: data points and a best-fit line; y, height in m, against x, weight in kg.]
A linear regression between height and weight.

16 Regression Examples
Predict a real value, y_i, given some inputs, x_i.
Predict quality of meat given spectral measurements (Tecator data).
Radiocarbon dating, the C14 calibration curve: predict age given quantity of C14 isotope.
Predict quality of different Go or Backgammon moves given expert rated training data.

17 y = mx + c

18–24 [Figures: plots of the straight line y = mx + c against x; slides 19–21 annotate the intercept c and the gradient m.]

25 y = mx + c
point 1: x = 1, y = 3 gives 3 = m + c
point 2: x = 3, y = 1 gives 1 = 3m + c
point 3: x = 2, y = … gives … = 2m + c

26 y = mx + c + ε
point 1: x = 1, y = 3 gives 3 = m + c + ε_1
point 2: x = 3, y = 1 gives 1 = 3m + c + ε_2
point 3: x = 2, y = … gives … = 2m + c + ε_3

27 Underdetermined System
What about two unknowns and one observation? y_1 = mx_1 + c

28–36 Underdetermined System
Can compute m given c: m = (y_1 − c)/x_1.
[Figures: slides 28–36 repeat this construction for a sequence of different intercept values c, each giving the corresponding gradient m and a different candidate line through the single data point.]

37 Underdetermined System
Can compute m given c. Assume c ~ N(0, 4); we then find a distribution of solutions for the line.
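
A small sketch of this idea, using point 1 from the earlier slides, (x_1, y_1) = (1, 3), as the single observation and drawing the intercept from c ~ N(0, 4):

```python
import numpy as np

rng = np.random.default_rng(1)

x1, y1 = 1.0, 3.0                       # the single observation (point 1 above)

c = rng.normal(0.0, 2.0, size=5)        # c ~ N(0, 4): standard deviation 2
m = (y1 - c) / x1                       # solve for the gradient given each c

for ci, mi in zip(c, m):
    print(f"c = {ci:6.2f}  =>  m = {mi:6.2f}")
```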

38 Probability for Under- and Overdetermined
To deal with the overdetermined system we introduced a probability distribution for the noise variable, ε_i.
For the underdetermined system we introduced a probability distribution for the parameter, c.
This is known as a Bayesian treatment.

39–40 Multivariate Prior Distributions
For general Bayesian inference we need multivariate priors. E.g. for multivariate linear regression,
y_i = Σ_j w_j x_{i,j} + ε_i = w^T x_{i,:} + ε_i
(where we've dropped c for convenience), we need a prior over w.
This motivates a multivariate Gaussian density.
We will use the multivariate Gaussian to put a prior directly on the function (a Gaussian process).

41 Prior Distribution
Bayesian inference requires a prior on the parameters.
The prior represents your belief, before you see the data, about the likely value of the parameters.
For linear regression, consider a Gaussian prior on the intercept:
c ~ N(0, α^{-1})

42–44 Gaussian Noise
p(c) = N(c | 0, α^{-1})
p(y | m, c, x, σ²) = N(y | mx + c, σ²)
p(c | y, m, x, σ²) = N(c | (y − mx)/(1 + σ²α), (σ^{-2} + α)^{-1})
Figure: a Gaussian prior combines with a Gaussian likelihood to give a Gaussian posterior.
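
A sketch that plugs numbers into the posterior written above; the observation, slope and the values of σ² and α are assumed for illustration:

```python
import numpy as np

x, y, m = 1.0, 3.0, 1.0        # assumed single observation and known slope
sigma2 = 0.5                   # assumed noise variance
alpha = 1.0                    # prior precision: c ~ N(0, 1/alpha)

# Posterior mean and variance for c, as on the slide.
post_mean = (y - m * x) / (1.0 + sigma2 * alpha)
post_var = 1.0 / (1.0 / sigma2 + alpha)
print(post_mean, post_var)
```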

45 Stages to Derivation of the Posterior
Multiply likelihood by prior: they are exponentiated quadratics, so the answer is always also an exponentiated quadratic, because exp(a²) exp(b²) = exp(a² + b²).
Complete the square to get the resulting density in the form of a Gaussian.
Recognise the mean and (co)variance of the Gaussian. This is the estimate of the posterior.

46–48 Multivariate Regression Likelihood
Noise corrupted data point: y_i = w^T x_{i,:} + ε_i
Multivariate regression likelihood:
p(y | X, w) = (2πσ²)^{-n/2} exp(−(1/(2σ²)) Σ_{i=1}^n (y_i − w^T x_{i,:})²)
Now use a multivariate Gaussian prior:
p(w) = (2πα)^{-p/2} exp(−(1/(2α)) w^T w)
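
Multiplying this likelihood and prior and completing the square gives a Gaussian posterior over w (the posterior formula itself is not written out on the slide; this is the standard result). A minimal sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)

n, p = 50, 3
sigma2, alpha = 0.1, 1.0                 # assumed noise variance and prior variance

X = rng.normal(size=(n, p))
w_true = rng.normal(size=p)
y = X @ w_true + rng.normal(0.0, np.sqrt(sigma2), size=n)

# Completing the square: cov = (X^T X / sigma^2 + I / alpha)^-1, mean = cov X^T y / sigma^2.
post_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / alpha)
post_mean = post_cov @ X.T @ y / sigma2

print(w_true)
print(post_mean)                         # close to w_true with this much data
print(np.diag(post_cov))                 # posterior variances
```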

49 Two Dimensional Gaussian
Consider height, h/m, and weight, w/kg.
Could sample height from a distribution: p(h) = N(1.7, …)
And similarly weight: p(w) = N(75, 36)

50 Height and Weight Models
[Figure: Gaussian distributions p(h) over h/m and p(w) over w/kg.]

51–73 Sampling Two Dimensional Variables
[Figures: a sequence of slides plotting samples of height and weight from the joint distribution alongside the marginal distributions p(h) and p(w), adding one sample at a time.]

74 Independence Assumption
This assumes height and weight are independent: p(h, w) = p(h)p(w).
In reality they are dependent (body mass index = w/h²).

75–97 Sampling Two Dimensional Variables
[Figures: the sampling exercise is repeated, plotting samples of height and weight from the joint distribution alongside the marginal distributions p(h) and p(w), one sample added per slide.]

98 Independent Gaussians p(w, h) = p(w)p(h)

99 Independent Gaussians
p(w, h) = (1/√(2πσ_1²)) (1/√(2πσ_2²)) exp(−(1/2) [(w − µ_1)²/σ_1² + (h − µ_2)²/σ_2²])

100 Independent Gaussians
p(w, h) = (1/(2π√(σ_1² σ_2²))) exp(−(1/2) ([w; h] − [µ_1; µ_2])^T [σ_1² 0; 0 σ_2²]^{-1} ([w; h] − [µ_1; µ_2]))

101 Independent Gaussians
p(y) = (1/(2π |D|^{1/2})) exp(−(1/2) (y − µ)^T D^{-1} (y − µ))

102 Correlated Gaussian
Form correlated from original by rotating the data space using matrix R.
p(y) = (1/(2π |D|^{1/2})) exp(−(1/2) (y − µ)^T D^{-1} (y − µ))

103 Correlated Gaussian
Form correlated from original by rotating the data space using matrix R.
p(y) = (1/(2π |D|^{1/2})) exp(−(1/2) (R^T y − R^T µ)^T D^{-1} (R^T y − R^T µ))

104 Correlated Gaussian
Form correlated from original by rotating the data space using matrix R.
p(y) = (1/(2π |D|^{1/2})) exp(−(1/2) (y − µ)^T R D^{-1} R^T (y − µ))
This gives a covariance matrix: C^{-1} = R D^{-1} R^T

105 Correlated Gaussian
Form correlated from original by rotating the data space using matrix R.
p(y) = (1/(2π |C|^{1/2})) exp(−(1/2) (y − µ)^T C^{-1} (y − µ))
This gives a covariance matrix: C = R D R^T
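
A short sketch of this construction: rotate a diagonal covariance D with a rotation matrix R (the 2-D rotation angle and the entries of D are assumed example values) and check the resulting covariance and its inverse:

```python
import numpy as np

rng = np.random.default_rng(3)

D = np.diag([1.0, 0.1])                          # diagonal covariance (assumed values)
theta = np.pi / 4                                # assumed rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

C = R @ D @ R.T                                  # covariance of the correlated Gaussian

samples = rng.multivariate_normal(np.zeros(2), C, size=100_000)
print(C)
print(np.cov(samples.T))                         # empirical covariance, close to C

# The inverse covariance matches R D^-1 R^T, as on the slide.
print(np.allclose(np.linalg.inv(C), R @ np.linalg.inv(D) @ R.T))
```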

106–110 Recall Univariate Gaussian Properties
1. Sum of Gaussian variables is also Gaussian: if y_i ~ N(µ_i, σ_i²), then Σ_{i=1}^n y_i ~ N(Σ_{i=1}^n µ_i, Σ_{i=1}^n σ_i²).
2. Scaling a Gaussian leads to a Gaussian: if y ~ N(µ, σ²), then wy ~ N(wµ, w²σ²).

111–113 Multivariate Consequence
If x ~ N(µ, Σ) and y = Wx, then y ~ N(Wµ, WΣW^T).
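
A quick Monte Carlo check of this consequence (µ, Σ and W are arbitrary assumed values):

```python
import numpy as np

rng = np.random.default_rng(4)

mu = np.array([1.0, -1.0, 0.5])
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 0.1 * np.eye(3)      # an arbitrary valid covariance
W = rng.normal(size=(2, 3))            # an arbitrary linear map

x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ W.T                            # y = W x applied to every sample

print(y.mean(axis=0), W @ mu)          # empirical mean ~ W mu
print(np.cov(y.T))                     # empirical covariance ~ W Sigma W^T
print(W @ Sigma @ W.T)
```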

114 Sampling a Function
Multi-variate Gaussians
We will consider a Gaussian with a particular structure of covariance matrix.
Generate a single sample from this 25 dimensional Gaussian distribution, f = [f_1, f_2, …, f_25]^T.
We will plot these points against their index.

115–122 Gaussian Distribution Sample
Figure: a sample from a 25 dimensional Gaussian distribution. (a) the 25 dimensional correlated random variable, with the values f_i plotted against the index i; (b) a colormap showing the correlations between dimensions (the final slide highlights the correlation between f_1 and f_2).
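
A sketch of how such a sample can be generated; the covariance structure used here is assumed to be the exponentiated quadratic kernel introduced later in the talk, with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(5)

x = np.linspace(-1.0, 1.0, 25)                   # 25 input locations (assumed)
alpha, lengthscale = 1.0, 0.3                    # assumed kernel parameters
K = alpha * np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * lengthscale ** 2))

# One sample f = [f_1, ..., f_25] from N(0, K); jitter keeps the Cholesky stable.
L = np.linalg.cholesky(K + 1e-8 * np.eye(25))
f = L @ rng.normal(size=25)

for i, fi in enumerate(f, start=1):
    print(i, round(fi, 3))                       # values plotted against their index
```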

123–126 Prediction of f_2 from f_1
[Figure: a single contour of the Gaussian density representing the joint distribution p(f_1, f_2).]
We observe that f_1 = 0.313.
Conditional density: p(f_2 | f_1 = 0.313).

127 Prediction with Correlated Gaussians
Prediction of f_2 from f_1 requires the conditional density.
The conditional density is also Gaussian:
p(f_2 | f_1) = N(f_2 | (k_{1,2}/k_{1,1}) f_1, k_{2,2} − k_{1,2}²/k_{1,1})
where the covariance of the joint density is given by
K = [k_{1,1} k_{1,2}; k_{2,1} k_{2,2}]
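
A tiny sketch of this conditional, using the observed value 0.313 from the slides and an assumed 2×2 covariance:

```python
import numpy as np

k11, k12, k22 = 1.0, 0.9, 1.0      # assumed joint covariance entries
f1 = 0.313                         # observed value from the slides

cond_mean = (k12 / k11) * f1
cond_var = k22 - k12 ** 2 / k11
print(cond_mean, cond_var)
```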

128–131 Prediction of f_5 from f_1
[Figure: a single contour of the Gaussian density representing the joint distribution p(f_1, f_5).]
We observe that f_1 = 0.313.
Conditional density: p(f_5 | f_1 = 0.313).

132–133 Prediction with Correlated Gaussians
Prediction of f_* from f requires the multivariate conditional density.
The multivariate conditional density is also Gaussian:
p(f_* | f) = N(f_* | µ, Σ)
µ = K_{*,f} K_{f,f}^{-1} f
Σ = K_{*,*} − K_{*,f} K_{f,f}^{-1} K_{f,*}
Here the covariance of the joint density is given by
K = [K_{f,f} K_{f,*}; K_{*,f} K_{*,*}]
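
A minimal sketch of these prediction equations for noise-free Gaussian process regression; the kernel is the exponentiated quadratic introduced on the next slide, and the training data and parameter values are assumed for illustration:

```python
import numpy as np

def eq_kernel(X1, X2, alpha=1.0, lengthscale=0.3):
    """Exponentiated quadratic covariance between two sets of 1-D inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return alpha * np.exp(-d2 / (2 * lengthscale ** 2))

x_train = np.array([-1.0, -0.3, 0.4, 1.0])       # assumed training inputs
f_train = np.array([0.5, 1.0, -0.4, 0.2])        # assumed (noise-free) observations
x_test = np.linspace(-1.5, 1.5, 7)               # test inputs

K_ff = eq_kernel(x_train, x_train) + 1e-8 * np.eye(len(x_train))   # jitter
K_sf = eq_kernel(x_test, x_train)
K_ss = eq_kernel(x_test, x_test)

# Conditional mean and covariance, as on the slide.
mu = K_sf @ np.linalg.solve(K_ff, f_train)
Sigma = K_ss - K_sf @ np.linalg.solve(K_ff, K_sf.T)

print(mu)
print(np.sqrt(np.diag(Sigma)))                   # predictive standard deviations
```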

134–135 Covariance Functions
Where did this covariance matrix come from?
Exponentiated Quadratic Kernel Function (RBF, Squared Exponential, Gaussian):
k(x, x′) = α exp(−‖x − x′‖_2²/(2l²))
The covariance matrix is built using the inputs to the function, x.
For the example above it was based on Euclidean distance.
The covariance function is also known as a kernel.

136–151 Covariance Functions
Where did this covariance matrix come from?
k(x_i, x_j) = α exp(−‖x_i − x_j‖²/(2l²))
[These slides step through the entry-by-entry computation of the covariance matrix for the inputs x_1 = 3.0, x_2 = 1.20 and x_3 = 1.40 with l = 2.00 and α = 1.00, evaluating each k_{i,j} in turn and filling in the corresponding element of the matrix.]
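
The matrices on these slides can be reproduced directly from the kernel; a sketch using the inputs and parameters as transcribed (the slide-by-slide numeric entries were lost in transcription):

```python
import numpy as np

def eq_cov(x, alpha, lengthscale):
    """Covariance matrix with k(x_i, x_j) = alpha * exp(-(x_i - x_j)^2 / (2 l^2))."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return alpha * np.exp(-d2 / (2 * lengthscale ** 2))

x = np.array([3.0, 1.20, 1.40])                        # inputs as transcribed
print(np.round(eq_cov(x, alpha=1.00, lengthscale=2.00), 3))

# The same three inputs with the longer lengthscale and larger variance used later.
print(np.round(eq_cov(x, alpha=4.00, lengthscale=5.00), 3))
```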

152–178 Covariance Functions
[These slides repeat the entry-by-entry construction of the covariance matrix for four inputs, x_1 = 3, x_2 = 1.2, x_3 = 1.4 and x_4 = 2.0, with l = 2.0 and α = 1.0.]

179–194 Covariance Functions
[These slides repeat the construction for x_1 = 3.0, x_2 = 1.20 and x_3 = 1.40 with a longer lengthscale, l = 5.00, and a larger variance, α = 4.00.]

195 Outline The Gaussian Density Covariance from Basis Functions Basis Function Representations Constructing Covariance GP Limitations Conclusions

196 Basis Function Form
Radial basis functions commonly have the form
φ_k(x_i) = exp(−‖x_i − µ_k‖²/(2l²)).
The basis function maps the data into a feature space in which a linear sum is a non-linear function of the input.
Figure: a set of radial basis functions with width l = 2 and location parameters µ = [−4, 0, 4]^T.

197 Basis Function Representations
Represent a function by a linear sum over a basis,
f(x_{i,:}; w) = Σ_{k=1}^m w_k φ_k(x_{i,:}),   (1)
Here: m basis functions, φ_k(·) is the kth basis function, and w = [w_1, …, w_m]^T.
For the standard linear model: φ_k(x_{i,:}) = x_{i,k}.

198 Random Functions
Functions derived using
f(x) = Σ_{k=1}^m w_k φ_k(x),
where each w_k is sampled from a Gaussian density, w_k ~ N(0, α).
Figure: functions sampled using the basis set from figure 3. Each line is a separate sample, generated by a weighted sum of the basis set. The weights, w, are sampled from a Gaussian density with variance α = 1.
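
A sketch of this sampling procedure with the basis described above (centres at −4, 0 and 4, width l = 2, weight variance α = 1):

```python
import numpy as np

rng = np.random.default_rng(7)

def rbf_basis(x, centres, width):
    """phi_k(x) = exp(-(x - mu_k)^2 / (2 l^2)) evaluated for every centre."""
    return np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * width ** 2))

x = np.linspace(-8.0, 8.0, 200)
Phi = rbf_basis(x, centres=np.array([-4.0, 0.0, 4.0]), width=2.0)

alpha = 1.0
for _ in range(3):                      # three sampled functions
    w = rng.normal(0.0, np.sqrt(alpha), size=Phi.shape[1])
    f = Phi @ w                         # f(x) = sum_k w_k phi_k(x)
    print(np.round(f[::50], 3))         # a few values of each sample
```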

199–204 Direct Construction of Covariance Matrix
Use matrix notation to write the function,
f(x_i; w) = Σ_{k=1}^m w_k φ_k(x_i);
computed at the training data this gives a vector f = Φw.
w and f are only related by an inner product.
Φ ∈ R^{n×p} is a design matrix.
Φ is fixed and non-stochastic for a given training set.
f is Gaussian distributed.

205–208 Expectations
We have f = Φw. We use ⟨·⟩ to denote expectations under prior distributions.
The prior mean of w was zero, giving ⟨f⟩ = Φ⟨w⟩ = 0.
The prior covariance of f is K = ⟨ff^T⟩ − ⟨f⟩⟨f⟩^T.
⟨ff^T⟩ = Φ⟨ww^T⟩Φ^T, giving K = γΦΦ^T (with ⟨ww^T⟩ = γI under the prior).
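
A numerical sketch of this identity: sampling w and forming Φw gives (up to sampling error) the same covariance as K = γΦΦ^T. The basis, inputs and γ are assumed example values.

```python
import numpy as np

rng = np.random.default_rng(8)

def rbf_basis(x, centres, width):
    return np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * width ** 2))

x = np.linspace(-8.0, 8.0, 50)
Phi = rbf_basis(x, centres=np.array([-4.0, 0.0, 4.0]), width=2.0)
gamma = 1.0

K = gamma * Phi @ Phi.T                               # covariance implied by the basis

W = rng.normal(0.0, np.sqrt(gamma), size=(Phi.shape[1], 50_000))
F = Phi @ W                                           # many sampled function vectors f = Phi w
print(np.abs(np.cov(F) - K).max())                    # small: empirical covariance ~ gamma Phi Phi^T
```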

209–212 Covariance between Two Points
The prior covariance between two points x_i and x_j is
k(x_i, x_j) = γ φ_:(x_i)^T φ_:(x_j),
or in sum notation
k(x_i, x_j) = γ Σ_{l=1}^m φ_l(x_i) φ_l(x_j).
For the radial basis used this gives
k(x_i, x_j) = γ Σ_{k=1}^m exp(−(‖x_i − µ_k‖² + ‖x_j − µ_k‖²)/(2l²)).

213–214 Covariance Functions: RBF Basis Functions
k(x, x′) = α φ(x)^T φ(x′),  φ_i(x) = exp(−‖x − µ_i‖²/(2l²))
[Figures: the covariance function obtained from a small set of RBF basis functions, shown for two different sets of centre locations µ.]

215–217 Selecting Number and Location of Basis
Need to choose
1. the location of the centres,
2. the number of basis functions.
Consider uniform spacing over a region:
k(x_i, x_j) = γ Σ_{k=1}^m exp(−(x_i² + x_j² − 2µ_k(x_i + x_j) + 2µ_k²)/(2l²)),
restricting the analysis to a 1-D input, x.

218–219 Uniform Basis Functions
Set each centre location to µ_k = a + Δµ·(k − 1).
Specify the basis functions in terms of their indices,
k(x_i, x_j) = γ Δµ Σ_{k=0}^{m−1} exp(−(x_i² + x_j² − 2(a + Δµ·k)(x_i + x_j) + 2(a + Δµ·k)²)/(2l²)).

220–222 Infinite Basis Functions
Take µ_0 = a and µ_{m−1} = b, so b = a + Δµ·(m − 1).
Take the limit as Δµ → 0, so m → ∞:
k(x_i, x_j) = γ ∫_a^b exp(−(x_i² + x_j²)/(2l²)) exp(−(2(µ − (x_i + x_j)/2)² − (x_i + x_j)²/2)/(2l²)) dµ,
where we have used kΔµ → µ.

223–225 Result
Performing the integration leads to
k(x_i, x_j) = γ (√(πl²)/2) exp(−(x_i − x_j)²/(4l²)) [erf((b − (x_i + x_j)/2)/l) − erf((a − (x_i + x_j)/2)/l)].
Now take the limit as a → −∞ and b → ∞:
k(x_i, x_j) = α exp(−(x_i − x_j)²/(4l²)),
where α = γ√(πl²).
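
A numerical check of this limit (with illustrative values of a, b, l and γ): a dense uniform grid of RBF centres, the erf expression, and the exponentiated quadratic limit all agree closely.

```python
import numpy as np
from scipy.special import erf

a, b, m = -20.0, 20.0, 4001          # assumed integration range and grid size
mu = np.linspace(a, b, m)
dmu = mu[1] - mu[0]
l, gamma = 1.0, 1.0                  # assumed lengthscale and weight variance

def basis_cov(xi, xj):
    """gamma * dmu * sum_k phi_k(xi) phi_k(xj) for a dense uniform RBF basis."""
    phi_i = np.exp(-(xi - mu) ** 2 / (2 * l ** 2))
    phi_j = np.exp(-(xj - mu) ** 2 / (2 * l ** 2))
    return gamma * dmu * np.sum(phi_i * phi_j)

def erf_cov(xi, xj):
    """The closed-form result of integrating over the centres from a to b."""
    pref = gamma * np.sqrt(np.pi * l ** 2) / 2 * np.exp(-(xi - xj) ** 2 / (4 * l ** 2))
    mid = 0.5 * (xi + xj)
    return pref * (erf((b - mid) / l) - erf((a - mid) / l))

def eq_cov(xi, xj):
    """The a -> -inf, b -> inf limit: alpha * exp(-(xi - xj)^2 / (4 l^2))."""
    return gamma * np.sqrt(np.pi * l ** 2) * np.exp(-(xi - xj) ** 2 / (4 * l ** 2))

for xi, xj in [(0.0, 0.5), (-1.0, 2.0), (3.0, 3.0)]:
    print(basis_cov(xi, xj), erf_cov(xi, xj), eq_cov(xi, xj))
```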

226–228 Infinite Feature Space
An RBF model with infinitely many basis functions is a Gaussian process.
The covariance function is the exponentiated quadratic,
k(x_i, x_j) = α exp(−(x_i − x_j)²/(4l²)), where α = γ√(πl²).
Note: the functional forms of the covariance function and of the basis functions are similar here. This is a special case; in general they are very different.
Similar results can be obtained for multi-dimensional input models (Williams, 1998; Neal, 1996).

229 References I
G. Della Gatta, M. Bansal, A. Ambesi-Impiombato, D. Antonini, C. Missero, and D. di Bernardo. Direct targets of the trp63 transcription factor revealed by a combination of gene expression profiling and reverse engineering. Genome Research, 18(6), Jun 2008. [URL]. [DOI].
A. A. Kalaitzis and N. D. Lawrence. A simple approach to ranking differentially expressed gene expression time courses through Gaussian process regression. BMC Bioinformatics, 12(180), 2011. [DOI].
R. M. Neal. Bayesian Learning for Neural Networks. Springer, 1996. Lecture Notes in Statistics 118.
J. Oakley and A. O'Hagan. Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika, 89(4), 2002.
C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006. [Google Books].
C. K. I. Williams. Computation with infinite neural networks. Neural Computation, 10(5), 1998.


More information

MTTTS16 Learning from Multiple Sources

MTTTS16 Learning from Multiple Sources MTTTS16 Learning from Multiple Sources 5 ECTS credits Autumn 2018, University of Tampere Lecturer: Jaakko Peltonen Lecture 6: Multitask learning with kernel methods and nonparametric models On this lecture:

More information

Probabilistic Graphical Models Lecture 20: Gaussian Processes

Probabilistic Graphical Models Lecture 20: Gaussian Processes Probabilistic Graphical Models Lecture 20: Gaussian Processes Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 30, 2015 1 / 53 What is Machine Learning? Machine learning algorithms

More information

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY

More information

Introduction to Gaussian Processes

Introduction to Gaussian Processes Introduction to Gaussian Processes Iain Murray School of Informatics, University of Edinburgh The problem Learn scalar function of vector values f(x).5.5 f(x) y i.5.2.4.6.8 x f 5 5.5 x x 2.5 We have (possibly

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Kernel Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1 / 21

More information

How to build an automatic statistician

How to build an automatic statistician How to build an automatic statistician James Robert Lloyd 1, David Duvenaud 1, Roger Grosse 2, Joshua Tenenbaum 2, Zoubin Ghahramani 1 1: Department of Engineering, University of Cambridge, UK 2: Massachusetts

More information

Probabilistic & Bayesian deep learning. Andreas Damianou

Probabilistic & Bayesian deep learning. Andreas Damianou Probabilistic & Bayesian deep learning Andreas Damianou Amazon Research Cambridge, UK Talk at University of Sheffield, 19 March 2019 In this talk Not in this talk: CRFs, Boltzmann machines,... In this

More information

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

PILCO: A Model-Based and Data-Efficient Approach to Policy Search PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol

More information

Gaussian Processes for Machine Learning

Gaussian Processes for Machine Learning Gaussian Processes for Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics Tübingen, Germany carl@tuebingen.mpg.de Carlos III, Madrid, May 2006 The actual science of

More information

Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting)

Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting) Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting) Professor: Aude Billard Assistants: Nadia Figueroa, Ilaria Lauzana and Brice Platerrier E-mails: aude.billard@epfl.ch,

More information

Reliability Monitoring Using Log Gaussian Process Regression

Reliability Monitoring Using Log Gaussian Process Regression COPYRIGHT 013, M. Modarres Reliability Monitoring Using Log Gaussian Process Regression Martin Wayne Mohammad Modarres PSA 013 Center for Risk and Reliability University of Maryland Department of Mechanical

More information

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

More information

GWAS IV: Bayesian linear (variance component) models

GWAS IV: Bayesian linear (variance component) models GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian

More information

Relevance Vector Machines

Relevance Vector Machines LUT February 21, 2011 Support Vector Machines Model / Regression Marginal Likelihood Regression Relevance vector machines Exercise Support Vector Machines The relevance vector machine (RVM) is a bayesian

More information

Least Squares Regression

Least Squares Regression CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the

More information

DEPARTMENT OF COMPUTER SCIENCE AUTUMN SEMESTER MACHINE LEARNING AND ADAPTIVE INTELLIGENCE

DEPARTMENT OF COMPUTER SCIENCE AUTUMN SEMESTER MACHINE LEARNING AND ADAPTIVE INTELLIGENCE Data Provided: None DEPARTMENT OF COMPUTER SCIENCE AUTUMN SEMESTER 204 205 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE hour Please note that the rubric of this paper is made different from many other papers.

More information

State Space Gaussian Processes with Non-Gaussian Likelihoods

State Space Gaussian Processes with Non-Gaussian Likelihoods State Space Gaussian Processes with Non-Gaussian Likelihoods Hannes Nickisch 1 Arno Solin 2 Alexander Grigorievskiy 2,3 1 Philips Research, 2 Aalto University, 3 Silo.AI ICML2018 July 13, 2018 Outline

More information

Nonparmeteric Bayes & Gaussian Processes. Baback Moghaddam Machine Learning Group

Nonparmeteric Bayes & Gaussian Processes. Baback Moghaddam Machine Learning Group Nonparmeteric Bayes & Gaussian Processes Baback Moghaddam baback@jpl.nasa.gov Machine Learning Group Outline Bayesian Inference Hierarchical Models Model Selection Parametric vs. Nonparametric Gaussian

More information

Linear Regression and Discrimination

Linear Regression and Discrimination Linear Regression and Discrimination Kernel-based Learning Methods Christian Igel Institut für Neuroinformatik Ruhr-Universität Bochum, Germany http://www.neuroinformatik.rub.de July 16, 2009 Christian

More information

Kernels for Automatic Pattern Discovery and Extrapolation

Kernels for Automatic Pattern Discovery and Extrapolation Kernels for Automatic Pattern Discovery and Extrapolation Andrew Gordon Wilson agw38@cam.ac.uk mlg.eng.cam.ac.uk/andrew University of Cambridge Joint work with Ryan Adams (Harvard) 1 / 21 Pattern Recognition

More information

CS-E3210 Machine Learning: Basic Principles

CS-E3210 Machine Learning: Basic Principles CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction

More information

Statistical Techniques in Robotics (16-831, F12) Lecture#20 (Monday November 12) Gaussian Processes

Statistical Techniques in Robotics (16-831, F12) Lecture#20 (Monday November 12) Gaussian Processes Statistical Techniques in Robotics (6-83, F) Lecture# (Monday November ) Gaussian Processes Lecturer: Drew Bagnell Scribe: Venkatraman Narayanan Applications of Gaussian Processes (a) Inverse Kinematics

More information

Linear regression example Simple linear regression: f(x) = ϕ(x)t w w ~ N(0, ) The mean and covariance are given by E[f(x)] = ϕ(x)e[w] = 0.

Linear regression example Simple linear regression: f(x) = ϕ(x)t w w ~ N(0, ) The mean and covariance are given by E[f(x)] = ϕ(x)e[w] = 0. Gaussian Processes Gaussian Process Stochastic process: basically, a set of random variables. may be infinite. usually related in some way. Gaussian process: each variable has a Gaussian distribution every

More information

Probabilistic Models for Learning Data Representations. Andreas Damianou

Probabilistic Models for Learning Data Representations. Andreas Damianou Probabilistic Models for Learning Data Representations Andreas Damianou Department of Computer Science, University of Sheffield, UK IBM Research, Nairobi, Kenya, 23/06/2015 Sheffield SITraN Outline Part

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan Background: Global Optimization and Gaussian Processes The Geometry of Gaussian Processes and the Chaining Trick Algorithm

More information

Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)

Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012) Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation

More information

Probabilistic numerics for deep learning

Probabilistic numerics for deep learning Presenter: Shijia Wang Department of Engineering Science, University of Oxford rning (RLSS) Summer School, Montreal 2017 Outline 1 Introduction Probabilistic Numerics 2 Components Probabilistic modeling

More information

SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA. Sistemi di Elaborazione dell Informazione. Regressione. Ruggero Donida Labati

SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA. Sistemi di Elaborazione dell Informazione. Regressione. Ruggero Donida Labati SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA Sistemi di Elaborazione dell Informazione Regressione Ruggero Donida Labati Dipartimento di Informatica via Bramante 65, 26013 Crema (CR), Italy http://homes.di.unimi.it/donida

More information

Talk on Bayesian Optimization

Talk on Bayesian Optimization Talk on Bayesian Optimization Jungtaek Kim (jtkim@postech.ac.kr) Machine Learning Group, Department of Computer Science and Engineering, POSTECH, 77-Cheongam-ro, Nam-gu, Pohang-si 37673, Gyungsangbuk-do,

More information

The Multivariate Gaussian Distribution [DRAFT]

The Multivariate Gaussian Distribution [DRAFT] The Multivariate Gaussian Distribution DRAFT David S. Rosenberg Abstract This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs,

More information

Bayesian Deep Learning

Bayesian Deep Learning Bayesian Deep Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp June 06, 2018 Mohammad Emtiyaz Khan 2018 1 What will you learn? Why is Bayesian inference

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Gaussian process for nonstationary time series prediction

Gaussian process for nonstationary time series prediction Computational Statistics & Data Analysis 47 (2004) 705 712 www.elsevier.com/locate/csda Gaussian process for nonstationary time series prediction Soane Brahim-Belhouari, Amine Bermak EEE Department, Hong

More information

Linear Models for Regression

Linear Models for Regression Linear Models for Regression Machine Learning Torsten Möller Möller/Mori 1 Reading Chapter 3 of Pattern Recognition and Machine Learning by Bishop Chapter 3+5+6+7 of The Elements of Statistical Learning

More information

Relevance Vector Machines for Earthquake Response Spectra

Relevance Vector Machines for Earthquake Response Spectra 2012 2011 American American Transactions Transactions on on Engineering Engineering & Applied Applied Sciences Sciences. American Transactions on Engineering & Applied Sciences http://tuengr.com/ateas

More information

Nonparametric Regression With Gaussian Processes

Nonparametric Regression With Gaussian Processes Nonparametric Regression With Gaussian Processes From Chap. 45, Information Theory, Inference and Learning Algorithms, D. J. C. McKay Presented by Micha Elsner Nonparametric Regression With Gaussian Processes

More information

Dropout as a Bayesian Approximation: Insights and Applications

Dropout as a Bayesian Approximation: Insights and Applications Dropout as a Bayesian Approximation: Insights and Applications Yarin Gal and Zoubin Ghahramani Discussion by: Chunyuan Li Jan. 15, 2016 1 / 16 Main idea In the framework of variational inference, the authors

More information

BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS

BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS Oloyede I. Department of Statistics, University of Ilorin, Ilorin, Nigeria Corresponding Author: Oloyede I.,

More information