Introduction to Gaussian Processes
1 Introduction to Gaussian Processes Neil D. Lawrence GPSS 10th June 2013
2 Book Rasmussen and Williams (2006)
3 Outline
- The Gaussian Density
- Covariance from Basis Functions
- Basis Function Representations
- Constructing Covariance
- GP Limitations
- Conclusions
5 The Gaussian Density. Perhaps the most common probability density:
$p(y|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right) = \mathcal{N}(y|\mu, \sigma^2)$
6 Gaussian Density. Figure: the Gaussian PDF $p(h|\mu, \sigma^2)$ over height $h$/m with $\mu = 1.7$ and variance $\sigma^2 = $ ; mean shown as red line. It could represent the heights of a population of students.
7 Gaussian Density. $\mathcal{N}(y|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)$, where $\sigma^2$ is the variance of the density and $\mu$ is the mean.
8 Two Important Gaussian Properties: Sum of Gaussians. The sum of Gaussian variables is also Gaussian: if $y_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$ then the sum is distributed as
$\sum_{i=1}^n y_i \sim \mathcal{N}\left(\sum_{i=1}^n \mu_i,\; \sum_{i=1}^n \sigma_i^2\right).$
(Aside: as the number of terms increases, the sum of non-Gaussian, finite-variance variables is also Gaussian [central limit theorem].)
12 Two Important Gaussian Properties: Scaling a Gaussian. Scaling a Gaussian leads to a Gaussian: if $y \sim \mathcal{N}(\mu, \sigma^2)$ then the scaled variable is distributed as $wy \sim \mathcal{N}(w\mu, w^2\sigma^2)$.
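Both properties are easy to check numerically. Below is a minimal numpy sketch (not from the slides; the means, variances, and scale factor are illustrative assumptions) that samples Gaussians and confirms the predicted mean and variance of the sum and of the scaled variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000

# Sum of Gaussians: y_1 + y_2 with y_i ~ N(mu_i, sigma_i^2).
mu = np.array([1.0, -2.0])
sigma2 = np.array([0.5, 2.0])
y = rng.normal(mu, np.sqrt(sigma2), size=(n_samples, 2))
s = y.sum(axis=1)
print(s.mean(), s.var())  # close to sum(mu) = -1.0 and sum(sigma2) = 2.5

# Scaling a Gaussian: w * y with y ~ N(1.0, 0.5) and w = 3.
w = 3.0
z = w * rng.normal(1.0, np.sqrt(0.5), size=n_samples)
print(z.mean(), z.var())  # close to w * mu = 3.0 and w^2 * sigma^2 = 4.5
```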
15 Linear Function. Figure: a linear regression between $x$, weight in kg, and $y$, height in m, showing the data points and the best-fit line.
16 Regression Examples. Predict a real value $y_i$ given some inputs $\mathbf{x}_i$:
- Predict quality of meat given spectral measurements (Tecator data).
- Radiocarbon dating, the C14 calibration curve: predict age given quantity of C14 isotope.
- Predict quality of different Go or Backgammon moves given expert-rated training data.
17 y = mx + c
18 Figure: the straight line $y = mx + c$ plotted against $x$ and $y$, with the gradient $m$ and intercept $c$ annotated.
25 $y = mx + c$
point 1: $x = 1$, $y = 3$ gives $3 = m + c$
point 2: $x = 3$, $y = 1$ gives $1 = 3m + c$
point 3: $x = 2$, $y = $ gives $= 2m + c$
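As a worked step (not shown on the slide), the first two equations alone determine the line exactly; the third point then fails to satisfy it in general, which is what makes the system overdetermined and motivates the noise term introduced on the next slide.

```latex
% Solving the first two equations, 3 = m + c and 1 = 3m + c:
\begin{align*}
  3 - 1 &= (m + c) - (3m + c) = -2m, \\
  \Rightarrow\; m &= -1, \qquad c = 3 - m = 4.
\end{align*}
```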
26 $y = mx + c + \epsilon$
point 1: $x = 1$, $y = 3$ gives $3 = m + c + \epsilon_1$
point 2: $x = 3$, $y = 1$ gives $1 = 3m + c + \epsilon_2$
point 3: $x = 2$, $y = $ gives $= 2m + c + \epsilon_3$
27 Underdetermined System. What about two unknowns and one observation? $y_1 = mx_1 + c$.
28 Underdetermined System. Can compute $m$ given $c$: $m = \frac{y_1 - c}{x_1}$.
29 Underdetermined System. Can compute $m$ given $c$. Sampling different values of $c$ gives different lines: with the observed point $(x_1, y_1) = (1, 3)$, each sampled intercept fixes the gradient as $m = 3 - c$, e.g. $c = 1.75 \Rightarrow m = 1.25$ or $c = -1.47 \Rightarrow m = 4.47$.
37 Underdetermined System. Assume $c \sim \mathcal{N}(0, 4)$; we find a distribution of solutions.
38 Probability for Under- and Overdetermined. To deal with the overdetermined system we introduced a probability distribution for the noise variable, $\epsilon_i$. For the underdetermined system we introduced a probability distribution for the parameter, $c$. This is known as a Bayesian treatment.
39 Multivariate Prior Distributions. For general Bayesian inference we need multivariate priors. E.g. for multivariate linear regression, $y_i = \sum_j w_j x_{i,j} + \epsilon_i = \mathbf{w}^\top \mathbf{x}_{i,:} + \epsilon_i$ (where we've dropped $c$ for convenience), we need a prior over $\mathbf{w}$. This motivates a multivariate Gaussian density. We will use the multivariate Gaussian to put a prior directly on the function (a Gaussian process).
41 Prior Distribution. Bayesian inference requires a prior on the parameters. The prior represents your belief, before you see the data, about the likely value of the parameters. For linear regression, consider a Gaussian prior on the intercept: $c \sim \mathcal{N}(0, \alpha_1)$.
42 Gaussian Noise.
$p(c) = \mathcal{N}(c|0, \alpha_1)$
$p(y|m, c, x, \sigma^2) = \mathcal{N}(y|mx + c, \sigma^2)$
$p(c|y, m, x, \sigma^2) = \mathcal{N}\left(c \,\middle|\, \frac{y - mx}{1 + \sigma^2/\alpha_1},\; \left(\sigma^{-2} + \alpha_1^{-1}\right)^{-1}\right)$
Figure: a Gaussian prior combines with a Gaussian likelihood for a Gaussian posterior.
45 Stages to Derivation of the Posterior.
- Multiply likelihood by prior: they are exponentiated quadratics, so the answer is always also an exponentiated quadratic, because $\exp(a^2)\exp(b^2) = \exp(a^2 + b^2)$.
- Complete the square to get the resulting density in the form of a Gaussian.
- Recognise the mean and (co)variance of the Gaussian. This is the estimate of the posterior.
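As a sketch of those stages for the intercept posterior on the Gaussian Noise slide above (using the prior $p(c) = \mathcal{N}(c|0, \alpha_1)$ and likelihood $p(y|m, c, x, \sigma^2) = \mathcal{N}(y|mx + c, \sigma^2)$):

```latex
% Collecting the exponents of prior and likelihood as a quadratic in c:
\begin{align*}
\log p(c \mid y, m, x, \sigma^2)
  &= -\frac{(y - mx - c)^2}{2\sigma^2} - \frac{c^2}{2\alpha_1} + \text{const} \\
  &= -\frac{1}{2}\left(\frac{1}{\sigma^2} + \frac{1}{\alpha_1}\right)c^2
     + \frac{y - mx}{\sigma^2}\,c + \text{const}.
\end{align*}
% Matching -(c - \mu_c)^2 / (2\tau^2) gives the posterior on the slide:
%   \tau^2 = (\sigma^{-2} + \alpha_1^{-1})^{-1},
%   \mu_c  = \tau^2 (y - mx)/\sigma^2 = (y - mx)/(1 + \sigma^2/\alpha_1).
```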
46 Multivariate Regression Likelihood. Noise-corrupted data point: $y_i = \mathbf{w}^\top \mathbf{x}_{i,:} + \epsilon_i$. Multivariate regression likelihood:
$p(\mathbf{y}|\mathbf{X}, \mathbf{w}) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n \left(y_i - \mathbf{w}^\top \mathbf{x}_{i,:}\right)^2\right)$
Now use a multivariate Gaussian prior:
$p(\mathbf{w}) = \frac{1}{(2\pi\alpha)^{p/2}} \exp\left(-\frac{1}{2\alpha} \mathbf{w}^\top \mathbf{w}\right)$
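Completing the square with this likelihood and prior gives the standard Gaussian posterior over $\mathbf{w}$. A minimal numpy sketch follows (the data and hyperparameter values are made up for illustration):

```python
# Gaussian posterior for Bayesian linear regression:
#   C_w  = (X^T X / sigma^2 + I / alpha)^{-1}
#   mu_w = C_w X^T y / sigma^2
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 2
sigma2, alpha = 0.1, 1.0

X = rng.normal(size=(n, p))
w_true = np.array([0.5, -1.5])
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=n)

C_w = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / alpha)
mu_w = C_w @ X.T @ y / sigma2
print(mu_w)  # close to w_true with this much data
```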
49 Two Dimensional Gaussian. Consider height, $h$/m, and weight, $w$/kg. Could sample height from a distribution, $p(h) = \mathcal{N}(1.7, \;)$, and similarly weight, $p(w) = \mathcal{N}(75, 36)$.
50 Height and Weight Models. Figure: Gaussian distributions $p(h)$ and $p(w)$ for height ($h$/m) and weight ($w$/kg).
51 Sampling Two Dimensional Variables. Figure: the joint distribution with its marginal distributions $p(h)$ and $p(w)$; repeated samples of height and weight drawn from the independent model.
74 Independence Assumption. This assumes height and weight are independent: $p(h, w) = p(h)p(w)$. In reality they are dependent (body mass index $= w/h^2$).
75 Sampling Two Dimensional Variables. Figure: the joint distribution with its marginal distributions $p(h)$ and $p(w)$; repeated samples of height and weight, now drawn from a correlated joint density.
98 Independent Gaussians. $p(w, h) = p(w)p(h)$
99 Independent Gaussians. $p(w, h) = \frac{1}{\sqrt{2\pi\sigma_1^2}\sqrt{2\pi\sigma_2^2}} \exp\left(-\frac{1}{2}\left(\frac{(w - \mu_1)^2}{\sigma_1^2} + \frac{(h - \mu_2)^2}{\sigma_2^2}\right)\right)$
100 Independent Gaussians. $p(w, h) = \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2}} \exp\left(-\frac{1}{2}\left(\begin{bmatrix}w\\h\end{bmatrix} - \begin{bmatrix}\mu_1\\\mu_2\end{bmatrix}\right)^\top \begin{bmatrix}\sigma_1^2 & 0\\0 & \sigma_2^2\end{bmatrix}^{-1} \left(\begin{bmatrix}w\\h\end{bmatrix} - \begin{bmatrix}\mu_1\\\mu_2\end{bmatrix}\right)\right)$
101 Independent Gaussians. $p(\mathbf{y}) = \frac{1}{\left|2\pi\mathbf{D}\right|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})^\top \mathbf{D}^{-1} (\mathbf{y} - \boldsymbol{\mu})\right)$
102 Correlated Gaussian. Form a correlated Gaussian from the original by rotating the data space using a matrix $\mathbf{R}$:
$p(\mathbf{y}) = \frac{1}{\left|2\pi\mathbf{D}\right|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{R}^\top\mathbf{y} - \mathbf{R}^\top\boldsymbol{\mu})^\top \mathbf{D}^{-1} (\mathbf{R}^\top\mathbf{y} - \mathbf{R}^\top\boldsymbol{\mu})\right)$
$= \frac{1}{\left|2\pi\mathbf{D}\right|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})^\top \mathbf{R}\mathbf{D}^{-1}\mathbf{R}^\top (\mathbf{y} - \boldsymbol{\mu})\right)$
This gives a covariance matrix with $\mathbf{C}^{-1} = \mathbf{R}\mathbf{D}^{-1}\mathbf{R}^\top$, i.e.
$p(\mathbf{y}) = \frac{1}{\left|2\pi\mathbf{C}\right|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})^\top \mathbf{C}^{-1} (\mathbf{y} - \boldsymbol{\mu})\right), \qquad \mathbf{C} = \mathbf{R}\mathbf{D}\mathbf{R}^\top.$
106 Recall Univariate Gaussian Properties.
1. Sum of Gaussian variables is also Gaussian: $y_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$ implies $\sum_{i=1}^n y_i \sim \mathcal{N}\left(\sum_{i=1}^n \mu_i, \sum_{i=1}^n \sigma_i^2\right)$.
2. Scaling a Gaussian leads to a Gaussian: $y \sim \mathcal{N}(\mu, \sigma^2)$ implies $wy \sim \mathcal{N}(w\mu, w^2\sigma^2)$.
111 Multivariate Consequence. If $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $\mathbf{y} = \mathbf{W}\mathbf{x}$, then $\mathbf{y} \sim \mathcal{N}(\mathbf{W}\boldsymbol{\mu}, \mathbf{W}\boldsymbol{\Sigma}\mathbf{W}^\top)$.
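A quick numerical check of this consequence (the values of $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$, and $\mathbf{W}$ below are arbitrary assumptions):

```python
# Verify empirically that y = W x has mean W mu and covariance W Sigma W^T.
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.8], [0.8, 2.0]])
W = np.array([[2.0, 0.0], [1.0, -1.0], [0.5, 0.5]])

x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ W.T  # each row is one sample of y = W x

print(np.allclose(y.mean(axis=0), W @ mu, atol=0.02))        # True
print(np.allclose(np.cov(y.T), W @ Sigma @ W.T, atol=0.05))  # True
```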
114 Sampling a Function: Multivariate Gaussians. We will consider a Gaussian with a particular structure of covariance matrix. Generate a single sample from this 25-dimensional Gaussian distribution, $\mathbf{f} = [f_1, f_2, \ldots, f_{25}]^\top$. We will plot these points against their index.
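The sketch below reproduces that experiment (the input grid, lengthscale, and jitter are assumptions; the covariance used is the exponentiated quadratic defined later in these slides):

```python
import numpy as np

def eq_cov(X, X2, alpha=1.0, lengthscale=1.0):
    """Exponentiated quadratic covariance between two sets of 1-D inputs."""
    d = X[:, None] - X2[None, :]
    return alpha * np.exp(-0.5 * d**2 / lengthscale**2)

x = np.linspace(-1, 1, 25)
K = eq_cov(x, x, lengthscale=0.3) + 1e-8 * np.eye(25)  # jitter for stability

rng = np.random.default_rng(3)
f = rng.multivariate_normal(np.zeros(25), K)  # f = [f_1, ..., f_25]
print(f[:5])  # plot f against its index to reproduce the figure
```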
115 Gaussian Distribution Sample. Figure: a sample from a 25-dimensional Gaussian distribution. (a) The 25-dimensional correlated random variable (values plotted against index); (b) colormap showing correlations between dimensions.
122 Gaussian Distribution Sample. Figure: as above, with (b) showing the correlation between $f_1$ and $f_2$.
123 Prediction of $f_2$ from $f_1$. The single contour of the Gaussian density represents the joint distribution, $p(f_1, f_2)$. We observe that $f_1 = 0.313$. Conditional density: $p(f_2 | f_1 = 0.313)$.
127 Prediction with Correlated Gaussians. Prediction of $f_2$ from $f_1$ requires the conditional density. The conditional density is also Gaussian:
$p(f_2 | f_1) = \mathcal{N}\left(f_2 \,\middle|\, \frac{k_{1,2}}{k_{1,1}} f_1,\; k_{2,2} - \frac{k_{1,2}^2}{k_{1,1}}\right)$
where the covariance of the joint density is given by $\mathbf{K} = \begin{bmatrix} k_{1,1} & k_{1,2} \\ k_{2,1} & k_{2,2} \end{bmatrix}$.
128 Prediction of $f_5$ from $f_1$. The single contour of the Gaussian density represents the joint distribution, $p(f_1, f_5)$. We observe that $f_1 = 0.313$. Conditional density: $p(f_5 | f_1 = 0.313)$.
132 Prediction with Correlated Gaussians. Prediction of $\mathbf{f}_*$ from $\mathbf{f}$ requires the multivariate conditional density. The multivariate conditional density is also Gaussian:
$p(\mathbf{f}_* | \mathbf{f}) = \mathcal{N}\left(\mathbf{f}_* \,\middle|\, \boldsymbol{\mu}, \boldsymbol{\Sigma}\right)$
$\boldsymbol{\mu} = \mathbf{K}_{*,\mathbf{f}} \mathbf{K}_{\mathbf{f},\mathbf{f}}^{-1} \mathbf{f}, \qquad \boldsymbol{\Sigma} = \mathbf{K}_{*,*} - \mathbf{K}_{*,\mathbf{f}} \mathbf{K}_{\mathbf{f},\mathbf{f}}^{-1} \mathbf{K}_{\mathbf{f},*}$
Here the covariance of the joint density is given by $\mathbf{K} = \begin{bmatrix} \mathbf{K}_{\mathbf{f},\mathbf{f}} & \mathbf{K}_{\mathbf{f},*} \\ \mathbf{K}_{*,\mathbf{f}} & \mathbf{K}_{*,*} \end{bmatrix}$.
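This conditional is all that is needed for noise-free GP prediction. A minimal sketch follows (the training inputs, function values, and test grid are invented for illustration; eq_cov is the covariance from the earlier sketch):

```python
import numpy as np

def eq_cov(X, X2, alpha=1.0, lengthscale=1.0):
    d = X[:, None] - X2[None, :]
    return alpha * np.exp(-0.5 * d**2 / lengthscale**2)

x_train = np.array([-3.0, 1.2, 1.4])
f_train = np.array([-0.5, 0.7, 0.9])
x_test = np.linspace(-4, 4, 9)

K_ff = eq_cov(x_train, x_train, lengthscale=2.0) + 1e-8 * np.eye(3)
K_sf = eq_cov(x_test, x_train, lengthscale=2.0)
K_ss = eq_cov(x_test, x_test, lengthscale=2.0)

# Conditional mean and covariance from the formulas on the slide.
mu = K_sf @ np.linalg.solve(K_ff, f_train)
Sigma = K_ss - K_sf @ np.linalg.solve(K_ff, K_sf.T)
print(mu)
print(np.diag(Sigma))  # predictive variances
```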
134 Covariance Functions. Where did this covariance matrix come from? The Exponentiated Quadratic kernel function (also called RBF, Squared Exponential, or Gaussian):
$k(\mathbf{x}, \mathbf{x}') = \alpha \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|_2^2}{2\ell^2}\right)$
The covariance matrix is built using the inputs to the function, $\mathbf{x}$. For the example above it was based on Euclidean distance. The covariance function is also known as a kernel.
136 Covariance Functions. Where did this covariance matrix come from?
$k(x_i, x_j) = \alpha \exp\left(-\frac{\|x_i - x_j\|^2}{2\ell^2}\right)$
The covariance matrix is filled in entry by entry, $k_{i,j} = k(x_i, x_j)$, for the inputs $x_1 = -3.0$, $x_2 = 1.20$ and $x_3 = 1.40$ with $\ell = 2.00$ and $\alpha = 1.00$; for example $k_{2,1} = 1.00 \exp\left(-\frac{(1.20 - (-3.0))^2}{2 \times 2.00^2}\right)$.
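A short sketch that computes the full matrix for these inputs (taking $x_1 = -3.0$, $x_2 = 1.2$, $x_3 = 1.4$ as above; the sign of $x_1$ is an assumption):

```python
import numpy as np

# Inputs and hyperparameters from the slide (x_1 = -3.0 assumed).
x = np.array([-3.0, 1.2, 1.4])
alpha, lengthscale = 1.0, 2.0

# Exponentiated quadratic covariance, all entries at once via broadcasting.
d = x[:, None] - x[None, :]
K = alpha * np.exp(-d**2 / (2 * lengthscale**2))
print(np.round(K, 3))
# [[1.    0.11  0.089]
#  [0.11  1.    0.995]
#  [0.089 0.995 1.   ]]
```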
152 Covariance Functions. Where did this covariance matrix come from?
$k(x_i, x_j) = \alpha \exp\left(-\frac{\|x_i - x_j\|^2}{2\ell^2}\right)$
The same entry-by-entry computation is repeated for four inputs, $x_1 = -3$, $x_2 = 1.2$, $x_3 = 1.4$ and $x_4 = 2.0$, with $\ell = 2.0$ and $\alpha = 1.0$; for example $k_{2,1} = 1.0 \exp\left(-\frac{(1.2 - (-3))^2}{2 \times 2.0^2}\right)$.
179 Covariance Functions. Where did this covariance matrix come from?
$k(x_i, x_j) = \alpha \exp\left(-\frac{\|x_i - x_j\|^2}{2\ell^2}\right)$
The computation is repeated once more for $x_1 = -3.0$, $x_2 = 1.20$ and $x_3 = 1.40$, now with $\ell = 5.00$ and $\alpha = 4.00$, giving a covariance with larger magnitude and longer-range correlations.
195 Outline
- The Gaussian Density
- Covariance from Basis Functions
- Basis Function Representations
- Constructing Covariance
- GP Limitations
- Conclusions
196 Basis Function Form. Radial basis functions commonly have the form
$\phi_k(x_i) = \exp\left(-\frac{\|x_i - \mu_k\|_2^2}{2\ell^2}\right).$
The basis function maps data into a feature space in which a linear sum is a nonlinear function.
Figure: a set of radial basis functions with width $\ell = 2$ and location parameters $\boldsymbol{\mu} = \begin{bmatrix} -4 & 0 & 4 \end{bmatrix}^\top$.
197 Basis Function Representations. Represent a function by a linear sum over a basis,
$f(\mathbf{x}_{i,:}; \mathbf{w}) = \sum_{k=1}^m w_k \phi_k(\mathbf{x}_{i,:}). \qquad (1)$
Here: $m$ basis functions, $\phi_k(\cdot)$ is the $k$th basis function, and $\mathbf{w} = [w_1, \ldots, w_m]^\top$. For the standard linear model: $\phi_k(\mathbf{x}_{i,:}) = x_{i,k}$.
198 Random Functions. Functions derived using $f(x) = \sum_{k=1}^m w_k \phi_k(x)$, where $\mathbf{w}$ is sampled from a Gaussian density, $w_k \sim \mathcal{N}(0, \alpha)$.
Figure: functions sampled using the basis set from figure 3. Each line is a separate sample, generated by a weighted sum of the basis set. The weights $\mathbf{w}$ are sampled from a Gaussian density with variance $\alpha = 1$.
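A sketch reproducing the sampled-functions figure under the stated assumptions (centres $\boldsymbol{\mu} = [-4, 0, 4]$, width $\ell = 2$, $\alpha = 1$; the input grid and number of samples are arbitrary):

```python
import numpy as np

def rbf_basis(x, centres, lengthscale=2.0):
    """Radial basis functions phi_k(x) evaluated at the inputs x."""
    return np.exp(-(x[:, None] - centres[None, :])**2
                  / (2 * lengthscale**2))

rng = np.random.default_rng(4)
x = np.linspace(-8, 8, 200)
Phi = rbf_basis(x, centres=np.array([-4.0, 0.0, 4.0]))

alpha = 1.0
w = rng.normal(0.0, np.sqrt(alpha), size=(3, 10))  # 10 sampled weight vectors
F = Phi @ w   # each column is one sampled function f(x)
print(F.shape)  # plot columns of F against x to reproduce the figure
```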
199 Direct Construction of Covariance Matrix. Use matrix notation to write the function, $f(\mathbf{x}_i; \mathbf{w}) = \sum_{k=1}^m w_k \phi_k(\mathbf{x}_i)$.
- Computed at the training data this gives a vector $\mathbf{f} = \boldsymbol{\Phi}\mathbf{w}$.
- $\mathbf{w}$ and $\mathbf{f}$ are only related by an inner product.
- $\boldsymbol{\Phi} \in \mathbb{R}^{n \times p}$ is a design matrix.
- $\boldsymbol{\Phi}$ is fixed and non-stochastic for a given training set.
- $\mathbf{f}$ is Gaussian distributed.
205 Expectations. We have $\mathbf{f} = \boldsymbol{\Phi}\mathbf{w}$. (We use $\langle \cdot \rangle$ to denote expectations under prior distributions.)
- The prior mean of $\mathbf{w}$ was zero, giving $\langle \mathbf{f} \rangle = \mathbf{0}$.
- The prior covariance of $\mathbf{f}$ is $\mathbf{K} = \langle \mathbf{f}\mathbf{f}^\top \rangle - \langle \mathbf{f} \rangle \langle \mathbf{f} \rangle^\top$.
- $\langle \mathbf{f}\mathbf{f}^\top \rangle = \boldsymbol{\Phi} \langle \mathbf{w}\mathbf{w}^\top \rangle \boldsymbol{\Phi}^\top$, giving $\mathbf{K} = \gamma\, \boldsymbol{\Phi}\boldsymbol{\Phi}^\top$ (taking $\langle \mathbf{w}\mathbf{w}^\top \rangle = \gamma\mathbf{I}$).
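This identity is easy to verify by Monte Carlo (the design matrix below is random but then held fixed; $\gamma$ is an arbitrary value):

```python
# Empirical check that the prior covariance of f = Phi w is
# gamma * Phi Phi^T when <w w^T> = gamma * I.
import numpy as np

rng = np.random.default_rng(5)
Phi = rng.normal(size=(5, 3))  # a fixed 5x3 design matrix
gamma = 2.0
w = rng.normal(0.0, np.sqrt(gamma), size=(3, 500_000))
f = Phi @ w  # each column is one prior draw of f

K_empirical = np.cov(f)
K_analytic = gamma * Phi @ Phi.T
print(np.allclose(K_empirical, K_analytic, atol=0.05))  # True
```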
209 Covariance between Two Points. The prior covariance between two points $\mathbf{x}_i$ and $\mathbf{x}_j$ is
$k(\mathbf{x}_i, \mathbf{x}_j) = \gamma\, \boldsymbol{\phi}_:(\mathbf{x}_i)^\top \boldsymbol{\phi}_:(\mathbf{x}_j),$
or in sum notation
$k(\mathbf{x}_i, \mathbf{x}_j) = \gamma \sum_{l=1}^m \phi_l(\mathbf{x}_i)\, \phi_l(\mathbf{x}_j).$
For the radial basis used this gives
$k(x_i, x_j) = \gamma \sum_{k=1}^m \exp\left(-\frac{\|x_i - \mu_k\|_2^2 + \|x_j - \mu_k\|_2^2}{2\ell^2}\right).$
213 Covariance Functions: RBF Basis Functions.
$k(x, x') = \alpha\, \boldsymbol{\phi}(x)^\top \boldsymbol{\phi}(x'), \qquad \phi_i(x) = \exp\left(-\frac{\|x - \mu_i\|_2^2}{2\ell^2}\right), \qquad \boldsymbol{\mu} = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$
215 Selecting Number and Location of Basis. Need to choose: 1. the location of the centres, and 2. the number of basis functions. Consider uniform spacing over a region:
$k(x_i, x_j) = \gamma \sum_{k=1}^m \exp\left(-\frac{x_i^2 + x_j^2 - 2\mu_k(x_i + x_j) + 2\mu_k^2}{2\ell^2}\right).$
Restrict analysis to 1-D input, $x$.
218 Uniform Basis Functions. Set each centre location to $\mu_k = a + \Delta\mu\,(k - 1)$. Specify the basis functions in terms of their indices,
$k(x_i, x_j) = \gamma\,\Delta\mu \sum_{k=0}^{m-1} \exp\left(-\frac{x_i^2 + x_j^2 - 2\left(a + \Delta\mu\, k\right)(x_i + x_j) + 2\left(a + \Delta\mu\, k\right)^2}{2\ell^2}\right).$
220 Infinite Basis Functions. Take $\mu_0 = a$ and $\mu_m = b$, so $b = a + \Delta\mu\,(m - 1)$. Take the limit as $\Delta\mu \to 0$, so $m \to \infty$:
$k(x_i, x_j) = \gamma \int_a^b \exp\left(-\frac{x_i^2 + x_j^2}{2\ell^2} - \frac{2\left(\mu - \frac{1}{2}(x_i + x_j)\right)^2 - \frac{1}{2}(x_i + x_j)^2}{2\ell^2}\right) \mathrm{d}\mu,$
where we have used $k\,\Delta\mu \to \mu$.
223 Result. Performing the integration leads to
$k(x_i, x_j) = \gamma \sqrt{\pi\ell^2} \exp\left(-\frac{(x_i - x_j)^2}{4\ell^2}\right) \times \frac{1}{2}\left[\operatorname{erf}\left(\frac{b - \frac{1}{2}(x_i + x_j)}{\ell}\right) - \operatorname{erf}\left(\frac{a - \frac{1}{2}(x_i + x_j)}{\ell}\right)\right].$
Now take the limit as $a \to -\infty$ and $b \to \infty$:
$k(x_i, x_j) = \alpha \exp\left(-\frac{(x_i - x_j)^2}{4\ell^2}\right), \qquad \text{where } \alpha = \gamma\sqrt{\pi\ell^2}.$
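The limiting result can be checked numerically: with a dense uniform grid of centres and the $\gamma\,\Delta\mu$ scaling above, the finite sum should approach $\alpha \exp\left(-\frac{(x_i - x_j)^2}{4\ell^2}\right)$ with $\alpha = \gamma\sqrt{\pi\ell^2}$ (the grid limits, spacing, and test inputs below are arbitrary choices):

```python
import numpy as np

gamma, l = 1.0, 2.0
a, b, m = -100.0, 100.0, 20_001
mu = np.linspace(a, b, m)
dmu = mu[1] - mu[0]

def k_basis(xi, xj):
    """Uniform-basis covariance: gamma * dmu * sum_k phi_k(xi) phi_k(xj)."""
    phi_i = np.exp(-(xi - mu)**2 / (2 * l**2))
    phi_j = np.exp(-(xj - mu)**2 / (2 * l**2))
    return gamma * dmu * np.sum(phi_i * phi_j)

xi, xj = 0.5, -1.0
alpha = gamma * np.sqrt(np.pi * l**2)
print(k_basis(xi, xj))
print(alpha * np.exp(-(xi - xj)**2 / (4 * l**2)))  # should match closely
```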
226 Infinite Feature Space.
- An RBF model with infinitely many basis functions is a Gaussian process.
- The covariance function is the exponentiated quadratic: $k(x_i, x_j) = \alpha \exp\left(-\frac{(x_i - x_j)^2}{4\ell^2}\right)$, where $\alpha = \gamma\sqrt{\pi\ell^2}$.
- Note: the functional forms of the covariance function and the basis functions are similar here; this is a special case, in general they are very different.
- Similar results can be obtained for multi-dimensional input models (Williams, 1998; Neal, 1996).
229 References
- G. Della Gatta, M. Bansal, A. Ambesi-Impiombato, D. Antonini, C. Missero, and D. di Bernardo. Direct targets of the trp63 transcription factor revealed by a combination of gene expression profiling and reverse engineering. Genome Research, 18(6), Jun.
- A. A. Kalaitzis and N. D. Lawrence. A simple approach to ranking differentially expressed gene expression time courses through Gaussian process regression. BMC Bioinformatics, 12(180).
- R. M. Neal. Bayesian Learning for Neural Networks. Springer, 1996. Lecture Notes in Statistics 118.
- J. Oakley and A. O'Hagan. Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika, 89(4).
- C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.
- C. K. I. Williams. Computation with infinite neural networks. Neural Computation, 10(5), 1998.
More informationDEPARTMENT OF COMPUTER SCIENCE AUTUMN SEMESTER MACHINE LEARNING AND ADAPTIVE INTELLIGENCE
Data Provided: None DEPARTMENT OF COMPUTER SCIENCE AUTUMN SEMESTER 204 205 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE hour Please note that the rubric of this paper is made different from many other papers.
More informationState Space Gaussian Processes with Non-Gaussian Likelihoods
State Space Gaussian Processes with Non-Gaussian Likelihoods Hannes Nickisch 1 Arno Solin 2 Alexander Grigorievskiy 2,3 1 Philips Research, 2 Aalto University, 3 Silo.AI ICML2018 July 13, 2018 Outline
More informationNonparmeteric Bayes & Gaussian Processes. Baback Moghaddam Machine Learning Group
Nonparmeteric Bayes & Gaussian Processes Baback Moghaddam baback@jpl.nasa.gov Machine Learning Group Outline Bayesian Inference Hierarchical Models Model Selection Parametric vs. Nonparametric Gaussian
More informationLinear Regression and Discrimination
Linear Regression and Discrimination Kernel-based Learning Methods Christian Igel Institut für Neuroinformatik Ruhr-Universität Bochum, Germany http://www.neuroinformatik.rub.de July 16, 2009 Christian
More informationKernels for Automatic Pattern Discovery and Extrapolation
Kernels for Automatic Pattern Discovery and Extrapolation Andrew Gordon Wilson agw38@cam.ac.uk mlg.eng.cam.ac.uk/andrew University of Cambridge Joint work with Ryan Adams (Harvard) 1 / 21 Pattern Recognition
More informationCS-E3210 Machine Learning: Basic Principles
CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction
More informationStatistical Techniques in Robotics (16-831, F12) Lecture#20 (Monday November 12) Gaussian Processes
Statistical Techniques in Robotics (6-83, F) Lecture# (Monday November ) Gaussian Processes Lecturer: Drew Bagnell Scribe: Venkatraman Narayanan Applications of Gaussian Processes (a) Inverse Kinematics
More informationLinear regression example Simple linear regression: f(x) = ϕ(x)t w w ~ N(0, ) The mean and covariance are given by E[f(x)] = ϕ(x)e[w] = 0.
Gaussian Processes Gaussian Process Stochastic process: basically, a set of random variables. may be infinite. usually related in some way. Gaussian process: each variable has a Gaussian distribution every
More informationProbabilistic Models for Learning Data Representations. Andreas Damianou
Probabilistic Models for Learning Data Representations Andreas Damianou Department of Computer Science, University of Sheffield, UK IBM Research, Nairobi, Kenya, 23/06/2015 Sheffield SITraN Outline Part
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationThe geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan
The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan Background: Global Optimization and Gaussian Processes The Geometry of Gaussian Processes and the Chaining Trick Algorithm
More informationOutline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation
More informationProbabilistic numerics for deep learning
Presenter: Shijia Wang Department of Engineering Science, University of Oxford rning (RLSS) Summer School, Montreal 2017 Outline 1 Introduction Probabilistic Numerics 2 Components Probabilistic modeling
More informationSCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA. Sistemi di Elaborazione dell Informazione. Regressione. Ruggero Donida Labati
SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA Sistemi di Elaborazione dell Informazione Regressione Ruggero Donida Labati Dipartimento di Informatica via Bramante 65, 26013 Crema (CR), Italy http://homes.di.unimi.it/donida
More informationTalk on Bayesian Optimization
Talk on Bayesian Optimization Jungtaek Kim (jtkim@postech.ac.kr) Machine Learning Group, Department of Computer Science and Engineering, POSTECH, 77-Cheongam-ro, Nam-gu, Pohang-si 37673, Gyungsangbuk-do,
More informationThe Multivariate Gaussian Distribution [DRAFT]
The Multivariate Gaussian Distribution DRAFT David S. Rosenberg Abstract This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs,
More informationBayesian Deep Learning
Bayesian Deep Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp June 06, 2018 Mohammad Emtiyaz Khan 2018 1 What will you learn? Why is Bayesian inference
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationGaussian process for nonstationary time series prediction
Computational Statistics & Data Analysis 47 (2004) 705 712 www.elsevier.com/locate/csda Gaussian process for nonstationary time series prediction Soane Brahim-Belhouari, Amine Bermak EEE Department, Hong
More informationLinear Models for Regression
Linear Models for Regression Machine Learning Torsten Möller Möller/Mori 1 Reading Chapter 3 of Pattern Recognition and Machine Learning by Bishop Chapter 3+5+6+7 of The Elements of Statistical Learning
More informationRelevance Vector Machines for Earthquake Response Spectra
2012 2011 American American Transactions Transactions on on Engineering Engineering & Applied Applied Sciences Sciences. American Transactions on Engineering & Applied Sciences http://tuengr.com/ateas
More informationNonparametric Regression With Gaussian Processes
Nonparametric Regression With Gaussian Processes From Chap. 45, Information Theory, Inference and Learning Algorithms, D. J. C. McKay Presented by Micha Elsner Nonparametric Regression With Gaussian Processes
More informationDropout as a Bayesian Approximation: Insights and Applications
Dropout as a Bayesian Approximation: Insights and Applications Yarin Gal and Zoubin Ghahramani Discussion by: Chunyuan Li Jan. 15, 2016 1 / 16 Main idea In the framework of variational inference, the authors
More informationBAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS
BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS Oloyede I. Department of Statistics, University of Ilorin, Ilorin, Nigeria Corresponding Author: Oloyede I.,
More information