Tree-structured Gaussian Process Approximations


Tree-structured Gaussian Process Approximations
Thang Bui, joint work with Richard Turner
MLG, Cambridge, July 1st

Outline
1. Introduction
2. Tree-structured GP approximation
3. Experiments
4. Summary

GPs for regression: a quick recap

Given {x_n, y_n}_{n=1}^N, with y_n = f(x_n) + ε_n, ε_n ~iid N(0, σ_n^2) and f ~ GP(0, k_θ(·, ·)).

The posterior is also a GP:
m_f(x) = K_{xf} (K_{ff} + σ_n^2 I)^{-1} y,
k_f(x, x') = k(x, x') − K_{xf} (K_{ff} + σ_n^2 I)^{-1} K_{fx}.

Log marginal likelihood for learning: L = log N(y; 0, K_{ff} + σ_n^2 I).

Cost: O(N^3)
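A minimal NumPy sketch of these exact-GP formulas (the squared-exponential kernel and all helper names are illustrative, not taken from the slides):

```python
import numpy as np

def se_kernel(X1, X2, sigma_f=1.0, ell=1.0):
    """Squared-exponential kernel k(x, x') = sigma_f^2 exp(-|x - x'|^2 / (2 ell^2))."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return sigma_f**2 * np.exp(-0.5 * d2 / ell**2)

def gp_regression(X, y, Xs, sigma_n=0.1):
    """Exact GP posterior mean/variance at test inputs Xs, plus the log marginal likelihood."""
    N = X.shape[0]
    Kff = se_kernel(X, X) + sigma_n**2 * np.eye(N)      # K_ff + sigma_n^2 I
    Kxf = se_kernel(Xs, X)                               # K_xf
    L = np.linalg.cholesky(Kff)                          # the O(N^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Kxf @ alpha                                   # m_f(x)
    v = np.linalg.solve(L, Kxf.T)
    var = np.diag(se_kernel(Xs, Xs)) - np.sum(v**2, 0)   # k_f(x, x)
    lml = (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
           - 0.5 * N * np.log(2 * np.pi))                # log N(y; 0, K_ff + sigma_n^2 I)
    return mean, var, lml
```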

Prior work

Indirect posterior approximation schemes:
- Introduce a pseudo-dataset {x_m, u_m}_{m=1}^M and remove some dependencies in the prior: FITC, PI(T)C (Snelson and Ghahramani 2006; Snelson and Ghahramani 2007).
- Approximate the prior using M cosine basis functions: SSGP (Lázaro-Gredilla et al. 2010).

Direct posterior approximation schemes:
- Variational free energy approach (Seeger 2003; Titsias 2009), with an SVI extension to handle big data (Hensman et al. 2013).
- Expectation propagation (Qi et al. 2010).

Local approximations (Tresp 2000; Urtasun and Darrell 2008).

Fully independent training conditionals (FITC or SPGP) (Snelson and Ghahramani 2006)

Introduce a pseudo-dataset {x_m, u_m}_{m=1}^M; prior: p(u, f) = p(u) p(f | u).

Assume f_i ⊥ f_j | u for all i ≠ j, giving the approximate prior q(u, f) = q(u) ∏_n q(f_n | u).

Calibrate the model using KL(p(u, f) || q(u, f)), which gives q(u) = p(u) and q(f_i | u) = p(f_i | u).

New generative model:
p(u) = N(u; 0, K_uu),
p(y | u) = N(y; K_fu K_uu^{-1} u, diag(K_ff − K_fu K_uu^{-1} K_uf) + σ_n^2 I).

Cost: O(NM^2)

If we instead assume f_{B_i} ⊥ f_{B_j} | u for all i ≠ j, we obtain PI(T)C:
p(y | u) = N(y; K_fu K_uu^{-1} u, blkdiag(K_ff − K_fu K_uu^{-1} K_uf) + σ_n^2 I).
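A sketch of the collapsed FITC log marginal likelihood log N(y; 0, Q_ff + diag(K_ff − Q_ff) + σ_n^2 I), where Q_ff = K_fu K_uu^{-1} K_uf, assuming a kernel(A, B) helper and inducing inputs Z (names hypothetical); the matrix inversion lemma keeps the cost at O(NM^2):

```python
import numpy as np

def fitc_lml(X, y, Z, kernel, sigma_n):
    """FITC log marginal likelihood: y ~ N(0, Q_ff + diag(K_ff - Q_ff) + sigma_n^2 I)."""
    N, M = X.shape[0], Z.shape[0]
    Kuu = kernel(Z, Z) + 1e-6 * np.eye(M)
    Kuf = kernel(Z, X)
    Kff_diag = np.diag(kernel(X, X))
    L = np.linalg.cholesky(Kuu)
    V = np.linalg.solve(L, Kuf)                          # V^T V = Q_ff
    Lam = Kff_diag - np.sum(V**2, 0) + sigma_n**2        # diag(K_ff - Q_ff) + sigma_n^2
    B = np.eye(M) + (V / Lam) @ V.T
    LB = np.linalg.cholesky(B)
    beta = np.linalg.solve(LB, (V / Lam) @ y)
    # log N(y; 0, Q_ff + diag(Lam)) via the matrix inversion and determinant lemmas
    return (-0.5 * (y**2 / Lam).sum() + 0.5 * (beta**2).sum()
            - np.log(np.diag(LB)).sum() - 0.5 * np.log(Lam).sum()
            - 0.5 * N * np.log(2 * np.pi))
```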

Variational free energy (VFE) approach (Titsias 2009)

Augment the model with a pseudo-dataset {x_m, u_m}_{m=1}^M; the joint posterior of f and u is p(f, u | y) ∝ p(f, u) p(y | f).

Introducing the variational distribution q(f, u) = p(f | u) q(u) gives the ELBO:
F(q(u)) = ∫ du df p(f | u) q(u) log [ p(u) p(y | f) / q(u) ].

The optimal distribution is
q(u) = (1/Z) p(u) exp( ∫ df p(f | u) log p(y | f) ),
and the collapsed bound is
F(q(u)) = log N(y; 0, σ_n^2 I + K_fu K_uu^{-1} K_uf) − (1 / (2σ_n^2)) Tr(K_ff − K_fu K_uu^{-1} K_uf).

Cost: O(NM^2)
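A sketch of Titsias' collapsed bound above, again assuming a kernel(A, B) helper and inducing inputs Z (names hypothetical):

```python
import numpy as np

def vfe_bound(X, y, Z, kernel, sigma_n):
    """Collapsed bound F = log N(y; 0, sigma_n^2 I + Q_ff) - Tr(K_ff - Q_ff) / (2 sigma_n^2)."""
    N, M = X.shape[0], Z.shape[0]
    Kuu = kernel(Z, Z) + 1e-6 * np.eye(M)
    Kuf = kernel(Z, X)
    Kff_diag = np.diag(kernel(X, X))
    L = np.linalg.cholesky(Kuu)
    V = np.linalg.solve(L, Kuf) / sigma_n                # V^T V = Q_ff / sigma_n^2
    B = np.eye(M) + V @ V.T
    LB = np.linalg.cholesky(B)
    beta = np.linalg.solve(LB, V @ y) / sigma_n
    log_gauss = (-0.5 * (y @ y) / sigma_n**2 + 0.5 * beta @ beta
                 - np.log(np.diag(LB)).sum() - N * np.log(sigma_n)
                 - 0.5 * N * np.log(2 * np.pi))          # log N(y; 0, sigma_n^2 I + Q_ff)
    trace_term = -0.5 * (Kff_diag.sum() - sigma_n**2 * np.sum(V**2)) / sigma_n**2
    return log_gauss + trace_term
```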

Example

[Figure: exact GP vs. VFE posterior on a simple 1D function (y against x).]

N = 100, M = 10. We only need a small M if the underlying function is simple.

Example 2

[Figure: exact GP vs. VFE posterior on a rapidly varying 1D function (y against x).]

N = 100, M = 10. M needs to be large if the underlying function is complicated.

Limitations of global approximations

[Figure: a 1D dataset spanning a range L with lengthscale l (y against x).]

We approximately need M ≈ ∏_{d=1}^D L_d / l_d, where L_d and l_d are the data range and lengthscale in dimension d, i.e. M is large when:
- datasets span a large input space, e.g. time-series or spatial datasets;
- the underlying functions have short lengthscales, i.e. lots of wiggles.

O(NM^2) is still expensive! Local approximations may give a better time/accuracy trade-off.
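A back-of-the-envelope helper for this rule of thumb (illustrative only; the numbers below are made up):

```python
import numpy as np

def pseudo_points_needed(ranges, lengthscales):
    """Rough rule of thumb from the slide: M ~ prod_d (L_d / l_d)."""
    return int(np.ceil(np.prod(np.asarray(ranges) / np.asarray(lengthscales))))

# e.g. a 10 km x 10 km spatial dataset with a 0.5 km lengthscale
# already needs roughly 20 * 20 = 400 pseudo-points.
print(pseudo_points_needed([10.0, 10.0], [0.5, 0.5]))  # -> 400
```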

Local GPs

Divide the training set into M disjoint partitions {B_i}_{i=1}^M, where B_i = {x_j, y_j}_{j=1}^{N_i}.

Obtain the posterior for each partition: p(f_i | y_{B_i}) ∝ p(f_i) p(y_{B_i} | f_i).

Predict using the posterior of only the partition closest to the test point (see Tresp 2000 for a way to combine predictors):
p(f* | y_{B_i}) = ∫ df_i p(f* | f_i) p(f_i | y_{B_i}).

Partitions can have shared or separate hyperparameters.

Cost: O(ND^2), D: average size of the partitions
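A minimal sketch of this local-GP recipe with a k-means partitioning (kernel(A, B) is an assumed helper; scikit-learn's KMeans is one possible partitioner):

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_local_gps(X, y, M, kernel, sigma_n):
    """Partition the data with k-means and cache each block's exact GP posterior
    (shared hyperparameters across blocks)."""
    km = KMeans(n_clusters=M, n_init=10).fit(X)
    blocks = []
    for i in range(M):
        idx = km.labels_ == i
        Xi, yi = X[idx], y[idx]
        Ki = kernel(Xi, Xi) + sigma_n**2 * np.eye(Xi.shape[0])
        blocks.append((Xi, yi, np.linalg.cholesky(Ki)))
    return km, blocks

def predict_local_gp(xs, km, blocks, kernel):
    """Predict with the posterior of the single partition closest to the test point xs."""
    i = int(km.predict(xs.reshape(1, -1))[0])
    Xi, yi, Li = blocks[i]
    alpha = np.linalg.solve(Li.T, np.linalg.solve(Li, yi))
    ks = kernel(xs.reshape(1, -1), Xi)
    mean = float(ks @ alpha)
    v = np.linalg.solve(Li, ks.T)
    var = float(kernel(xs.reshape(1, -1), xs.reshape(1, -1))) - float(v.T @ v)
    return mean, var
```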

Outline
1. Introduction
2. Tree-structured GP approximation
3. Experiments
4. Summary

Tree-structured approximation (TSGP)

TSGP is in the same family as FITC and PITC, i.e. an indirect approximation via prior modification, but with additional structure: local inducing variables for each partition and sparse connections between the inducing blocks.

[Figure: graphical models of (a) the full GP, (b) FITC with a single global u, (c) PIC with local blocks u_{B_k}, and (d) the tree/chain structure connecting the u_{B_k}.]

Prior modification

Generative model:
q(u) = ∏_{k=1}^K q(u_{B_k} | u_{B_l}),
q(f | u) = ∏_{k=1}^K q(f_{B_k} | u_{B_k}),
p(y | f) = ∏_{n=1}^N p(y_n; f_n, σ_n^2),
where B_l denotes the parent block of B_k in the tree.

[Figure: one edge of the tree, u_{B_l} → u_{B_k} with parameters A_k, Q_k, and u_{B_k} → f_{B_k} with parameters C_k, R_k.]

The model is calibrated by minimising a forward KL divergence,
KL( p(f, u) || ∏_k q(f_{B_k} | u_{B_k}) q(u_{B_k} | u_{B_l}) ),
which gives
q(u_{B_k} | u_{B_l}) = p(u_{B_k} | u_{B_l}) = N(u_{B_k}; A_k u_{B_l}, Q_k),
q(f_{B_k} | u_{B_k}) = p(f_{B_k} | u_{B_k}) = N(f_{B_k}; C_k u_{B_k}, R_k),
with
A_k = K_{u_k u_l} K_{u_l u_l}^{-1},   Q_k = K_{u_k u_k} − K_{u_k u_l} K_{u_l u_l}^{-1} K_{u_l u_k},
C_k = K_{f_k u_k} K_{u_k u_k}^{-1},   R_k = K_{f_k f_k} − K_{f_k u_k} K_{u_k u_k}^{-1} K_{u_k f_k}.
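A sketch of how these edge parameters could be computed from kernel matrices (kernel(A, B) is an assumed helper; names are hypothetical):

```python
import numpy as np

def block_conditional_params(kernel, Z_k, Z_l, X_k, jitter=1e-6):
    """Tree-edge parameters for one block:
    A_k = K_{u_k u_l} K_{u_l u_l}^{-1},  Q_k = K_{u_k u_k} - A_k K_{u_l u_k},
    C_k = K_{f_k u_k} K_{u_k u_k}^{-1},  R_k = K_{f_k f_k} - C_k K_{u_k f_k}.
    Z_k, Z_l: pseudo-inputs of the block and its parent; X_k: the block's inputs."""
    Kukul = kernel(Z_k, Z_l)
    Kulul = kernel(Z_l, Z_l) + jitter * np.eye(Z_l.shape[0])
    Kukuk = kernel(Z_k, Z_k) + jitter * np.eye(Z_k.shape[0])
    Kfkuk = kernel(X_k, Z_k)
    Kfkfk = kernel(X_k, X_k)

    A_k = np.linalg.solve(Kulul, Kukul.T).T          # K_{u_k u_l} K_{u_l u_l}^{-1}
    Q_k = Kukuk - A_k @ Kukul.T
    C_k = np.linalg.solve(Kukuk, Kfkuk.T).T          # K_{f_k u_k} K_{u_k u_k}^{-1}
    R_k = Kfkfk - C_k @ Kfkuk.T
    return A_k, Q_k, C_k, R_k
```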

Inference

Marginalising out f, the model becomes a tree-structured Gaussian model with latent variables u and observations y. Special case: a linear Gaussian state space model for time series or 1D data.

Joint posterior:
p(u | y) ∝ ∏_{i ∈ V} exp( −(1/2) u_i^T J_i u_i + u_i^T h_i ) ∏_{(i,j) ∈ E} exp( u_i^T J_{ij} u_j ),
where
J_i = Q_i^{-1} + C_i^T (R_i + σ^2 I_i)^{-1} C_i + Σ_{j ∈ nei(i)} A_j^T Q_j^{-1} A_j,
h_i = C_i^T (R_i + σ^2 I_i)^{-1} y_i,
J_{ij} = Q_i^{-1} A_i.

Use the Gaussian belief propagation algorithm to find the marginal distributions p(u_{B_i} | y).

Prediction at test points: p(f* | y) = ∫ du_{B_i} p(f* | u_{B_i}) p(u_{B_i} | y).

Cost: O(ND^2), D: average number of observations per block
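In the chain (time-series) special case, the belief-propagation recursions reduce to Kalman-style filtering and smoothing. A minimal forward-pass sketch under that assumption (argument layout is hypothetical; a backward pass would complete the posterior marginals):

```python
import numpy as np

def chain_filter(A, Q, C, R, y, sigma_n, K1):
    """Forward filtering for the chain special case of the model:
    u_1 ~ N(0, K1), u_k | u_{k-1} ~ N(A[k] u_{k-1}, Q[k]),
    y_k | u_k ~ N(C[k] u_k, R[k] + sigma_n^2 I).
    A, Q, C, R, y are lists indexed by block (A[0], Q[0] are unused).
    Returns filtered marginals and the log marginal likelihood."""
    means, covs, lml = [], [], 0.0
    m, P = np.zeros(K1.shape[0]), K1
    for k in range(len(y)):
        if k > 0:                                    # predict step: p(u_k | y_{1:k-1})
            m, P = A[k] @ m, A[k] @ P @ A[k].T + Q[k]
        S = C[k] @ P @ C[k].T + R[k] + sigma_n**2 * np.eye(len(y[k]))
        v = y[k] - C[k] @ m                          # innovation
        lml += (-0.5 * v @ np.linalg.solve(S, v)
                - 0.5 * np.linalg.slogdet(S)[1]
                - 0.5 * len(v) * np.log(2 * np.pi))  # log p(y_k | y_{1:k-1})
        G = P @ C[k].T @ np.linalg.inv(S)            # gain
        m, P = m + G @ v, P - G @ C[k] @ P           # update step: p(u_k | y_{1:k})
        means.append(m)
        covs.append(P)
    return means, covs, lml
```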

Hyperparameter learning

The log marginal likelihood and its derivatives can be computed with the same message passing algorithm:
p(y_{1:K} | θ) = ∏_{k=1}^K p(y_k | y_{1:k-1}, θ),
d/dθ log p(y | θ) = Σ_{k=1}^K [ ⟨ d/dθ log p(u_k | u_l) ⟩_{p(u_k, u_l | y)} + ⟨ d/dθ log p(y_k | u_k) ⟩_{p(u_k | y)} ].

Tree construction:
- start with k-means clustering to find the observation blocks;
- use Kruskal's algorithm to greedily select a tree;
- choose a large random subset of the observations in each block to be pseudo-outputs; no optimisation needed.
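A rough sketch of this construction using k-means plus a minimum spanning tree over the block centres (the choice of edge weights and the subset fraction are assumptions, not specified on the slide):

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree

def build_tree(X, K, subset_frac=0.5, seed=0):
    """k-means blocks, a greedily selected (Kruskal-style) spanning tree over block
    centres, and a random subset of each block's inputs as pseudo-input locations."""
    rng = np.random.default_rng(seed)
    km = KMeans(n_clusters=K, n_init=10, random_state=seed).fit(X)
    centres = km.cluster_centers_
    mst = minimum_spanning_tree(cdist(centres, centres)).tocoo()
    edges = list(zip(mst.row, mst.col))              # undirected tree edges between blocks
    blocks, pseudo_inputs = [], []
    for k in range(K):
        idx = np.where(km.labels_ == k)[0]
        blocks.append(idx)
        n_pseudo = max(1, int(subset_frac * len(idx)))
        pseudo_inputs.append(X[rng.choice(idx, size=n_pseudo, replace=False)])
    return blocks, pseudo_inputs, edges
```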

Comparison (method: KL divergence minimised → result)

- FITC: minimise KL( p(f, u) || q(u) ∏_n q(f_n | u) ); result: q(u) = p(u), q(f_n | u) = p(f_n | u).
- PIC: minimise KL( p(f, u) || q(u) ∏_k q(f_{C_k} | u) ); result: q(u) = p(u), q(f_{C_k} | u) = p(f_{C_k} | u).
- PP: minimise KL( (1/Z) p(u) p(f | u) q(y | u) || p(f, u | y) ); result: q(y | u) = N(y; K_fu K_uu^{-1} u, σ_n^2 I).
- VFE: minimise KL( p(f | u) q(u) || p(f, u | y) ); result: q(u) ∝ p(u) exp( ⟨ log p(y | f) ⟩_{p(f | u)} ).
- EP: minimise KL( q(f; u) p(y_n | f_n) / q_n(f; u) || q(f; u) ); result: q(f; u) ∝ p(f) ∏_m p(u_m | f_m).
- Tree: minimise KL( p(f, u) || ∏_k q(f_{B_k} | u_{B_k}) q(u_{B_k} | u_{par(B_k)}) ); result: q(f_{B_k} | u_{B_k}) = p(f_{B_k} | u_{B_k}), q(u_{B_k} | u_{par(B_k)}) = p(u_{B_k} | u_{par(B_k)}).

Outline
1. Introduction
2. Tree-structured GP approximation
3. Experiments
4. Summary

Audio data

Task: filling in missing data.
Data:
- Subband of a speech signal: N = 50000, SE kernel k_θ(t, t') = σ^2 exp( −(t − t')^2 / (2 l^2) ).
- Filtered speech signal: N = 50000, spectral mixture kernel k_θ(t, t') = Σ_{k=1}^2 σ_k^2 cos(ω_k (t − t')) exp( −(t − t')^2 / (2 l_k^2) ).
Evaluation: prediction error vs. training/test time.

[Figure: reconstructions of the missing regions (true signal, chain, local). Left: subband data; right: filtered signal.]
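For concreteness, the two kernels used in these experiments could be coded as follows (a sketch; parameter names are illustrative):

```python
import numpy as np

def se_kernel_1d(t1, t2, sigma, ell):
    """Squared-exponential kernel on scalar time inputs."""
    d = t1[:, None] - t2[None, :]
    return sigma**2 * np.exp(-0.5 * d**2 / ell**2)

def spectral_mixture_kernel_1d(t1, t2, sigmas, omegas, ells):
    """Two-component spectral mixture kernel from the slide:
    k(t, t') = sum_k sigma_k^2 cos(omega_k (t - t')) exp(-(t - t')^2 / (2 l_k^2))."""
    d = t1[:, None] - t2[None, :]
    K = np.zeros_like(d)
    for s, w, l in zip(sigmas, omegas, ells):
        K += s**2 * np.cos(w * d) * np.exp(-0.5 * d**2 / l**2)
    return K
```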

Audio subband data

[Figure: SMSE vs. training time and SMSE vs. test time for Chain, Local, FITC, VFE and SSGP; point labels give the block size / number of pseudo-points per configuration.]

Audio filter data

[Figure: SMSE vs. training time and SMSE vs. test time for Chain, Local and FITC on the filtered speech signal.]

Terrain data

Task: filling in missing data.
Data: altitude over a 20 km x 30 km region with 80 missing blocks of 1 km x 1 km, i.e. 200k/40k training/test points; 2D SE kernel.
Evaluation: prediction error vs. training/test time.

[Figure: (a) block graph and complete data, (b) tree inference error, (c) local and FITC inference errors.]

Terrain data (results)

[Figure: SMSE vs. training time and SMSE vs. test time for Tree, Local, FITC, VFE and SSGP on the terrain data.]

Summary

Tree-structured Gaussian process approximation:
- the pseudo-dataset has a tree/chain structure;
- the model is calibrated using a KL divergence;
- inference and learning via Gaussian belief propagation;
- better time/accuracy trade-off compared to FITC, VFE;
- possible extensions: online learning, loopy BP.

References

Hensman, James, Nicolo Fusi, and Neil D. Lawrence (2013). "Gaussian processes for big data". arXiv preprint.
Lázaro-Gredilla, Miguel et al. (2010). "Sparse spectrum Gaussian process regression". Journal of Machine Learning Research 11, pp. 1865-1881.
Qi, Yuan (Alan), Ahmed H. Abdel-Gawad, and Thomas P. Minka (2010). "Sparse-posterior Gaussian processes for general likelihoods". In: UAI. Ed. by Peter Grünwald and Peter Spirtes. AUAI Press.
Seeger, Matthias (2003). "Bayesian Gaussian process models: PAC-Bayesian generalisation error bounds and sparse approximations". PhD thesis. School of Informatics, University of Edinburgh.
Snelson, Edward and Zoubin Ghahramani (2006). "Sparse Gaussian processes using pseudo-inputs". In: Advances in Neural Information Processing Systems. MIT Press.
Snelson, Edward and Zoubin Ghahramani (2007). "Local and global sparse Gaussian process approximations". In: International Conference on Artificial Intelligence and Statistics.
Titsias, Michalis K. (2009). "Variational learning of inducing variables in sparse Gaussian processes". In: International Conference on Artificial Intelligence and Statistics.
Tresp, Volker (2000). "A Bayesian committee machine". Neural Computation 12(11).
Urtasun, Raquel and Trevor Darrell (2008). "Sparse probabilistic regression for activity-independent human pose inference". In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Thanks!

Bayesian Committee Machine (BCM) (Tresp 2000)

The BCM combines predictions from M estimators, each of which uses a subset of the training points.

Consider M partitions of the training set. By Bayes' rule,
p(f* | y_{B_{1:m}}) = p(f* | y_{B_{1:m-1}}, y_{B_m})
  ∝ p(f*) p(y_{B_m} | f*) p(y_{B_{1:m-1}} | y_{B_m}, f*)
  ≈ p(f*) p(y_{B_m} | f*) p(y_{B_{1:m-1}} | f*)
  ∝ p(f* | y_{B_m}) p(f* | y_{B_{1:m-1}}) / p(f*).    (1)

Applying (1) recursively gives
p(f* | y_{B_{1:M}}) ∝ ∏_{i=1}^M p(f* | y_{B_i}) / p(f*)^{M-1}.

BCM for GP regression (Tresp 2000)

Let p(f*) = N(0, K_{**}) and p(f* | y_{B_i}) = N(μ̂_i, Σ̂_i). Then p(f* | y_{B_{1:M}}) = N(μ̂, Σ̂), where
Σ̂^{-1} = −(M − 1) K_{**}^{-1} + Σ_{i=1}^M Σ̂_i^{-1},
μ̂ = Σ̂ Σ_{i=1}^M Σ̂_i^{-1} μ̂_i.

Cost: O(ND^2), D: partition size.
More test points give a better approximation!
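A small sketch of this combination rule (argument layout is hypothetical):

```python
import numpy as np

def bcm_combine(mus, Sigmas, K_star):
    """BCM combination of per-partition GP predictions at shared test inputs:
    Sigma^{-1} = sum_i Sigma_i^{-1} - (M - 1) K_**^{-1},
    mu = Sigma sum_i Sigma_i^{-1} mu_i."""
    M = len(mus)
    prec = -(M - 1) * np.linalg.inv(K_star)           # prior correction term
    weighted_mean = np.zeros_like(mus[0])
    for mu_i, Sigma_i in zip(mus, Sigmas):
        Pi = np.linalg.inv(Sigma_i)
        prec += Pi
        weighted_mean += Pi @ mu_i
    Sigma = np.linalg.inv(prec)
    return Sigma @ weighted_mean, Sigma
```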
