Tree-structured Gaussian Process Approximations
Thang Bui (joint work with Richard Turner), MLG, Cambridge, July 1st.
Outline
1. Introduction
2. Tree-structured GP approximation
3. Experiments
4. Summary
GPs for regression - A quick recap

Given $\{x_n, y_n\}_{n=1}^N$, with $y_n = f(x_n) + \epsilon_n$, $\epsilon_n \overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \sigma_n^2)$ and $f \sim \mathcal{GP}(0, k_\theta(\cdot, \cdot))$.

[Figure: graphical model linking the latent function values $f_1, f_2, \ldots, f_N$ and a test value $f_*$.]

The posterior is also a GP:
$m_f(x) = K_{xf}(K_{ff} + \sigma_n^2 I)^{-1} y$,
$k_f(x, x') = k(x, x') - K_{xf}(K_{ff} + \sigma_n^2 I)^{-1} K_{fx'}$.

Log marginal likelihood for learning: $\mathcal{L} = \log \mathcal{N}(y; 0, K_{ff} + \sigma_n^2 I)$.

Cost: $O(N^3)$.
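The recap above translates directly into a few lines of linear algebra. Below is a minimal NumPy sketch of exact GP regression with a squared-exponential kernel; the function names and hyperparameter values are illustrative choices, not part of the talk.

```python
import numpy as np

def se_kernel(X1, X2, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel k(x, x') = s^2 exp(-|x - x'|^2 / (2 l^2))."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_regression(X, y, Xs, noise_var=0.1):
    """Exact GP posterior mean/covariance and log marginal likelihood, O(N^3)."""
    N = X.shape[0]
    Kff = se_kernel(X, X) + noise_var * np.eye(N)
    Ksf = se_kernel(Xs, X)
    Kss = se_kernel(Xs, Xs)
    L = np.linalg.cholesky(Kff)                       # K_ff + s^2 I = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ksf @ alpha                                # K_{*f}(K_ff + s^2 I)^{-1} y
    V = np.linalg.solve(L, Ksf.T)
    cov = Kss - V.T @ V                               # K_** - K_{*f}(...)^{-1} K_{f*}
    logZ = -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * N * np.log(2 * np.pi)
    return mean, cov, logZ

# Toy usage
X = np.linspace(0, 1, 100)[:, None]
y = np.sin(10 * X[:, 0]) + 0.1 * np.random.randn(100)
mean, cov, logZ = gp_regression(X, y, np.linspace(0, 1, 50)[:, None])
```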
Prior work

Indirect posterior approximation schemes:
- Introducing a pseudo-dataset $\{x_m, u_m\}_{m=1}^M$ and removing some dependencies in the prior: FITC, PI(T)C (Snelson and Ghahramani 2006; Snelson and Ghahramani 2007)
- Approximating the prior using M cosine basis functions: SSGP (Lázaro-Gredilla et al. 2010)

Direct posterior approximation schemes:
- Variational free energy approach (Seeger 2003; Titsias 2009), SVI extension to handle big data (Hensman et al. 2013)
- Expectation propagation (Qi et al. 2010)

Local approximations (Tresp 2000; Urtasun and Darrell 2008)
Fully independent training conditionals (FITC or SPGP) (Snelson and Ghahramani 2006)

Introduce a pseudo-dataset $\{x_m, u_m\}_{m=1}^M$, so the prior becomes $p(u, f) = p(u)p(f|u)$.

[Figure: graphical model with inducing variables $u$ connected to $f_1, \ldots, f_N$ and $f_*$.]

Assume $f_i \perp f_j \mid u$ for all $i \neq j$, giving the approximate prior $q(u, f) = q(u)q(f|u)$.

Calibrate the model using $KL(p(u, f)\,\|\,q(u, f))$, which gives $q(u) = p(u)$ and $q(f_i|u) = p(f_i|u)$.

New generative model:
$p(u) = \mathcal{N}(u; 0, K_{uu})$,
$p(y|u) = \mathcal{N}(y; K_{fu}K_{uu}^{-1}u, \mathrm{diag}(K_{ff} - K_{fu}K_{uu}^{-1}K_{uf}) + \sigma_n^2 I)$.

Cost: $O(NM^2)$.

If we instead assume $f_{B_i} \perp f_{B_j} \mid u$ for blocks $i \neq j$, we obtain PI(T)C:
$p(y|u) = \mathcal{N}(y; K_{fu}K_{uu}^{-1}u, \mathrm{blkdiag}(K_{ff} - K_{fu}K_{uu}^{-1}K_{uf}) + \sigma_n^2 I)$.
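The diagonal-plus-low-rank structure of the FITC covariance is what gives the $O(NM^2)$ cost. Below is a hedged sketch of the FITC log marginal likelihood using the Woodbury identity and the matrix determinant lemma; `kernel` is a user-supplied covariance function and all names are illustrative, not the authors' code.

```python
import numpy as np

def fitc_log_marginal(X, y, Z, kernel, noise_var):
    """FITC log marginal likelihood log N(y; 0, Qff + diag(Kff - Qff) + noise*I)
    in O(N M^2), with Qff = Kfu Kuu^{-1} Kuf and Z the pseudo-inputs."""
    N, M = X.shape[0], Z.shape[0]
    Kuu = kernel(Z, Z) + 1e-6 * np.eye(M)          # jitter for numerical stability
    Kuf = kernel(Z, X)
    kff_diag = np.diag(kernel(X, X))               # in practice evaluate only the diagonal
    Luu = np.linalg.cholesky(Kuu)
    V = np.linalg.solve(Luu, Kuf)                  # Qff = V.T @ V
    qff_diag = np.sum(V**2, axis=0)
    d = kff_diag - qff_diag + noise_var            # diagonal correction + noise
    # Woodbury identity for the quadratic term, determinant lemma for the log-det
    B = np.eye(M) + (V / d) @ V.T
    Lb = np.linalg.cholesky(B)
    beta = np.linalg.solve(Lb, (V / d) @ y)
    quad = y @ (y / d) - beta @ beta               # y^T (Qff + D)^{-1} y
    logdet = np.sum(np.log(d)) + 2 * np.sum(np.log(np.diag(Lb)))
    return -0.5 * (quad + logdet + N * np.log(2 * np.pi))
```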
Variational free energy (VFE) approach (Titsias 2009)

Augment the model with a pseudo-dataset $\{x_m, u_m\}_{m=1}^M$; the joint posterior of f and u is $p(f, u|y) \propto p(f, u)p(y|f)$.

Introducing the variational distribution $q(f, u) = p(f|u)q(u)$ gives the ELBO:
$\mathcal{F}(q(u)) = \int \mathrm{d}u\,\mathrm{d}f\; p(f|u)q(u)\,\log\frac{p(u)p(y|f)}{q(u)}$

The optimal distribution is $q(u) = \frac{1}{Z}p(u)\exp\left(\int \mathrm{d}f\, p(f|u)\log p(y|f)\right)$ and
$\mathcal{F}(q(u)) = \log\mathcal{N}(y; 0, \sigma_n^2 I + K_{fu}K_{uu}^{-1}K_{uf}) - \frac{1}{2\sigma_n^2}\mathrm{Tr}(K_{ff} - K_{fu}K_{uu}^{-1}K_{uf})$.

Cost: $O(NM^2)$.
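Titsias' collapsed bound has the same low-rank structure as FITC plus an extra trace penalty, so it can be evaluated the same way. A sketch under the same assumptions as the previous block (user-supplied `kernel`, illustrative names):

```python
import numpy as np

def vfe_collapsed_bound(X, y, Z, kernel, noise_var):
    """Titsias' collapsed bound F = log N(y; 0, Qff + s^2 I) - Tr(Kff - Qff)/(2 s^2),
    computed in O(N M^2) with Qff = Kfu Kuu^{-1} Kuf."""
    N, M = X.shape[0], Z.shape[0]
    Kuu = kernel(Z, Z) + 1e-6 * np.eye(M)
    Kuf = kernel(Z, X)
    kff_diag = np.diag(kernel(X, X))           # in practice evaluate only the diagonal
    Luu = np.linalg.cholesky(Kuu)
    V = np.linalg.solve(Luu, Kuf)              # Qff = V.T @ V
    qff_diag = np.sum(V**2, axis=0)
    trace_term = np.sum(kff_diag - qff_diag) / (2 * noise_var)
    # log N(y; 0, V^T V + s^2 I) via Woodbury / matrix determinant lemma
    B = np.eye(M) + (V @ V.T) / noise_var
    Lb = np.linalg.cholesky(B)
    beta = np.linalg.solve(Lb, V @ y) / noise_var
    quad = (y @ y) / noise_var - beta @ beta
    logdet = N * np.log(noise_var) + 2 * np.sum(np.log(np.diag(Lb)))
    log_gauss = -0.5 * (quad + logdet + N * np.log(2 * np.pi))
    return log_gauss - trace_term
```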
Example

[Figure: exact GP fit vs. VFE fit, N = 100, M = 10.]

We only need a small M if the underlying function is simple.
Example 2

[Figure: exact GP fit vs. VFE fit, N = 100, M = 10.]

M needs to be large if the underlying function is complicated.
Limitations of global approximations

[Figure: a wiggly function with lengthscale l observed over an input range of width L.]

We approximately need $M \gtrsim \prod_{d=1}^D L_d/l_d$, where $L_d$ and $l_d$ are the data range and lengthscale in dimension d, i.e. a large M when:
- datasets span a large input space: time-series or spatial datasets,
- the underlying functions have short lengthscales, i.e. lots of wiggles.

$O(NM^2)$ is then still expensive! Local approximations may give a better time/accuracy trade-off.
Local GPs

Divide the training set into M disjoint partitions $\{B_i\}_{i=1}^M$, where $B_i = \{x_j, y_j\}_{j=1}^{N_i}$.

Obtain the posterior for each partition: $p(f_i|y_{B_i}) \propto p(f_i)p(y_{B_i}|f_i)$.

Predict using the posterior of only the partition closest to the test point (see Tresp 2000 for a way to combine predictors):
$p(f_*|y_{B_i}) = \int \mathrm{d}f_i\, p(f_*|f_i)p(f_i|y_{B_i})$

Partitions can have shared or separate hyperparameters.

Cost: $O(ND^2)$, D: average size of the partitions.
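A minimal sketch of the local-GP recipe above, using k-means from scikit-learn to form the partitions, an exact GP per partition, and prediction from the single nearest partition. The kernel, hyperparameters and function names are illustrative choices; only the predictive mean is shown, the variance follows the same pattern.

```python
import numpy as np
from sklearn.cluster import KMeans

def se_kernel(X1, X2, variance=1.0, lengthscale=0.1):
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def fit_local_gps(X, y, n_partitions=20, noise_var=0.1):
    """Partition with k-means and precompute (K_BB + s^2 I)^{-1} y for each block."""
    km = KMeans(n_clusters=n_partitions, n_init=10).fit(X)
    blocks = []
    for i in range(n_partitions):
        idx = np.where(km.labels_ == i)[0]
        Kb = se_kernel(X[idx], X[idx]) + noise_var * np.eye(len(idx))
        blocks.append((X[idx], np.linalg.solve(Kb, y[idx])))
    return km, blocks

def predict_local_gps(km, blocks, Xs):
    """Predict each test point with the GP of its nearest partition only."""
    labels = km.predict(Xs)
    mean = np.empty(Xs.shape[0])
    for i, (Xb, alpha) in enumerate(blocks):
        mask = labels == i
        if np.any(mask):
            mean[mask] = se_kernel(Xs[mask], Xb) @ alpha
    return mean

# Toy usage: N = 2000 points, 20 local GPs of roughly 100 points each
X = np.random.rand(2000, 1)
y = np.sin(20 * X[:, 0]) + 0.1 * np.random.randn(2000)
km, blocks = fit_local_gps(X, y)
mean = predict_local_gps(km, blocks, np.linspace(0, 1, 200)[:, None])
```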
Tree-structured approximation (TSGP)

TSGP is in the same family as FITC and PITC, i.e. indirect approximation via prior modification, but with additional structure: local inducing variables for each partition, and sparse connections between the inducing blocks.

[Figure: graphical models of (a) the full GP, (b) FITC with a global u, (c) PIC with a global u and observation blocks, and (d) the tree/chain with local inducing blocks $u_{B_1}, \ldots, u_{B_K}$.]
Prior modification

Generative model:
$q(u) = \prod_{k=1}^K q(u_{B_k}|u_{B_l})$,
$q(f|u) = \prod_{k=1}^K q(f_{B_k}|u_{B_k})$,
$p(y|f) = \prod_{n=1}^N \mathcal{N}(y_n; f_n, \sigma_n^2)$,
where $B_l$ denotes the parent block of $B_k$ in the tree.

[Figure: one branch of the tree: the parent block $u_{B_l}$ connects to $u_{B_k}$ via $(A_k, Q_k)$, and $u_{B_k}$ connects to $f_{B_k}$ via $(C_k, R_k)$.]

Model calibration by minimising a forward KL divergence,
$KL\big(p(f, u)\,\|\,\prod_k q(f_{B_k}|u_{B_k})q(u_{B_k}|u_{B_l})\big)$,
which gives
$q(u_{B_k}|u_{B_l}) = p(u_{B_k}|u_{B_l}) = \mathcal{N}(u_{B_k}; A_k u_{B_l}, Q_k)$,
$q(f_{B_k}|u_{B_k}) = p(f_{B_k}|u_{B_k}) = \mathcal{N}(f_{B_k}; C_k u_{B_k}, R_k)$,
with
$A_k = K_{u_k u_l}K_{u_l u_l}^{-1}$, $Q_k = K_{u_k u_k} - K_{u_k u_l}K_{u_l u_l}^{-1}K_{u_l u_k}$,
$C_k = K_{f_k u_k}K_{u_k u_k}^{-1}$, $R_k = K_{f_k f_k} - K_{f_k u_k}K_{u_k u_k}^{-1}K_{u_k f_k}$.
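For a single block, $A_k$, $Q_k$, $C_k$ and $R_k$ follow from standard Gaussian conditioning on the kernel matrices. A sketch, assuming a user-supplied `kernel` function; the variable names mirror the slide's symbols rather than any particular implementation:

```python
import numpy as np

def block_parameters(Xf_k, Zu_k, Zu_parent, kernel, jitter=1e-6):
    """Compute A_k, Q_k (inducing-block transition) and C_k, R_k (observation model)
    for one block of the tree, by Gaussian conditioning on the GP prior."""
    Kuu_k = kernel(Zu_k, Zu_k) + jitter * np.eye(len(Zu_k))
    Kuu_l = kernel(Zu_parent, Zu_parent) + jitter * np.eye(len(Zu_parent))
    Kul = kernel(Zu_k, Zu_parent)            # K_{u_k u_l}
    Kfu = kernel(Xf_k, Zu_k)                 # K_{f_k u_k}
    Kff = kernel(Xf_k, Xf_k)

    A_k = np.linalg.solve(Kuu_l, Kul.T).T    # K_{u_k u_l} K_{u_l u_l}^{-1}
    Q_k = Kuu_k - A_k @ Kul.T                # K_{u_k u_k} - A_k K_{u_l u_k}
    C_k = np.linalg.solve(Kuu_k, Kfu.T).T    # K_{f_k u_k} K_{u_k u_k}^{-1}
    R_k = Kff - C_k @ Kfu.T                  # K_{f_k f_k} - C_k K_{u_k f_k}
    return A_k, Q_k, C_k, R_k
```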
Inference

Marginalising out f, the model is a tree-structured Gaussian model with latent variables u and observations y. Special case: a linear Gaussian state space model for time-series or 1D data.

Joint posterior:
$p(u|y) \propto \prod_{i \in V}\exp\left(-\tfrac{1}{2}u_i^\top J_i u_i + u_i^\top h_i\right)\prod_{(i,j)\in E}\exp\left(u_i^\top J_{ij} u_j\right)$
where
$J_i = Q_i^{-1} + C_i^\top(R_i + \sigma^2 I_i)^{-1}C_i + \sum_{j\in\mathrm{nei}(i)} A_j^\top Q_j^{-1} A_j$,
$h_i = C_i^\top(R_i + \sigma^2 I_i)^{-1}y_i$, and $J_{ij} = Q_i^{-1}A_i$ (with block j the parent of block i in the tree).

Use the Gaussian belief propagation algorithm to find the marginal distributions $p(u_{B_i}|y)$.

Prediction at test points: $p(f_*|y) = \int \mathrm{d}u_{B_i}\, p(f_*|u_{B_i})p(u_{B_i}|y)$.

Cost: $O(ND^2)$, D: average number of observations per block.
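For the chain special case, belief propagation reduces to a forward and a backward sweep. The sketch below runs Gaussian BP in information form on a chain, taking the joint posterior's diagonal precision blocks, off-diagonal precision blocks and linear terms as inputs (in the slide's notation, the off-diagonal precision block between a child i and its parent j is $-J_{ij}$); it is a generic illustration, not the authors' code.

```python
import numpy as np

def gaussian_bp_chain(J_diag, J_off, h):
    """Exact marginals p(u_i | y) for a chain-structured Gaussian in information form.
    J_diag[i] : D x D diagonal precision block of node i
    J_off[i]  : D x D precision block between nodes i and i+1
    h[i]      : length-D linear term of node i
    Returns per-node posterior means and covariances; cost is linear in the chain length."""
    K, D = len(J_diag), h[0].shape[0]
    fwd = [(np.zeros((D, D)), np.zeros(D)) for _ in range(K)]   # messages i-1 -> i
    bwd = [(np.zeros((D, D)), np.zeros(D)) for _ in range(K)]   # messages i+1 -> i
    # Forward pass: eliminate node i into node i+1 (Schur complement update)
    for i in range(K - 1):
        Jhat = J_diag[i] + fwd[i][0]
        hhat = h[i] + fwd[i][1]
        gain = np.linalg.solve(Jhat, J_off[i])
        fwd[i + 1] = (-J_off[i].T @ gain, -J_off[i].T @ np.linalg.solve(Jhat, hhat))
    # Backward pass: eliminate node i into node i-1
    for i in range(K - 1, 0, -1):
        Jhat = J_diag[i] + bwd[i][0]
        hhat = h[i] + bwd[i][1]
        gain = np.linalg.solve(Jhat, J_off[i - 1].T)
        bwd[i - 1] = (-J_off[i - 1] @ gain, -J_off[i - 1] @ np.linalg.solve(Jhat, hhat))
    means, covs = [], []
    for i in range(K):
        Jpost = J_diag[i] + fwd[i][0] + bwd[i][0]
        hpost = h[i] + fwd[i][1] + bwd[i][1]
        cov = np.linalg.inv(Jpost)
        covs.append(cov)
        means.append(cov @ hpost)
    return means, covs
```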
Hyperparameter learning

The log marginal likelihood and its derivatives can be computed using the same message passing algorithm:
$p(y_{1:K}|\theta) = \prod_{k=1}^K p(y_k|y_{1:k-1}, \theta)$,
$\frac{d}{d\theta}\log p(y|\theta) = \sum_{k=1}^K\left[\left\langle\frac{d}{d\theta}\log p(u_k|u_l)\right\rangle_{p(u_k, u_l|y)} + \left\langle\frac{d}{d\theta}\log p(y_k|u_k)\right\rangle_{p(u_k|y)}\right]$.

Tree construction (see the sketch below):
- start with k-means clustering to find the observation blocks,
- use Kruskal's algorithm to greedily select a tree,
- choose a large random subset of the observations in each block to be pseudo outputs; no optimisation needed.
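A sketch of the tree-construction recipe, using scikit-learn's k-means and SciPy's minimum spanning tree as a stand-in for the Kruskal step on the slide; block sizes and names are illustrative assumptions, not the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree

def build_tree(X, n_blocks=20, pseudo_per_block=30, seed=0):
    """Cluster the inputs into blocks, connect the block centres by a minimum
    spanning tree, and pick a random subset of each block's inputs as pseudo-inputs."""
    rng = np.random.default_rng(seed)
    km = KMeans(n_clusters=n_blocks, n_init=10, random_state=seed).fit(X)
    centres = km.cluster_centers_
    # Minimum spanning tree over pairwise distances between block centres
    mst = minimum_spanning_tree(cdist(centres, centres)).toarray()
    edges = [(i, j) for i, j in zip(*np.nonzero(mst))]   # undirected tree edges
    blocks, pseudo_inputs = [], []
    for k in range(n_blocks):
        idx = np.where(km.labels_ == k)[0]
        blocks.append(idx)
        m = min(pseudo_per_block, len(idx))
        pseudo_inputs.append(X[rng.choice(idx, size=m, replace=False)])
    return blocks, pseudo_inputs, edges
```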
Comparison of KL minimisations

Method | KL minimisation | Result
FITC | $KL(p(f,u)\,\|\,q(u)\prod_n q(f_n|u))$ | $q(u)=p(u)$, $q(f_n|u)=p(f_n|u)$
PIC | $KL(p(f,u)\,\|\,q(u)\prod_k q(f_{C_k}|u))$ | $q(u)=p(u)$, $q(f_{C_k}|u)=p(f_{C_k}|u)$
PP | $KL(\frac{1}{Z}p(u)p(f|u)q(y|u)\,\|\,p(f,u|y))$ | $q(y|u)=\mathcal{N}(y; K_{fu}K_{uu}^{-1}u, \sigma_n^2 I)$
VFE | $KL(p(f|u)q(u)\,\|\,p(f,u|y))$ | $q(u)\propto p(u)\exp(\langle\log p(y|f)\rangle_{p(f|u)})$
EP | $KL(q(f;u)p(y_n|f_n)/q_n(f;u)\,\|\,q(f;u))$ | $q(f;u)\propto p(f)\prod_m p(u_m|f_m)$
Tree | $KL(p(f,u)\,\|\,\prod_k q(f_{B_k}|u_{B_k})q(u_{B_k}|u_{\mathrm{par}(B_k)}))$ | $q(f_{B_k}|u_{B_k})=p(f_{B_k}|u_{B_k})$, $q(u_{B_k}|u_{\mathrm{par}(B_k)})=p(u_{B_k}|u_{\mathrm{par}(B_k)})$
Audio data

Task: filling in missing data.
Data:
- Subband of a speech signal: N = 50000, SE kernel: $k_\theta(t, t') = \sigma^2\exp(-\frac{1}{2l^2}(t-t')^2)$
- Filtered speech signal: N = 50000, spectral mixture kernel: $k_\theta(t, t') = \sum_{k=1}^2\sigma_k^2\cos(\omega_k(t-t'))\exp(-\frac{1}{2l_k^2}(t-t')^2)$
Evaluation: prediction error vs. training/test time.

[Figure: reconstructions of missing segments by the true signal, the chain approximation and local GPs. Left: subband data, right: filtered signal.]
Audio subband data

[Figure: SMSE vs. training time (left) and test time (right) for Chain, Local, FITC, VFE and SSGP on the subband data, across a range of block and pseudo-point settings.]
Audio filter data

[Figure: SMSE vs. training time (left) and test time (right) for Chain, Local and FITC on the filtered signal, across a range of block and pseudo-point settings.]
Terrain data

Task: filling in missing data.
Data: altitude over a 20km x 30km region; 80 missing blocks of 1km x 1km, giving 200k/40k training/test points; 2D SE kernel.
Evaluation: prediction error vs. training/test time.

[Figure: (a) the tree graph over blocks, (b) the complete data, (c) the inference error for the tree, local GPs and FITC.]
[Figure: SMSE vs. training time and test time on the terrain data for VFE, FITC, SSGP, Tree and Local.]
Summary

Tree-structured Gaussian process approximation:
- the pseudo-dataset has a tree/chain structure,
- the model is calibrated using a KL divergence,
- inference and learning via Gaussian belief propagation,
- better time/accuracy trade-off compared to FITC, VFE,
- possible extensions: online learning, loopy BP.
References

Hensman, James, Nicolo Fusi, and Neil D. Lawrence (2013). "Gaussian processes for big data". arXiv preprint.
Lázaro-Gredilla, Miguel et al. (2010). "Sparse spectrum Gaussian process regression". The Journal of Machine Learning Research 11.
Qi, Yuan (Alan), Ahmed H. Abdel-Gawad, and Thomas P. Minka (2010). "Sparse-posterior Gaussian Processes for general likelihoods". In: UAI. Ed. by Peter Grünwald and Peter Spirtes. AUAI Press.
Seeger, Matthias (2003). "Bayesian Gaussian process models: PAC-Bayesian generalisation error bounds and sparse approximations". PhD thesis. School of Informatics, University of Edinburgh.
Snelson, Edward and Zoubin Ghahramani (2006). "Sparse Gaussian Processes using Pseudo-inputs". In: Advances in Neural Information Processing Systems. MIT Press.
Snelson, Edward and Zoubin Ghahramani (2007). "Local and global sparse Gaussian process approximations". In: International Conference on Artificial Intelligence and Statistics.
Titsias, Michalis K. (2009). "Variational Learning of Inducing Variables in Sparse Gaussian Processes". In: International Conference on Artificial Intelligence and Statistics.
Tresp, Volker (2000). "A Bayesian committee machine". Neural Computation 12.11.
Urtasun, Raquel and Trevor Darrell (2008). "Sparse probabilistic regression for activity-independent human pose inference". In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Thanks!
Bayesian Committee Machine (BCM) (Tresp 2000)

BCM combines predictions from M estimators, each using a subset of the training points.

Consider M partitions of the training set. By Bayes' rule,
$p(f_*|y_{B_{1:m}}) = p(f_*|y_{B_{1:m-1}}, y_{B_m}) \propto p(f_*)p(y_{B_m}|f_*)p(y_{B_{1:m-1}}|y_{B_m}, f_*) \approx p(f_*)p(y_{B_m}|f_*)p(y_{B_{1:m-1}}|f_*) \propto \frac{p(f_*|y_{B_m})\,p(f_*|y_{B_{1:m-1}})}{p(f_*)}$   (1)

Applying (1) recursively gives $p(f_*|y_{B_{1:M}}) \propto \frac{\prod_{i=1}^M p(f_*|y_{B_i})}{p(f_*)^{M-1}}$.
BCM for GP regression (Tresp 2000)

Let $p(f_*) = \mathcal{N}(0, K_{**})$ and $p(f_*|y_{B_i}) = \mathcal{N}(\hat\mu_i, \hat\Sigma_i)$. Then $p(f_*|y_{B_{1:M}}) = \mathcal{N}(\hat\mu, \hat\Sigma)$ where
$\hat\Sigma^{-1} = -(M-1)K_{**}^{-1} + \sum_{i=1}^M \hat\Sigma_i^{-1}$,
$\hat\mu = \hat\Sigma\sum_{i=1}^M \hat\Sigma_i^{-1}\hat\mu_i$.

Cost: $O(ND^2)$, D: partition size.

More test points give a better approximation!
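A minimal sketch of the BCM combination step above, assuming the per-partition predictive means and covariances at a common set of test points are already available; the function name is an illustrative choice.

```python
import numpy as np

def bcm_combine(K_prior, mus, Sigmas):
    """Combine M Gaussian predictions N(mu_i, Sigma_i) at shared test points using
    the BCM rule: Sigma^{-1} = sum_i Sigma_i^{-1} - (M-1) K_prior^{-1}."""
    M = len(mus)
    prec = -(M - 1) * np.linalg.inv(K_prior)
    eta = np.zeros_like(mus[0])
    for mu_i, Sigma_i in zip(mus, Sigmas):
        Si_inv = np.linalg.inv(Sigma_i)
        prec += Si_inv
        eta += Si_inv @ mu_i
    Sigma = np.linalg.inv(prec)
    return Sigma @ eta, Sigma
```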