Gaussian Processes for Big Data. James Hensman

2 Overview
Motivation
Sparse Gaussian Processes
Stochastic Variational Inference
Examples

4 Motivation
Inference in a GP has the following demands:
Complexity: $O(n^3)$
Storage: $O(n^2)$
Inference in a sparse GP has the following demands:
Complexity: $O(nm^2)$
Storage: $O(nm)$
where we get to pick $m$!
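To make the storage demands concrete, the following sketch counts the bytes needed for dense double-precision matrices. The value n = 700,000 is the training set size quoted for the airline example; m = 1,000 is an assumed number of inducing points chosen only for illustration.

```python
def dense_bytes(rows, cols, bytes_per_float=8):
    """Bytes needed to store a dense rows x cols matrix of float64 values."""
    return rows * cols * bytes_per_float

n = 700_000   # training set size quoted for the airline example
m = 1_000     # assumed number of inducing points (illustrative only)

full_tb = dense_bytes(n, n) / 1e12     # K_nn, the O(n^2) storage of a full GP
sparse_gb = dense_bytes(n, m) / 1e9    # K_nm, the O(nm) storage of a sparse GP

print(f"K_nn: {full_tb:.2f} TB, K_nm: {sparse_gb:.1f} GB")
```

Under these assumptions the full covariance needs terabytes while the sparse factor needs gigabytes, which is why even the sparse GP benefits from minibatching.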

5 Still not good enough!
Big Data: in parametric models, stochastic optimisation is used. This allows for application to Big Data.
This work: show how to use Stochastic Variational Inference in GPs.
Stochastic optimisation scheme: each step requires $O(m^3)$.

7 Computational savings
$$K_{nn} \approx Q_{nn} = K_{nm} K_{mm}^{-1} K_{mn}$$
Instead of inverting $K_{nn}$, we make a low rank (or Nyström) approximation, and invert $K_{mm}$ instead.
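A minimal NumPy sketch of the low rank approximation follows. The squared-exponential kernel, the evenly spaced inducing inputs and the jitter value are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Unit-variance squared-exponential kernel between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def nystrom(X, Z, jitter=1e-8):
    """Q_nn = K_nm K_mm^{-1} K_mn; only the m x m matrix is solved against."""
    K_mm = rbf(Z, Z) + jitter * np.eye(len(Z))   # jitter for numerical stability
    K_nm = rbf(X, Z)
    return K_nm @ np.linalg.solve(K_mm, K_nm.T)

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))     # n = 200 synthetic inputs
Z = np.linspace(0, 5, 10)[:, None]       # m = 10 assumed inducing inputs

K_nn = rbf(X, X)
Q_nn = nystrom(X, Z)
max_err = np.abs(K_nn - Q_nn).max()      # largest entrywise approximation error
min_eig = np.linalg.eigvalsh(K_nn - Q_nn).min()  # residual should be (numerically) PSD
```

In this toy setting ten inducing inputs already reproduce the 200 x 200 covariance closely, and the residual $K_{nn} - Q_{nn}$ is positive semi-definite up to rounding.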

8 Information capture
Everything we want to do with a GP involves marginalising $f$:
Predictions
Marginal likelihood
Estimating covariance parameters
The posterior of $f$ is the central object. This means inverting $K_{nn}$.

9-12 [Figure: function values plotted against the input space (X), built up in steps: the data $(X, y)$, the GP function $f(x)$, the prior $p(f) = \mathcal{N}(0, K_{nn})$, and the posterior $p(f \mid y, X)$.]

13-15 Introducing u
Take an extra $m$ points on the function, $u = f(Z)$.
$$p(y, f, u) = p(y \mid f)\, p(f \mid u)\, p(u)$$
$$p(y \mid f) = \mathcal{N}(y \mid f, \sigma^2 I)$$
$$p(f \mid u) = \mathcal{N}(f \mid K_{nm} K_{mm}^{-1} u, \widetilde{K})$$
$$p(u) = \mathcal{N}(u \mid 0, K_{mm})$$
where $\widetilde{K} = K_{nn} - K_{nm} K_{mm}^{-1} K_{mn}$.
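The conditional $p(f \mid u)$ can be sketched in a few lines of NumPy. The kernel, the synthetic inputs and the inducing inputs below are illustrative assumptions. The last line checks that marginalising $u$ from $p(f \mid u)\,p(u)$ recovers the original prior covariance $K_{nn}$, so the augmentation leaves the model unchanged.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Unit-variance squared-exponential kernel between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def conditional_f_given_u(X, Z, u, jitter=1e-8):
    """Mean and covariance of p(f | u), plus A = K_mm^{-1} K_mn and K_mm."""
    K_mm = rbf(Z, Z) + jitter * np.eye(len(Z))
    K_mn = rbf(Z, X)
    A = np.linalg.solve(K_mm, K_mn)
    mean = A.T @ u                    # K_nm K_mm^{-1} u
    cov = rbf(X, X) - K_mn.T @ A      # K_nn - K_nm K_mm^{-1} K_mn
    return mean, cov, A, K_mm

rng = np.random.default_rng(1)
X = rng.uniform(0, 5, size=(30, 1))
Z = np.linspace(0, 5, 6)[:, None]     # assumed inducing inputs
u = rng.multivariate_normal(np.zeros(6), rbf(Z, Z) + 1e-8 * np.eye(6))

mean, K_tilde, A, K_mm = conditional_f_given_u(X, Z, u)

# Law of total covariance: Cov[f] = K_tilde + A^T K_mm A, which should equal K_nn.
K_marginal = K_tilde + A.T @ K_mm @ A
```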

16-17 [Figure: the same plot of function values against the input space (X), now with the inducing inputs and values $(Z, u)$ added, showing the prior $p(u) = \mathcal{N}(0, K_{mm})$ and the posterior $p(u \mid y, X)$ alongside $p(f) = \mathcal{N}(0, K_{nn})$ and $p(f \mid y, X)$.]

18-19 The alternative posterior
Instead of doing
$$p(f \mid y, X) = \frac{p(y \mid f)\, p(f \mid X)}{\int p(y \mid f)\, p(f \mid X)\, \mathrm{d}f}$$
we'll do
$$p(u \mid y, Z) = \frac{p(y \mid u)\, p(u \mid Z)}{\int p(y \mid u)\, p(u \mid Z)\, \mathrm{d}u}$$
but $p(y \mid u)$ involves inverting $K_{nn}$.

20-23 Variational marginalisation of f
$$p(y \mid u) = \frac{p(y \mid f)\, p(f \mid u)}{p(f \mid y, u)}$$
$$\ln p(y \mid u) = \ln p(y \mid f) + \ln \frac{p(f \mid u)}{p(f \mid y, u)}$$
Taking expectations of both sides under $p(f \mid u)$:
$$\ln p(y \mid u) = \mathbb{E}_{p(f \mid u)}\big[\ln p(y \mid f)\big] + \mathbb{E}_{p(f \mid u)}\Big[\ln \frac{p(f \mid u)}{p(f \mid y, u)}\Big]$$
$$\ln p(y \mid u) = \ln \tilde{p}(y \mid u) + \mathrm{KL}\big[p(f \mid u) \,\|\, p(f \mid y, u)\big]$$
No inversion of $K_{nn}$ required.
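The first expectation can be evaluated in closed form because both $p(y \mid f)$ and $p(f \mid u)$ are Gaussian. The following worked step is a sketch using $\mu = K_{nm} K_{mm}^{-1} u$ and $\widetilde{K} = K_{nn} - K_{nm} K_{mm}^{-1} K_{mn}$ for the mean and covariance of $p(f \mid u)$:

```latex
\begin{align}
\mathbb{E}_{p(f \mid u)}\big[\ln \mathcal{N}(y \mid f, \sigma^2 I)\big]
  &= -\frac{n}{2}\ln(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}\,\mathbb{E}_{p(f \mid u)}\big[\|y - f\|^2\big] \\
  &= -\frac{n}{2}\ln(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}\Big(\|y - \mu\|^2 + \mathrm{tr}\,\widetilde{K}\Big) \\
  &= \ln \mathcal{N}(y \mid \mu, \sigma^2 I)
     - \frac{1}{2\sigma^2}\,\mathrm{tr}\,\widetilde{K}.
\end{align}
```

Only the diagonal of $K_{nn}$ enters through the trace, and the only matrix that is inverted is $K_{mm}$. Because the KL term is non-negative, this expectation is a lower bound on $\ln p(y \mid u)$.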

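24 An approximate likelihood
$$\tilde{p}(y \mid u) = \prod_{i=1}^{n} \mathcal{N}\big(y_i \mid k_{mn}^\top K_{mm}^{-1} u, \sigma^2\big) \exp\Big\{-\frac{1}{2\sigma^2}\big(k_{nn} - k_{mn}^\top K_{mm}^{-1} k_{mn}\big)\Big\}$$
Here $k_{mn}$ is the column of $K_{mn}$ for datapoint $i$ and $k_{nn}$ is the matching diagonal element of $K_{nn}$.
A straightforward likelihood approximation, and a penalty term.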
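The product form means the log of the approximate likelihood is a sum of independent per-datapoint terms. The sketch below computes those terms with NumPy; the kernel, the synthetic data and the particular value of `u` are assumptions for illustration only.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Unit-variance squared-exponential kernel between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def log_p_tilde_terms(X, y, Z, u, noise_var, jitter=1e-8):
    """One term of ln p~(y | u) per datapoint: Gaussian fit minus penalty."""
    K_mm = rbf(Z, Z) + jitter * np.eye(len(Z))
    K_mn = rbf(Z, X)
    A = np.linalg.solve(K_mm, K_mn)              # column i is K_mm^{-1} k_mn
    mean = A.T @ u                               # k_mn^T K_mm^{-1} u for each i
    # k_nn = 1 for this unit-variance kernel
    penalty = (1.0 - np.sum(K_mn * A, axis=0)) / (2 * noise_var)
    log_gauss = -0.5 * np.log(2 * np.pi * noise_var) - 0.5 * (y - mean) ** 2 / noise_var
    return log_gauss - penalty

rng = np.random.default_rng(5)
X = rng.uniform(0, 5, size=(40, 1))
y = np.sin(2 * X[:, 0]) + 0.3 * rng.standard_normal(40)
Z = np.linspace(0, 5, 6)[:, None]                # assumed inducing inputs
u = np.sin(2 * Z[:, 0])                          # an arbitrary value of u

terms = log_p_tilde_terms(X, y, Z, u, noise_var=0.09)
first_ten = log_p_tilde_terms(X[:10], y[:10], Z, u, noise_var=0.09)
```

Evaluating the function on a subset of the data gives exactly the corresponding subset of terms, which is the property that later makes minibatching possible.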

25 Now we can marginalise u
$$\tilde{p}(u \mid y, Z) = \frac{\tilde{p}(y \mid u)\, p(u \mid Z)}{\int \tilde{p}(y \mid u)\, p(u \mid Z)\, \mathrm{d}u}$$
Computing the (approximate) posterior costs $O(nm^2)$.
We also get a lower bound of the marginal likelihood.
This is the standard variational sparse GP [Titsias, 2009].
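Integrating $u$ out of the approximate likelihood gives a bound of the form $\ln \mathcal{N}(y \mid 0, Q_{nn} + \sigma^2 I) - \frac{1}{2\sigma^2}\mathrm{tr}(K_{nn} - Q_{nn})$. The sketch below evaluates it naively with n x n matrices for clarity, so it does not achieve the $O(nm^2)$ cost; an efficient version would use the matrix inversion lemma. The kernel, data and inducing inputs are illustrative assumptions.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Unit-variance squared-exponential kernel between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def log_gauss_zero_mean(y, C):
    """ln N(y | 0, C) for a dense covariance C."""
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

def exact_log_marginal(X, y, noise_var):
    return log_gauss_zero_mean(y, rbf(X, X) + noise_var * np.eye(len(X)))

def collapsed_bound(X, y, Z, noise_var, jitter=1e-6):
    """Lower bound on the log marginal likelihood after marginalising u."""
    K_mm = rbf(Z, Z) + jitter * np.eye(len(Z))
    K_mn = rbf(Z, X)
    Q_nn = K_mn.T @ np.linalg.solve(K_mm, K_mn)
    trace_term = np.trace(rbf(X, X) - Q_nn) / (2 * noise_var)
    return log_gauss_zero_mean(y, Q_nn + noise_var * np.eye(len(X))) - trace_term

rng = np.random.default_rng(2)
X = rng.uniform(0, 5, size=(40, 1))
y = np.sin(2 * X[:, 0]) + 0.3 * rng.standard_normal(40)
noise_var = 0.09

exact = exact_log_marginal(X, y, noise_var)
# Nested grids of assumed inducing inputs: 2, 3, 5 and 9 points on [0, 5].
bounds = [collapsed_bound(X, y, np.linspace(0, 5, m)[:, None], noise_var)
          for m in (2, 3, 5, 9)]
```

Every value in `bounds` should sit below `exact`, and the gap should shrink as more inducing inputs are used.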

27 Variational Bayes
Approximate the true posterior distribution with a simpler one.
Usually assume factorisation in the approximation.
Iterative update procedure (like EM).
Can be seen as a coordinate-wise steepest ascent method.

28 Stochastic Variational Inference
Combine the ideas of stochastic optimisation with variational inference.
Example: apply latent Dirichlet allocation to Project Gutenberg.
Can apply variational techniques to Big Data.
How could this work in GPs?

29 Maintain the factorisation!
The variational marginalisation of $f$ introduced factorisation across the datapoints (conditioned on $u$).
Marginalising $u$ re-introduced dependencies between the data.
Solution: a variational treatment of $u$.

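30 With $q(u) = \mathcal{N}(u \mid \mathbf{m}, S)$ and noise precision $\beta = \sigma^{-2}$:
$$\log p(y \mid X) \geq \big\langle \mathcal{L}_1 + \log p(u) - \log q(u) \big\rangle_{q(u)} \triangleq \mathcal{L}_3. \quad (1)$$
$$\mathcal{L}_3 = \sum_{i=1}^{n} \Big\{ \log \mathcal{N}\big(y_i \mid k_{mn}^\top K_{mm}^{-1} \mathbf{m}, \beta^{-1}\big) - \frac{1}{2}\beta \tilde{k}_{i,i} - \frac{1}{2}\mathrm{tr}(S \Lambda_i) \Big\} - \mathrm{KL}\big(q(u) \,\|\, p(u)\big) \quad (2)$$
Here $\mathcal{L}_1$ is the bound on $\log p(y \mid u)$ from the variational marginalisation of $f$, $\tilde{k}_{i,i}$ is the $i$-th diagonal element of $K_{nn} - K_{nm} K_{mm}^{-1} K_{mn}$, and $\Lambda_i = \beta K_{mm}^{-1} k_{mn} k_{mn}^\top K_{mm}^{-1}$ for datapoint $i$.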
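A NumPy sketch of the bound $\mathcal{L}_3$ follows, together with a check against the bound obtained by marginalising $u$ analytically. The kernel, data, inducing inputs and function names (`bound_L3`, `collapsed_bound`) are illustrative assumptions. The check uses the fact that $\mathcal{L}_3$ is maximised by $S = \Lambda^{-1}$ and $\mathbf{m} = \beta \Lambda^{-1} K_{mm}^{-1} K_{mn} y$ with $\Lambda = \beta K_{mm}^{-1} K_{mn} K_{nm} K_{mm}^{-1} + K_{mm}^{-1}$, at which point it equals the collapsed bound.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Unit-variance squared-exponential kernel between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def kl_to_prior(m, S, K_mm):
    """KL( N(m, S) || N(0, K_mm) )."""
    _, logdet_K = np.linalg.slogdet(K_mm)
    _, logdet_S = np.linalg.slogdet(S)
    trace = np.trace(np.linalg.solve(K_mm, S))
    maha = m @ np.linalg.solve(K_mm, m)
    return 0.5 * (trace + maha - len(m) + logdet_K - logdet_S)

def bound_L3(X, y, Z, m, S, beta, jitter=1e-8):
    """Sum of per-datapoint terms minus KL(q(u) || p(u))."""
    K_mm = rbf(Z, Z) + jitter * np.eye(len(Z))
    K_mn = rbf(Z, X)
    A = np.linalg.solve(K_mm, K_mn)                    # column i is K_mm^{-1} k_i
    k_tilde = 1.0 - np.sum(K_mn * A, axis=0)           # diag of K_nn - Q_nn, k(x, x) = 1
    fit = 0.5 * np.log(beta / (2 * np.pi)) - 0.5 * beta * (y - A.T @ m) ** 2
    tr_S_Lambda = beta * np.sum(A * (S @ A), axis=0)   # tr(S Lambda_i) = beta a_i^T S a_i
    per_point = fit - 0.5 * beta * k_tilde - 0.5 * tr_S_Lambda
    return per_point.sum() - kl_to_prior(m, S, K_mm)

def collapsed_bound(X, y, Z, beta, jitter=1e-8):
    """Bound with u marginalised analytically (naive n x n evaluation)."""
    K_mm = rbf(Z, Z) + jitter * np.eye(len(Z))
    K_mn = rbf(Z, X)
    Q_nn = K_mn.T @ np.linalg.solve(K_mm, K_mn)
    C = Q_nn + np.eye(len(X)) / beta
    _, logdet = np.linalg.slogdet(C)
    log_gauss = -0.5 * (len(y) * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))
    return log_gauss - 0.5 * beta * np.sum(1.0 - np.diag(Q_nn))

rng = np.random.default_rng(3)
X = rng.uniform(0, 5, size=(50, 1))
y = np.sin(2 * X[:, 0]) + 0.3 * rng.standard_normal(50)
Z = np.linspace(0, 5, 6)[:, None]                      # assumed inducing inputs
beta = 1 / 0.09

K_mm = rbf(Z, Z) + 1e-8 * np.eye(6)
A = np.linalg.solve(K_mm, rbf(Z, X))
Lam = beta * A @ A.T + np.linalg.inv(K_mm)
S_opt = np.linalg.inv(Lam)
m_opt = beta * S_opt @ (A @ y)

l3_prior = bound_L3(X, y, Z, np.zeros(6), K_mm, beta)  # q(u) set to the prior
l3_opt = bound_L3(X, y, Z, m_opt, S_opt, beta)         # q(u) set to its optimum
bound = collapsed_bound(X, y, Z, beta)
```

With $q(u)$ at its optimum the two bounds agree; any other choice, such as the prior, gives a lower value.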

31 Optimisation
The variational objective $\mathcal{L}_3$ is a function of:
the parameters of the covariance function
the parameters of $q(u)$
the inducing inputs, $Z$
Strategy: set $Z$. Take the data in small minibatches, take stochastic gradient steps in the covariance function parameters, stochastic natural gradient steps in the parameters of $q(u)$.
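Because the data-dependent part of $\mathcal{L}_3$ is a sum over datapoints, a minibatch gives an unbiased estimate of it once rescaled by $n / |\text{batch}|$. The sketch below illustrates only that rescaling, not the gradient steps themselves; the KL term does not depend on the data and would be added once. The kernel, data, batch size and the choice of $q(u)$ are assumptions for illustration.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Unit-variance squared-exponential kernel between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def data_terms(X, y, Z, m, S, beta, jitter=1e-8):
    """Per-datapoint terms of L3 (everything except the KL term)."""
    K_mm = rbf(Z, Z) + jitter * np.eye(len(Z))
    K_mn = rbf(Z, X)
    A = np.linalg.solve(K_mm, K_mn)
    k_tilde = 1.0 - np.sum(K_mn * A, axis=0)
    fit = 0.5 * np.log(beta / (2 * np.pi)) - 0.5 * beta * (y - A.T @ m) ** 2
    tr_S_Lambda = beta * np.sum(A * (S @ A), axis=0)
    return fit - 0.5 * beta * k_tilde - 0.5 * tr_S_Lambda

def minibatch_estimate(X, y, idx, Z, m, S, beta):
    """Estimate of the full-data sum from the datapoints in idx."""
    scale = len(X) / len(idx)
    return scale * data_terms(X[idx], y[idx], Z, m, S, beta).sum()

rng = np.random.default_rng(4)
n, batch_size = 60, 10
X = rng.uniform(0, 5, size=(n, 1))
y = np.sin(2 * X[:, 0]) + 0.3 * rng.standard_normal(n)
Z = np.linspace(0, 5, 6)[:, None]                  # assumed, fixed inducing inputs
m, S = np.zeros(6), rbf(Z, Z) + 1e-8 * np.eye(6)   # q(u) initialised at the prior
beta = 1 / 0.09

full = data_terms(X, y, Z, m, S, beta).sum()
batches = rng.permutation(n).reshape(-1, batch_size)   # one pass over the data
estimates = [minibatch_estimate(X, y, idx, Z, m, S, beta) for idx in batches]
```

Averaged over one pass through equally sized batches, the estimates reproduce the full-data sum exactly, while each individual estimate touches only `batch_size` datapoints.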

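32 Natural Gradients
$$\tilde{g}(\theta) = G(\theta)^{-1} \frac{\partial \mathcal{L}_3}{\partial \theta} = \frac{\partial \mathcal{L}_3}{\partial \eta}.$$
$$\theta_{2(t+1)} = -\frac{1}{2} S_{(t+1)}^{-1} = -\frac{1}{2} S_{(t)}^{-1} + \ell\Big(-\frac{1}{2}\Lambda + \frac{1}{2} S_{(t)}^{-1}\Big),$$
$$\theta_{1(t+1)} = S_{(t+1)}^{-1} \mathbf{m}_{(t+1)} = S_{(t)}^{-1} \mathbf{m}_{(t)} + \ell\big(\beta K_{mm}^{-1} K_{mn} y - S_{(t)}^{-1} \mathbf{m}_{(t)}\big),$$
where $\theta$ are the natural parameters and $\eta$ the expectation parameters of $q(u)$, $G(\theta)$ is the Fisher information, $\ell$ is the step length, and $\Lambda = \beta K_{mm}^{-1} K_{mn} K_{nm} K_{mm}^{-1} + K_{mm}^{-1}$.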
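Multiplying the first update by $-2$ shows that a natural gradient step is a convex blend in the natural parameters: $S^{-1}$ moves towards $\Lambda$ and $S^{-1}\mathbf{m}$ moves towards $\beta K_{mm}^{-1} K_{mn} y$. The sketch below uses the full data for simplicity (a stochastic version would compute both targets from a rescaled minibatch); the kernel, data and function name are illustrative assumptions.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Unit-variance squared-exponential kernel between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def natural_gradient_step(S_inv, S_inv_m, Lam, target, step):
    """One update of (S^{-1}, S^{-1} m) with step length `step`."""
    S_inv_new = S_inv + step * (Lam - S_inv)
    S_inv_m_new = S_inv_m + step * (target - S_inv_m)
    return S_inv_new, S_inv_m_new

rng = np.random.default_rng(6)
X = rng.uniform(0, 5, size=(50, 1))
y = np.sin(2 * X[:, 0]) + 0.3 * rng.standard_normal(50)
Z = np.linspace(0, 5, 6)[:, None]          # assumed inducing inputs
beta = 1 / 0.09

K_mm = rbf(Z, Z) + 1e-8 * np.eye(len(Z))
A = np.linalg.solve(K_mm, rbf(Z, X))       # K_mm^{-1} K_mn
K_mm_inv = np.linalg.inv(K_mm)
Lam = beta * A @ A.T + K_mm_inv
target = beta * A @ y                      # beta K_mm^{-1} K_mn y

# Start from q(u) = p(u): S = K_mm, m = 0.
S_inv0, S_inv_m0 = K_mm_inv, np.zeros(len(Z))

S_inv_half, S_inv_m_half = natural_gradient_step(S_inv0, S_inv_m0, Lam, target, 0.5)
S_inv1, S_inv_m1 = natural_gradient_step(S_inv0, S_inv_m0, Lam, target, 1.0)

# Convert the natural parameters back to the mean and covariance of q(u).
S1 = np.linalg.inv(S_inv1)
m1 = S1 @ S_inv_m1
```

With the full data and unit step length, a single step lands on $S^{-1} = \Lambda$ and $S^{-1}\mathbf{m} = \beta K_{mm}^{-1} K_{mn} y$ regardless of the starting point; smaller steps move part of the way there.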

35 UK apartment prices
Monthly price paid data for February to October 2012 (England and Wales) from land-registry-monthly-price-paid-data/
75,000 entries
Cross referenced against a postcode database to get latitude and longitude
Regressed the normalised logarithm of the apartment prices

38 Airline data
Flight delays for every commercial flight in the USA from January to April
Average delay was 30 minutes.
We randomly selected 800,000 datapoints (we have limited memory!)
700,000 train, 100,000 test
[Figure: inverse lengthscale for each input: Month, DayOfMonth, DayOfWeek, DepTime, ArrTime, AirTime, Distance, PlaneAge]

39 [Figure: RMSE against iteration for the SVI GP, compared with GPs on subsets of the data, with subset sizes including N=800 and N=1000.]

41 Download the code! github.com/sheffieldml/gpy
Cite our paper! Hensman, Fusi and Lawrence, "Gaussian Processes for Big Data", Proceedings of UAI 2013.

42 Michalis K. Titsias. Variational learning of inducing variables in sparse Gaussian processes. In David van Dyk and Max Welling, editors, Proceedings of the Twelfth International Workshop on Artificial Intelligence and Statistics, volume 5, Clearwater Beach, FL, April 2009. JMLR W&CP 5.
