
Data Analysis and Probabilistic Inference
Gaussian Processes
Recommended reading: Rasmussen & Williams, Chapters 1, 2, 4, 5; Deisenroth & Ng (2015) [3]
Marc Deisenroth
Department of Computing, Imperial College London
February 22, 2017


Problem Setting
Objective: For a set of observations $y_i = f(x_i) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2)$, find a distribution over functions $p(f)$ that explains the data. This is a probabilistic regression problem.
(Figures: observed data points in the $x$ vs. $f(x)$ plane.)

Recap from CO-496: Bayesian Linear Regression
Linear regression model: $f(x) = \phi(x)^\top w$, $y = f(x) + \epsilon$, with prior $w \sim \mathcal{N}(0, \Sigma_p)$ and noise $\epsilon \sim \mathcal{N}(0, \sigma_n^2)$.
Integrating out the parameters when predicting leads to a distribution over functions:
$p(f(x_*) \mid x_*, X, y) = \int p(f(x_*) \mid x_*, w)\, p(w \mid X, y)\, dw = \mathcal{N}\big(\mu(x_*), \sigma^2(x_*)\big)$
$\mu(x_*) = \phi_*^\top \Sigma_p \Phi (K + \sigma_n^2 I)^{-1} y$
$\sigma^2(x_*) = \phi_*^\top \Sigma_p \phi_* - \phi_*^\top \Sigma_p \Phi (K + \sigma_n^2 I)^{-1} \Phi^\top \Sigma_p \phi_*, \qquad K = \Phi^\top \Sigma_p \Phi$
where $\phi_* = \phi(x_*)$ and $\Phi$ is the feature matrix of the training inputs.
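To make the predictive equations above concrete, here is a minimal NumPy sketch (not from the slides; the function name and argument layout are illustrative) that evaluates $\mu(x_*)$ and $\sigma^2(x_*)$:

    import numpy as np

    def blr_predict(phi_star, Phi, y, Sigma_p, sigma_n2):
        """Predictive mean and variance of Bayesian linear regression.

        phi_star : (M,)   feature vector phi(x_*) of a test input
        Phi      : (M, N) feature matrix of the N training inputs
        y        : (N,)   training targets
        Sigma_p  : (M, M) prior covariance of the weights
        sigma_n2 : scalar observation-noise variance
        """
        K = Phi.T @ Sigma_p @ Phi                  # Gram matrix Phi^T Sigma_p Phi (N x N)
        G = K + sigma_n2 * np.eye(Phi.shape[1])    # K + sigma_n^2 I
        a = phi_star @ Sigma_p @ Phi               # phi_*^T Sigma_p Phi (N,)
        mu = a @ np.linalg.solve(G, y)             # predictive mean
        var = phi_star @ Sigma_p @ phi_star - a @ np.linalg.solve(G, a)  # predictive variance
        return mu, var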

Sampling from the Prior over Functions
Consider a linear regression setting $y = a + bx + \epsilon$, with $p(a, b) = \mathcal{N}(0, I)$ and $\epsilon \sim \mathcal{N}(0, \sigma_n^2)$.
(Figures: samples $(a, b)$ drawn from the prior and the corresponding functions in the $x$ vs. $y$ plane.)

Sampling from the Posterior over Functions
Consider a linear regression setting $y = a + bx + \epsilon$, with $p(a, b) = \mathcal{N}(0, I)$ and $\epsilon \sim \mathcal{N}(0, \sigma_n^2)$.
(Figures: samples $(a, b)$ drawn from the posterior and the corresponding functions in the $x$ vs. $y$ plane.)

Fitting Nonlinear Functions
Fit nonlinear functions using (Bayesian) linear regression: a linear combination of nonlinear features.
Example: radial-basis-function (RBF) network
$f(x) = \sum_{i=1}^{n} w_i \phi_i(x), \qquad w_i \sim \mathcal{N}(0, \sigma_p^2)$
where $\phi_i(x) = \exp\big(-\tfrac{1}{2}(x - \mu_i)^\top (x - \mu_i)\big)$ for given centers $\mu_i$.

Illustration: Fitting a Radial-Basis-Function Network
$\phi_i(x) = \exp\big(-\tfrac{1}{2}(x - \mu_i)^\top (x - \mu_i)\big)$
Place Gaussian-shaped basis functions $\phi_i$ at 25 input locations $\mu_i$, linearly spaced in the interval $[-5, 3]$.

Samples from the RBF Prior
$f(x) = \sum_{i=1}^{n} w_i \phi_i(x), \qquad p(w) = \mathcal{N}(0, I)$
(Figure: function samples drawn from the RBF prior.)

Samples from the RBF Posterior
$f(x) = \sum_{i=1}^{n} w_i \phi_i(x), \qquad p(w \mid X, y) = \mathcal{N}(m_N, S_N)$
(Figure: function samples drawn from the RBF posterior.)
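A small NumPy sketch of these prior and posterior samples (the training data, noise level, and 25 centers are made up for the example; $m_N, S_N$ follow the standard Bayesian linear regression formulas with a unit-variance weight prior, matching $p(w) = \mathcal{N}(0, I)$ above):

    import numpy as np

    def rbf_features(x, centers):
        """Phi[i, j] = exp(-0.5 * (x_j - mu_i)^2) for scalar inputs."""
        return np.exp(-0.5 * (x[None, :] - centers[:, None]) ** 2)

    centers = np.linspace(-5, 3, 25)           # basis-function centers mu_i
    x_plot = np.linspace(-6, 4, 200)
    Phi_plot = rbf_features(x_plot, centers)   # (25, 200)

    # Samples from the prior p(w) = N(0, I)
    w_prior = np.random.randn(5, len(centers))
    f_prior = w_prior @ Phi_plot               # five prior function samples

    # Posterior p(w | X, y) = N(m_N, S_N) for toy data (X, y) and noise sigma_n^2
    X = np.array([-4.0, -2.5, 0.0, 1.5]); y = np.sin(X); sigma_n2 = 0.01
    Phi = rbf_features(X, centers)             # (25, 4)
    S_N = np.linalg.inv(np.eye(len(centers)) + Phi @ Phi.T / sigma_n2)
    m_N = S_N @ Phi @ y / sigma_n2
    w_post = np.random.multivariate_normal(m_N, S_N, size=5)
    f_post = w_post @ Phi_plot                 # five posterior function samples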

RBF Posterior
(Figure: the RBF posterior.)

Limitations
Feature engineering; finite number of features.
Above: without basis functions on the right-hand side of the input range, we cannot express any variability of the function there.
Ideally: add more (infinitely many) basis functions.

Approach
Instead of sampling parameters, which induce a distribution over functions, sample functions directly.
Make assumptions about the distribution of functions.
Intuition: a function is an infinitely long vector of function values. Make assumptions about the distribution of function values.

Gaussian Process
We will place a distribution $p(f)$ on functions $f$. Informally, a function can be considered an infinitely long vector of function values $f = [f_1, f_2, f_3, \ldots]$.
A Gaussian process is a generalization of a multivariate Gaussian distribution to infinitely many variables.
Definition: A Gaussian process (GP) is a collection of random variables $f_1, f_2, \ldots$, any finite number of which is Gaussian distributed.
A Gaussian distribution is specified by a mean vector $\mu$ and a covariance matrix $\Sigma$. A Gaussian process is specified by a mean function $m(\cdot)$ and a covariance function (kernel) $k(\cdot, \cdot)$.

Covariance Function
The covariance function (kernel) is symmetric and positive semi-definite. It allows us to compute covariances between (unknown) function values by just looking at the corresponding inputs:
$\mathrm{Cov}[f(x_i), f(x_j)] = k(x_i, x_j)$

GP Regression as a Bayesian Inference Problem
Objective: For a set of observations $y_i = f(x_i) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma_n^2)$, find a (posterior) distribution over functions $p(f \mid X, y)$ that explains the data.
Training data: $X, y$. Bayes' theorem yields
$p(f \mid X, y) = \dfrac{p(y \mid f, X)\, p(f)}{p(y \mid X)}$
Prior: $p(f) = GP(m, k)$. Specify the mean function $m$ and kernel $k$.
Likelihood (noise model): $p(y \mid f, X) = \mathcal{N}\big(f(X), \sigma_n^2 I\big)$
Marginal likelihood (evidence): $p(y \mid X) = \int p(y \mid f(X))\, p(f \mid X)\, df$
Posterior: $p(f \mid y, X) = GP(m_{\text{post}}, k_{\text{post}})$

Prior over Functions
Treat a function as a long vector of function values: $f = [f_1, f_2, \ldots]$. Look at a distribution over function values $f_i = f(x_i)$.
Consider a finite number of $N$ function values $f$ and all other (infinitely many) function values $\tilde f$. Informally,
$p(f, \tilde f) = \mathcal{N}\!\left(\begin{bmatrix}\mu_f \\ \mu_{\tilde f}\end{bmatrix},\ \begin{bmatrix}\Sigma_{ff} & \Sigma_{f\tilde f}\\ \Sigma_{\tilde f f} & \Sigma_{\tilde f\tilde f}\end{bmatrix}\right)$
where $\Sigma_{\tilde f\tilde f} \in \mathbb{R}^{m\times m}$ and $\Sigma_{f\tilde f} \in \mathbb{R}^{N\times m}$, $m \to \infty$, and $\Sigma_{ff}^{(i,j)} = \mathrm{Cov}[f(x_i), f(x_j)] = k(x_i, x_j)$.
Key property: the marginal remains finite,
$p(f) = \int p(f, \tilde f)\, d\tilde f = \mathcal{N}\big(\mu_f, \Sigma_{ff}\big)$
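The key property above, that the finite marginal is an ordinary multivariate Gaussian whose covariance is given by the kernel, can be checked numerically. In the following sketch (grid, kernel, and indices are purely illustrative) marginalizing a Gaussian simply means picking out the relevant sub-block of the covariance matrix:

    import numpy as np

    def k_gauss(A, B):
        """Gaussian kernel k(x, x') = exp(-(x - x')^2) for 1-D inputs."""
        return np.exp(-(A[:, None] - B[None, :]) ** 2)

    # Joint Gaussian over many function values (stand-in for the "other" values)
    x_all = np.linspace(-5, 5, 500)
    K_all = k_gauss(x_all, x_all)

    # Marginalizing a Gaussian = keeping the rows/columns of the values we care about
    idx = [50, 200, 350]                   # the N function values of interest
    K_marg = K_all[np.ix_(idx, idx)]       # p(f) = N(0, K_marg): finite and Gaussian

    # Identical to evaluating the kernel only at those inputs
    assert np.allclose(K_marg, k_gauss(x_all[idx], x_all[idx]))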

Training and Test Marginal
In practice, we always have finite training and test inputs $x_{\text{train}}, x_{\text{test}}$. Define $f_* := f_{\text{test}}$ and $f := f_{\text{train}}$.
Then we obtain the finite marginal
$p(f, f_*) = \int p(f, f_*, f_{\text{other}})\, df_{\text{other}} = \mathcal{N}\!\left(\begin{bmatrix}\mu_f \\ \mu_*\end{bmatrix},\ \begin{bmatrix}\Sigma_{ff} & \Sigma_{f*}\\ \Sigma_{*f} & \Sigma_{**}\end{bmatrix}\right)$

GP Regression as a Bayesian Inference Problem (ctd.)
Posterior over functions (with training data $X, y$):
$p(f \mid X, y) = \dfrac{p(y \mid f, X)\, p(f \mid X)}{p(y \mid X)}$
Using the properties of Gaussians, we obtain
$p(y \mid f, X)\, p(f \mid X) = \mathcal{N}\big(y \mid f(X), \sigma_n^2 I\big)\, \mathcal{N}\big(f(X) \mid m(X), K\big)$
$= Z\, \mathcal{N}\Big(f(X) \,\Big|\, \underbrace{m(X) + K(K + \sigma_n^2 I)^{-1}\big(y - m(X)\big)}_{\text{posterior mean}},\ \underbrace{K - K(K + \sigma_n^2 I)^{-1}K}_{\text{posterior covariance}}\Big)$
with $K = k(X, X)$.
Marginal likelihood:
$Z = p(y \mid X) = \int p(y \mid f, X)\, p(f \mid X)\, df = \mathcal{N}\big(y \mid m(X), K + \sigma_n^2 I\big)$

GP Predictions (1)
$y = f(x) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma_n^2)$
Objective: find $p(f(X_*) \mid X, y)$ for training data $X, y$ and test inputs $X_*$.
GP prior: $p(f \mid X) = \mathcal{N}\big(m(X), K\big)$. Gaussian likelihood: $p(y \mid f(X)) = \mathcal{N}\big(f(X), \sigma_n^2 I\big)$.
With $f \sim GP$, it follows that $f$ and $f_*$ are jointly Gaussian distributed:
$p(f, f_* \mid X, X_*) = \mathcal{N}\!\left(\begin{bmatrix}m(X)\\ m(X_*)\end{bmatrix},\ \begin{bmatrix}K & k(X, X_*)\\ k(X_*, X) & k(X_*, X_*)\end{bmatrix}\right)$
Due to the Gaussian likelihood, we also get ($f$ is unobserved)
$p(y, f_* \mid X, X_*) = \mathcal{N}\!\left(\begin{bmatrix}m(X)\\ m(X_*)\end{bmatrix},\ \begin{bmatrix}K + \sigma_n^2 I & k(X, X_*)\\ k(X_*, X) & k(X_*, X_*)\end{bmatrix}\right)$

GP Predictions (2)
Prior: $p(y, f_* \mid X, X_*) = \mathcal{N}\!\left(\begin{bmatrix}m(X)\\ m(X_*)\end{bmatrix},\ \begin{bmatrix}K + \sigma_n^2 I & k(X, X_*)\\ k(X_*, X) & k(X_*, X_*)\end{bmatrix}\right)$
The posterior predictive distribution $p(f_* \mid X, y, X_*)$ at test inputs $X_*$ is obtained by Gaussian conditioning:
$p(f_* \mid X, y, X_*) = \mathcal{N}\big(\mathbb{E}[f_* \mid X, y, X_*],\ \mathbb{V}[f_* \mid X, y, X_*]\big)$
$\mathbb{E}[f_* \mid X, y, X_*] = m_{\text{post}}(X_*) = \underbrace{m(X_*)}_{\text{prior mean}} + k(X_*, X)(K + \sigma_n^2 I)^{-1}\big(y - m(X)\big)$
$\mathbb{V}[f_* \mid X, y, X_*] = k_{\text{post}}(X_*, X_*) = \underbrace{k(X_*, X_*)}_{\text{prior variance}} - k(X_*, X)(K + \sigma_n^2 I)^{-1} k(X, X_*)$
From now on: set the prior mean function $m \equiv 0$.
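A minimal NumPy implementation of these predictive equations (a sketch with zero prior mean, a Gaussian kernel, and made-up toy data; function and parameter names are illustrative):

    import numpy as np

    def k_gauss(A, B, sigma_f=1.0, ell=1.0):
        """Gaussian kernel k(x, x') = sigma_f^2 exp(-(x - x')^2 / ell^2) for 1-D inputs."""
        return sigma_f ** 2 * np.exp(-(A[:, None] - B[None, :]) ** 2 / ell ** 2)

    def gp_predict(X_star, X, y, sigma_n2, sigma_f=1.0, ell=1.0):
        """Posterior predictive mean and covariance at test inputs X_star (zero prior mean)."""
        K = k_gauss(X, X, sigma_f, ell)
        K_s = k_gauss(X_star, X, sigma_f, ell)          # k(X_*, X)
        K_ss = k_gauss(X_star, X_star, sigma_f, ell)    # k(X_*, X_*)
        G = K + sigma_n2 * np.eye(len(X))
        alpha = np.linalg.solve(G, y)
        mean = K_s @ alpha                              # m_post(X_*)
        cov = K_ss - K_s @ np.linalg.solve(G, K_s.T)    # k_post(X_*, X_*)
        return mean, cov

    # Example usage with toy data
    X = np.array([-4.0, -2.0, 0.0, 1.0, 3.0])
    y = np.sin(X)
    X_star = np.linspace(-5, 5, 100)
    mean, cov = gp_predict(X_star, X, y, sigma_n2=0.01)
    std = np.sqrt(np.diag(cov))                         # marginal predictive standard deviation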

Illustration: Inference with Gaussian Processes
Prior belief about the function. Predictive (marginal) mean and variance:
$\mathbb{E}[f(x_*) \mid x_*] = m(x_*) = 0$
$\mathbb{V}[f(x_*) \mid x_*] = \sigma^2(x_*) = k(x_*, x_*)$
(Figures: samples and uncertainty of the GP prior.)

Illustration: Inference with Gaussian Processes
Posterior belief about the function. Predictive (marginal) mean and variance:
$\mathbb{E}[f(x_*) \mid x_*, X, y] = m(x_*) = k(X, x_*)^\top (K + \sigma_\epsilon^2 I)^{-1} y$
$\mathbb{V}[f(x_*) \mid x_*, X, y] = \sigma^2(x_*) = k(x_*, x_*) - k(X, x_*)^\top (K + \sigma_\epsilon^2 I)^{-1} k(X, x_*)$
(Figure sequence: the GP posterior as more and more observations are added.)

Covariance Function
A Gaussian process is fully specified by a mean function $m$ and a kernel/covariance function $k$.
The covariance function (kernel) is symmetric and positive semi-definite.
The covariance function encodes high-level structural assumptions about the latent function $f$ (e.g., smoothness, differentiability, periodicity).

Gaussian Covariance Function
$k_{\text{Gauss}}(x_i, x_j) = \sigma_f^2 \exp\big(-(x_i - x_j)^\top (x_i - x_j)/l^2\big)$
$\sigma_f$: amplitude of the latent function.
$l$: length-scale (smoothness parameter); how far do we have to move in input space before the function value changes significantly?
Assumption on the latent function: smooth (infinitely differentiable).

Length-Scales
Length-scales determine how wiggly the function is and how much information we can transfer to other function values.
(Figures: GP fits to the same data with different length-scales.)
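One way to see the effect of the length-scale is to draw prior samples from the Gaussian kernel for a few values of $l$. A small illustrative NumPy sketch (not from the slides; the grid and length-scale values are made up):

    import numpy as np

    def k_gauss(A, B, sigma_f=1.0, ell=1.0):
        """Gaussian kernel with amplitude sigma_f and length-scale ell (1-D inputs)."""
        return sigma_f ** 2 * np.exp(-(A[:, None] - B[None, :]) ** 2 / ell ** 2)

    x = np.linspace(-5, 5, 200)
    samples = {}
    for ell in (0.3, 1.0, 3.0):                 # short to long length-scales
        K = k_gauss(x, x, ell=ell) + 1e-10 * np.eye(len(x))   # jitter for stability
        samples[ell] = np.random.multivariate_normal(np.zeros(len(x)), K, size=2)
    # short length-scales give wiggly samples; long length-scales give slowly varying samples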

Matérn Covariance Function
$k_{\text{Mat},3/2}(x_i, x_j) = \sigma_f^2 \left(1 + \dfrac{\sqrt{3}\,\|x_i - x_j\|}{l}\right) \exp\!\left(-\dfrac{\sqrt{3}\,\|x_i - x_j\|}{l}\right)$
$\sigma_f$: amplitude of the latent function.
$l$: length-scale; how far do we have to move in input space before the function value changes significantly?
Assumption on the latent function: once differentiable.

Periodic Covariance Function
$k_{\text{per}}(x_i, x_j) = \sigma_f^2 \exp\!\left(-\dfrac{2\sin^2\!\big(\tfrac{\kappa(x_i - x_j)}{2}\big)}{l^2}\right) = k_{\text{Gauss}}\big(u(x_i), u(x_j)\big), \qquad u(x) = \begin{bmatrix}\cos(\kappa x)\\ \sin(\kappa x)\end{bmatrix}$
$\kappa$: periodicity parameter.
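Sketches of the Matérn-3/2 and periodic covariance functions above as NumPy functions for 1-D inputs (parameter names are illustrative; the periodic kernel is written in its sin-squared form, which corresponds to the warped-input construction $u(x) = [\cos\kappa x, \sin\kappa x]^\top$ up to the length-scale convention):

    import numpy as np

    def k_matern32(A, B, sigma_f=1.0, ell=1.0):
        """Matern-3/2 kernel: sigma_f^2 (1 + sqrt(3) r / ell) exp(-sqrt(3) r / ell), r = |x - x'|."""
        r = np.abs(A[:, None] - B[None, :])
        s = np.sqrt(3.0) * r / ell
        return sigma_f ** 2 * (1.0 + s) * np.exp(-s)

    def k_periodic(A, B, sigma_f=1.0, ell=1.0, kappa=1.0):
        """Periodic kernel in sin^2 form, with periodicity parameter kappa."""
        d = A[:, None] - B[None, :]
        return sigma_f ** 2 * np.exp(-2.0 * np.sin(kappa * d / 2.0) ** 2 / ell ** 2)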

Meta-Parameters of a GP
The GP possesses a set of hyper-parameters:
Parameters of the mean function
Hyper-parameters of the covariance function (e.g., length-scales and signal variance)
Likelihood parameters (e.g., noise variance $\sigma_n^2$)
Train a GP to find a good set of hyper-parameters.
Model selection to find good mean and covariance functions (can also be automated: Automatic Statistician, Lloyd et al., 2014 [9]).

Gaussian Process Training: Hyper-Parameters
Find good GP hyper-parameters $\theta$ (kernel and mean-function parameters).
Place a prior $p(\theta)$ on the hyper-parameters.
Posterior over hyper-parameters:
$p(\theta \mid X, y) = \dfrac{p(\theta)\, p(y \mid X, \theta)}{p(y \mid X)}, \qquad p(y \mid X, \theta) = \int p(y \mid f(X))\, p(f \mid X, \theta)\, df$
Choose hyper-parameters $\theta^*$ such that
$\theta^* \in \arg\max_\theta\ \log p(\theta) + \log p(y \mid X, \theta)$
Maximize the marginal likelihood if $p(\theta) = \mathcal{U}$ (uniform prior).

Training via Marginal Likelihood Maximization
Maximize the evidence/marginal likelihood (the probability of the data given the hyper-parameters, where the unwieldy $f$ has been integrated out). Also called maximum likelihood type II.
Marginal likelihood:
$p(y \mid X, \theta) = \int p(y \mid f(X))\, p(f \mid X, \theta)\, df = \int \mathcal{N}\big(y \mid f(X), \sigma_n^2 I\big)\, \mathcal{N}\big(f(X) \mid 0, K\big)\, df = \mathcal{N}\big(y \mid 0, K + \sigma_n^2 I\big)$
Learning the GP hyper-parameters:
$\theta^* \in \arg\max_\theta\ \log p(y \mid X, \theta)$
$\log p(y \mid X, \theta) = -\tfrac{1}{2} y^\top K_\theta^{-1} y - \tfrac{1}{2}\log|K_\theta| + \text{const}, \qquad K_\theta := K + \sigma_n^2 I$
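A direct NumPy sketch of this log-marginal-likelihood computation (the constant is written out as $-\tfrac{N}{2}\log 2\pi$; a Cholesky factorization handles the inverse and the log-determinant):

    import numpy as np

    def log_marginal_likelihood(y, K, sigma_n2):
        """log p(y | X, theta) = -1/2 y^T K_theta^{-1} y - 1/2 log|K_theta| - N/2 log(2 pi),
        with K_theta = K + sigma_n^2 I."""
        N = len(y)
        K_theta = K + sigma_n2 * np.eye(N)
        L = np.linalg.cholesky(K_theta)                      # K_theta = L L^T
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K_theta^{-1} y
        log_det = 2.0 * np.sum(np.log(np.diag(L)))
        return -0.5 * y @ alpha - 0.5 * log_det - 0.5 * N * np.log(2 * np.pi)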

Training via Marginal Likelihood Maximization
Log-marginal likelihood:
$\log p(y \mid X, \theta) = -\tfrac{1}{2} y^\top K_\theta^{-1} y - \tfrac{1}{2}\log|K_\theta| + \text{const}, \qquad K_\theta := K + \sigma_n^2 I$
Automatic trade-off between data fit and model complexity.
Gradient-based optimization of the hyper-parameters $\theta$:
$\dfrac{\partial \log p(y \mid X, \theta)}{\partial \theta_i} = \tfrac{1}{2}\, y^\top K_\theta^{-1} \dfrac{\partial K_\theta}{\partial \theta_i} K_\theta^{-1} y - \tfrac{1}{2}\,\mathrm{tr}\!\left(K_\theta^{-1}\dfrac{\partial K_\theta}{\partial \theta_i}\right) = \tfrac{1}{2}\,\mathrm{tr}\!\left(\big(\alpha\alpha^\top - K_\theta^{-1}\big)\dfrac{\partial K_\theta}{\partial \theta_i}\right), \qquad \alpha := K_\theta^{-1} y$
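As an illustration of the gradient formula, the following sketch computes the gradients of the log-marginal likelihood with respect to the log-hyper-parameters of a 1-D Gaussian kernel (the choice of log-parameterization and the kernel are assumptions for the example, not part of the slides):

    import numpy as np

    def lml_gradients(y, X, sigma_f, ell, sigma_n):
        """Gradients of log p(y | X, theta) w.r.t. (log sigma_f, log ell, log sigma_n)
        using d log p / d theta_i = 1/2 tr((alpha alpha^T - K_theta^{-1}) dK_theta/d theta_i)."""
        D2 = (X[:, None] - X[None, :]) ** 2
        K = sigma_f ** 2 * np.exp(-D2 / ell ** 2)
        K_theta = K + sigma_n ** 2 * np.eye(len(X))
        K_inv = np.linalg.inv(K_theta)
        alpha = K_inv @ y
        M = np.outer(alpha, alpha) - K_inv
        # Derivatives of K_theta w.r.t. the log-hyper-parameters
        dK_dlog_sigma_f = 2.0 * K
        dK_dlog_ell = K * (2.0 * D2 / ell ** 2)
        dK_dlog_sigma_n = 2.0 * sigma_n ** 2 * np.eye(len(X))
        grads = [0.5 * np.trace(M @ dK)
                 for dK in (dK_dlog_sigma_f, dK_dlog_ell, dK_dlog_sigma_n)]
        return np.array(grads)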

Example: Training Data
(Figure: the training data set in the $x$ vs. $y$ plane.)

Example: Marginal Likelihood Contour
(Figure: log-marginal-likelihood contour over log-noise and log-length-scales.)

Example: Exploring the Modes
(Figures: GP fits corresponding to different modes of the log-marginal likelihood.)

Marginal Likelihood (1)-(9)
(Figure sequence: log-marginal-likelihood contours over log-noise and log-length-scales.)

Marginal Likelihood and Parameter Learning
The marginal likelihood is non-convex. In particular in the very-small-data regime, a GP can end up in three different modes when optimizing the hyper-parameters:
Overfitting (unlikely, but possible)
Underfitting (everything is considered noise)
Good fit
Re-start the hyper-parameter optimization from random initializations to mitigate the problem.
With increasing data set size, the GP typically ends up in the good-fit mode. Overfitting (indicator: small length-scales and small noise variance) is very unlikely.
Ideally, we would integrate the hyper-parameters out. Why can we not do this easily?

Model Selection: Mean Function and Kernel
Assume we have a finite set of models $M_i$, each specifying a mean function $m_i$ and a kernel $k_i$. How do we find the best one?
Some options:
BIC, AIC (see CO-496)
Compare marginal likelihood values (assuming a uniform prior on the set of models)
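A sketch of the second option, comparing the evidence of two candidate kernels on the same data (the kernels, toy data, and fixed hyper-parameters here are purely illustrative; in practice one would compare optimized models):

    import numpy as np

    def log_evidence(y, K, sigma_n2):
        """log N(y | 0, K + sigma_n^2 I), i.e. the GP marginal likelihood."""
        G = K + sigma_n2 * np.eye(len(y))
        L = np.linalg.cholesky(G)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        return -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * len(y) * np.log(2 * np.pi)

    X = np.linspace(-3, 3, 20)
    y = np.sin(X) + 0.1 * np.random.randn(len(X))
    D2 = (X[:, None] - X[None, :]) ** 2

    K_gauss = np.exp(-D2 / 1.0 ** 2)       # Gaussian kernel, length-scale 1
    K_linear = np.outer(X, X)              # linear kernel x * x'

    lml_gauss = log_evidence(y, K_gauss, sigma_n2=0.01)
    lml_linear = log_evidence(y, K_linear, sigma_n2=0.01)
    # Under a uniform model prior, prefer the kernel with the larger marginal likelihood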

Example
Four different kernels: constant, linear, Matérn, Gaussian (mean function fixed to $m \equiv 0$).
MAP hyper-parameters for each kernel.
Log-marginal-likelihood (LML) values for each (optimized) model.
(Figures: the GP fit and LML value for each of the four kernels.)

Application Areas
Reinforcement learning and robotics: model value functions and/or dynamics with GPs
Bayesian optimization (experimental design): model unknown utility functions with GPs
Geostatistics: spatial modeling (e.g., landscapes, resources)
Sensor networks
Time-series modeling and forecasting

Limitations of Gaussian Processes
Computational and memory complexity (training set size $N$):
Training scales in $O(N^3)$
Prediction (variances) scales in $O(N^2)$
Memory requirement: $O(ND + N^2)$
Practical limit: $N \approx 10{,}000$

Tips and Tricks for Practitioners
To set initial hyper-parameters, use domain knowledge if possible.
Standardize the input data and set initial length-scales $l$ to $\approx 0.5$.
Standardize the targets $y$ and set the initial signal variance to $\sigma_f \approx 1$.
Often useful: set the initial noise level relatively high (e.g., $\sigma_n \approx 0.5 \times \sigma_f$), even if you think your data have low noise. The optimization surface for the other parameters will be easier to move in.
When optimizing hyper-parameters, random restarts or other tricks to avoid local optima are advised.
Mitigate numerical instability (Cholesky decomposition of $K + \sigma_n^2 I$) by penalizing high signal-to-noise ratios $\sigma_f/\sigma_n$.
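A minimal sketch of the standardization and initialization suggested above (the numeric values are the rules of thumb from the slide, not universal defaults):

    import numpy as np

    def standardize(X, y):
        """Standardize inputs (column-wise) and targets to zero mean and unit variance."""
        X_s = (X - X.mean(axis=0)) / X.std(axis=0)
        y_s = (y - y.mean()) / y.std()
        return X_s, y_s

    # Initial hyper-parameters for the standardized data (illustrative values)
    ell0 = 0.5                  # initial length-scale
    sigma_f0 = 1.0              # initial signal amplitude
    sigma_n0 = 0.5 * sigma_f0   # deliberately high initial noise level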

Appendix

The Gaussian Distribution
$p(x \mid \mu, \Sigma) = (2\pi)^{-D/2}\, |\Sigma|^{-1/2} \exp\big(-\tfrac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\big)$
Mean vector $\mu$: average of the data.
Covariance matrix $\Sigma$: spread of the data.
(Figures: univariate and bivariate Gaussian densities with data, mean, and 95% confidence bounds.)

Sampling from a Multivariate Gaussian
Objective: Generate a random sample $y \sim \mathcal{N}(\mu, \Sigma)$ from a $D$-dimensional joint Gaussian with covariance matrix $\Sigma$ and mean vector $\mu$. However, we only have access to a random number generator that can sample $x$ from $\mathcal{N}(0, I)$.
Exploit that affine transformations $y = Ax + b$ of a Gaussian random variable $x$ remain Gaussian:
Mean: $\mathbb{E}_x[Ax + b] = A\,\mathbb{E}_x[x] + b$
Covariance: $\mathbb{V}_x[Ax + b] = A\,\mathbb{V}_x[x]\,A^\top$
1. Find conditions on $A, b$ to match the mean of $y$
2. Find conditions on $A, b$ to match the covariance of $y$

Sampling from a Multivariate Gaussian (2)
Objective: Generate a random sample $y \sim \mathcal{N}(\mu, \Sigma)$ from a $D$-dimensional joint Gaussian with covariance matrix $\Sigma$ and mean vector $\mu$.
x = randn(D,1);        % sample x ~ N(0, I)
y = chol(Sigma)'*x + mu;  % scale x and add offset
Here chol(Sigma) is the Cholesky factor $L$, such that $L^\top L = \Sigma$.
Therefore, the mean and covariance of $y$ are
$\mathbb{E}[y] = \bar y = \mathbb{E}[L^\top x + \mu] = L^\top\mathbb{E}[x] + \mu = \mu$
$\mathrm{Cov}[y] = \mathbb{E}[(y - \bar y)(y - \bar y)^\top] = \mathbb{E}[L^\top x x^\top L] = L^\top\mathbb{E}[x x^\top] L = L^\top L = \Sigma$
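The same recipe in NumPy (note that np.linalg.cholesky returns the lower-triangular factor $L$ with $LL^\top = \Sigma$, so the transpose sits on the other side than in the MATLAB-style snippet above; the example numbers are made up):

    import numpy as np

    def sample_mvn(mu, Sigma, n_samples=1):
        """Draw samples from N(mu, Sigma) using only standard-normal random numbers."""
        D = len(mu)
        L = np.linalg.cholesky(Sigma)        # lower-triangular L with L @ L.T == Sigma
        x = np.random.randn(D, n_samples)    # x ~ N(0, I)
        return (mu[:, None] + L @ x).T       # affine transform: y = L x + mu

    # Example
    mu = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
    samples = sample_mvn(mu, Sigma, n_samples=1000)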

Conditional
$p(x, y) = \mathcal{N}\!\left(\begin{bmatrix}\mu_x\\ \mu_y\end{bmatrix},\ \begin{bmatrix}\Sigma_{xx} & \Sigma_{xy}\\ \Sigma_{yx} & \Sigma_{yy}\end{bmatrix}\right)$
$p(x \mid y) = \mathcal{N}\big(\mu_{x|y}, \Sigma_{x|y}\big)$
$\mu_{x|y} = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)$
$\Sigma_{x|y} = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$
The conditional $p(x \mid y)$ is also Gaussian, which is computationally convenient.
(Figure: joint density, an observation of $y$, and the resulting conditional $p(x \mid y)$.)
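A small NumPy sketch of this Gaussian conditioning rule (block shapes and the example numbers are made up for illustration):

    import numpy as np

    def condition_gaussian(mu_x, mu_y, S_xx, S_xy, S_yy, y_obs):
        """Compute p(x | y = y_obs) for a joint Gaussian with the given blocks."""
        K = S_xy @ np.linalg.inv(S_yy)
        mu_cond = mu_x + K @ (y_obs - mu_y)   # mu_x + S_xy S_yy^{-1} (y - mu_y)
        S_cond = S_xx - K @ S_xy.T            # S_xx - S_xy S_yy^{-1} S_yx
        return mu_cond, S_cond

    # Example: 2-D joint Gaussian, condition on a scalar observation y = 1.5
    mu_x, mu_y = np.array([0.0]), np.array([0.0])
    S_xx, S_xy, S_yy = np.array([[1.0]]), np.array([[0.8]]), np.array([[1.0]])
    m, S = condition_gaussian(mu_x, mu_y, S_xx, S_xy, S_yy, np.array([1.5]))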

Marginal
$p(x, y) = \mathcal{N}\!\left(\begin{bmatrix}\mu_x\\ \mu_y\end{bmatrix},\ \begin{bmatrix}\Sigma_{xx} & \Sigma_{xy}\\ \Sigma_{yx} & \Sigma_{yy}\end{bmatrix}\right)$
Marginal distribution:
$p(x) = \int p(x, y)\, dy = \mathcal{N}\big(\mu_x, \Sigma_{xx}\big)$
The marginal of a joint Gaussian distribution is Gaussian. Intuitively: ignore (integrate out) everything you are not interested in.

The Gaussian Distribution in the Limit
Consider the joint Gaussian distribution $p(x, \tilde x)$, where $x \in \mathbb{R}^D$ and $\tilde x \in \mathbb{R}^k$, $k \to \infty$, are random variables. Then
$p(x, \tilde x) = \mathcal{N}\!\left(\begin{bmatrix}\mu_x\\ \mu_{\tilde x}\end{bmatrix},\ \begin{bmatrix}\Sigma_{xx} & \Sigma_{x\tilde x}\\ \Sigma_{\tilde x x} & \Sigma_{\tilde x\tilde x}\end{bmatrix}\right)$
where $\Sigma_{\tilde x\tilde x} \in \mathbb{R}^{k\times k}$ and $\Sigma_{x\tilde x} \in \mathbb{R}^{D\times k}$, $k \to \infty$.
However, the marginal remains finite:
$p(x) = \int p(x, \tilde x)\, d\tilde x = \mathcal{N}\big(\mu_x, \Sigma_{xx}\big)$
where we integrate out an infinite number of random variables $\tilde x_i$.

Marginal and Conditional in the Limit
In practice, we consider finite training and test data $x_{\text{train}}, x_{\text{test}}$. Then $x = \{x_{\text{train}}, x_{\text{test}}, x_{\text{other}}\}$, where $x_{\text{other}}$ plays the role of $\tilde x$ from the previous slide:
$p(x) = \mathcal{N}\!\left(\begin{bmatrix}\mu_{\text{train}}\\ \mu_{\text{test}}\\ \mu_{\text{other}}\end{bmatrix},\ \begin{bmatrix}\Sigma_{\text{train}} & \Sigma_{\text{train,test}} & \Sigma_{\text{train,other}}\\ \Sigma_{\text{test,train}} & \Sigma_{\text{test}} & \Sigma_{\text{test,other}}\\ \Sigma_{\text{other,train}} & \Sigma_{\text{other,test}} & \Sigma_{\text{other}}\end{bmatrix}\right)$
$p(x_{\text{train}}, x_{\text{test}}) = \int p(x_{\text{train}}, x_{\text{test}}, x_{\text{other}})\, dx_{\text{other}}$
$p(x_{\text{test}} \mid x_{\text{train}}) = \mathcal{N}(\mu_*, \Sigma_*)$
$\mu_* = \mu_{\text{test}} + \Sigma_{\text{test,train}}\Sigma_{\text{train}}^{-1}(x_{\text{train}} - \mu_{\text{train}})$
$\Sigma_* = \Sigma_{\text{test}} - \Sigma_{\text{test,train}}\Sigma_{\text{train}}^{-1}\Sigma_{\text{train,test}}$

Gaussian Process Training: Hierarchical Inference
Level-1 inference (posterior on $f$):
$p(f \mid X, y, \theta) = \dfrac{p(y \mid X, f)\, p(f \mid X, \theta)}{p(y \mid X, \theta)}, \qquad p(y \mid X, \theta) = \int p(y \mid f, X)\, p(f \mid X, \theta)\, df$
Level-2 inference (posterior on $\theta$):
$p(\theta \mid X, y) = \dfrac{p(y \mid X, \theta)\, p(\theta)}{p(y \mid X)}$

GP as the Limit of an Infinite RBF Network
Consider the universal function approximator
$f(x) = \lim_{N\to\infty} \frac{1}{N}\sum_{i\in\mathbb{Z}}\sum_{n=1}^{N} \gamma_n \exp\!\left(-\frac{\big(x - (i + \tfrac{n}{N})\big)^2}{\lambda^2}\right), \qquad x \in \mathbb{R},\ \lambda \in \mathbb{R}^+$
with $\gamma_n \sim \mathcal{N}(0, 1)$ (random weights): Gaussian-shaped basis functions (with variance $\lambda^2/2$) everywhere on the real axis.
$f(x) = \sum_{i\in\mathbb{Z}} \int_i^{i+1} \gamma(s)\exp\!\left(-\frac{(x-s)^2}{\lambda^2}\right) ds = \int_{-\infty}^{\infty} \gamma(s)\exp\!\left(-\frac{(x-s)^2}{\lambda^2}\right) ds$
Mean: $\mathbb{E}[f(x)] = 0$. Covariance: $\mathrm{Cov}[f(x), f(x')] = \theta_1^2 \exp\!\left(-\frac{(x-x')^2}{2\lambda^2}\right)$
This is a GP with mean 0 and Gaussian covariance function for a suitable $\theta_1^2$.

References
[1] N. A. C. Cressie. Statistics for Spatial Data. Wiley-Interscience.
[2] M. P. Deisenroth and S. Mohamed. Expectation Propagation in Gaussian Process Dynamical Systems. In Advances in Neural Information Processing Systems.
[3] M. P. Deisenroth and J. W. Ng. Distributed Gaussian Processes. In Proceedings of the International Conference on Machine Learning, 2015.
[4] M. P. Deisenroth, C. E. Rasmussen, and J. Peters. Gaussian Process Dynamic Programming. Neurocomputing, 72(7-9).
[5] M. P. Deisenroth, R. Turner, M. Huber, U. D. Hanebeck, and C. E. Rasmussen. Robust Filtering and Smoothing with Gaussian Processes. IEEE Transactions on Automatic Control, 57(7).
[6] R. Frigola, F. Lindsten, T. B. Schön, and C. E. Rasmussen. Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems. Curran Associates, Inc.
[7] J. Kocijan, R. Murray-Smith, C. E. Rasmussen, and A. Girard. Gaussian Process Model Based Predictive Control. In Proceedings of the 2004 American Control Conference (ACC 2004), Boston, MA, USA.
[8] A. Krause, A. Singh, and C. Guestrin. Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies. Journal of Machine Learning Research, 9.
[9] J. R. Lloyd, D. Duvenaud, R. Grosse, J. B. Tenenbaum, and Z. Ghahramani. Automatic Construction and Natural-Language Description of Nonparametric Regression Models. In AAAI Conference on Artificial Intelligence, pages 1-11, 2014.
[10] M. A. Osborne, S. J. Roberts, A. Rogers, S. D. Ramchurn, and N. R. Jennings. Towards Real-Time Information Processing of Sensor Network Data Using Computationally Efficient Multi-output Gaussian Processes. In Proceedings of the International Conference on Information Processing in Sensor Networks. IEEE Computer Society.
[11] J. Quiñonero-Candela and C. E. Rasmussen. A Unifying View of Sparse Approximate Gaussian Process Regression. Journal of Machine Learning Research, 6(2).
[12] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. The MIT Press, Cambridge, MA, USA.
[13] S. Roberts, M. A. Osborne, M. Ebden, S. Reece, N. Gibson, and S. Aigrain. Gaussian Processes for Time Series Modelling. Philosophical Transactions of the Royal Society (Part A), 371(1984).


More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

MODEL BASED LEARNING OF SIGMA POINTS IN UNSCENTED KALMAN FILTERING. Ryan Turner and Carl Edward Rasmussen

MODEL BASED LEARNING OF SIGMA POINTS IN UNSCENTED KALMAN FILTERING. Ryan Turner and Carl Edward Rasmussen MODEL BASED LEARNING OF SIGMA POINTS IN UNSCENTED KALMAN FILTERING Ryan Turner and Carl Edward Rasmussen University of Cambridge Department of Engineering Trumpington Street, Cambridge CB PZ, UK ABSTRACT

More information

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model (& discussion on the GPLVM tech. report by Prof. N. Lawrence, 06) Andreas Damianou Department of Neuro- and Computer Science,

More information

Statistical Techniques in Robotics (16-831, F12) Lecture#20 (Monday November 12) Gaussian Processes

Statistical Techniques in Robotics (16-831, F12) Lecture#20 (Monday November 12) Gaussian Processes Statistical Techniques in Robotics (6-83, F) Lecture# (Monday November ) Gaussian Processes Lecturer: Drew Bagnell Scribe: Venkatraman Narayanan Applications of Gaussian Processes (a) Inverse Kinematics

More information

Introduction to Gaussian Process

Introduction to Gaussian Process Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression

More information

Bayesian Deep Learning

Bayesian Deep Learning Bayesian Deep Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp June 06, 2018 Mohammad Emtiyaz Khan 2018 1 What will you learn? Why is Bayesian inference

More information

Computer Vision Group Prof. Daniel Cremers. 4. Gaussian Processes - Regression

Computer Vision Group Prof. Daniel Cremers. 4. Gaussian Processes - Regression Group Prof. Daniel Cremers 4. Gaussian Processes - Regression Definition (Rep.) Definition: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.

More information

Advanced Introduction to Machine Learning CMU-10715

Advanced Introduction to Machine Learning CMU-10715 Advanced Introduction to Machine Learning CMU-10715 Gaussian Processes Barnabás Póczos http://www.gaussianprocess.org/ 2 Some of these slides in the intro are taken from D. Lizotte, R. Parr, C. Guesterin

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

STAT 518 Intro Student Presentation

STAT 518 Intro Student Presentation STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible

More information

Bayesian Interpretations of Regularization

Bayesian Interpretations of Regularization Bayesian Interpretations of Regularization Charlie Frogner 9.50 Class 15 April 1, 009 The Plan Regularized least squares maps {(x i, y i )} n i=1 to a function that minimizes the regularized loss: f S

More information

Probabilistic numerics for deep learning

Probabilistic numerics for deep learning Presenter: Shijia Wang Department of Engineering Science, University of Oxford rning (RLSS) Summer School, Montreal 2017 Outline 1 Introduction Probabilistic Numerics 2 Components Probabilistic modeling

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 89 Part II

More information

COMP 551 Applied Machine Learning Lecture 21: Bayesian optimisation

COMP 551 Applied Machine Learning Lecture 21: Bayesian optimisation COMP 55 Applied Machine Learning Lecture 2: Bayesian optimisation Associate Instructor: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp55 Unless otherwise noted, all material posted

More information

Probabilistic Reasoning in Deep Learning

Probabilistic Reasoning in Deep Learning Probabilistic Reasoning in Deep Learning Dr Konstantina Palla, PhD palla@stats.ox.ac.uk September 2017 Deep Learning Indaba, Johannesburgh Konstantina Palla 1 / 39 OVERVIEW OF THE TALK Basics of Bayesian

More information

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55

More information

Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric?

Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric? Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric? Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Afternoon Meeting on Bayesian Computation 2018 University of Reading

Afternoon Meeting on Bayesian Computation 2018 University of Reading Gabriele Abbati 1, Alessra Tosi 2, Seth Flaxman 3, Michael A Osborne 1 1 University of Oxford, 2 Mind Foundry Ltd, 3 Imperial College London Afternoon Meeting on Bayesian Computation 2018 University of

More information

Least Squares Regression

Least Squares Regression CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the

More information

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.) Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori

More information

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart Machine Learning Bayesian Regression & Classification learning as inference, Bayesian Kernel Ridge regression & Gaussian Processes, Bayesian Kernel Logistic Regression & GP classification, Bayesian Neural

More information

Lecture 5 September 19

Lecture 5 September 19 IFT 6269: Probabilistic Graphical Models Fall 2016 Lecture 5 September 19 Lecturer: Simon Lacoste-Julien Scribe: Sébastien Lachapelle Disclaimer: These notes have only been lightly proofread. 5.1 Statistical

More information

Relevance Vector Machines

Relevance Vector Machines LUT February 21, 2011 Support Vector Machines Model / Regression Marginal Likelihood Regression Relevance vector machines Exercise Support Vector Machines The relevance vector machine (RVM) is a bayesian

More information

Lecture 9. Time series prediction

Lecture 9. Time series prediction Lecture 9 Time series prediction Prediction is about function fitting To predict we need to model There are a bewildering number of models for data we look at some of the major approaches in this lecture

More information

Lecture 5: GPs and Streaming regression

Lecture 5: GPs and Streaming regression Lecture 5: GPs and Streaming regression Gaussian Processes Information gain Confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19, 2017 1 Recall: Non-parametric regression Input space X

More information

ESANN'2001 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2001, D-Facto public., ISBN ,

ESANN'2001 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2001, D-Facto public., ISBN , Sparse Kernel Canonical Correlation Analysis Lili Tan and Colin Fyfe 2, Λ. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong. 2. School of Information and Communication

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

Computer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression

Computer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression Group Prof. Daniel Cremers 9. Gaussian Processes - Regression Repetition: Regularized Regression Before, we solved for w using the pseudoinverse. But: we can kernelize this problem as well! First step:

More information

BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS

BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS Oloyede I. Department of Statistics, University of Ilorin, Ilorin, Nigeria Corresponding Author: Oloyede I.,

More information

STK-IN4300 Statistical Learning Methods in Data Science

STK-IN4300 Statistical Learning Methods in Data Science STK-IN4300 Statistical Learning Methods in Data Science Riccardo De Bin debin@math.uio.no STK-IN4300: lecture 2 1/ 38 Outline of the lecture STK-IN4300 - Statistical Learning Methods in Data Science Linear

More information

Gaussian processes for inference in stochastic differential equations

Gaussian processes for inference in stochastic differential equations Gaussian processes for inference in stochastic differential equations Manfred Opper, AI group, TU Berlin November 6, 2017 Manfred Opper, AI group, TU Berlin (TU Berlin) inference in SDE November 6, 2017

More information

Introduction to Gaussian Processes

Introduction to Gaussian Processes Introduction to Gaussian Processes Iain Murray School of Informatics, University of Edinburgh The problem Learn scalar function of vector values f(x).5.5 f(x) y i.5.2.4.6.8 x f 5 5.5 x x 2.5 We have (possibly

More information

Bayesian Machine Learning

Bayesian Machine Learning Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September

More information

Optimization of Gaussian Process Hyperparameters using Rprop

Optimization of Gaussian Process Hyperparameters using Rprop Optimization of Gaussian Process Hyperparameters using Rprop Manuel Blum and Martin Riedmiller University of Freiburg - Department of Computer Science Freiburg, Germany Abstract. Gaussian processes are

More information

Approximate Inference Part 1 of 2

Approximate Inference Part 1 of 2 Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory

More information

Support Vector Machines and Bayes Regression

Support Vector Machines and Bayes Regression Statistical Techniques in Robotics (16-831, F11) Lecture #14 (Monday ctober 31th) Support Vector Machines and Bayes Regression Lecturer: Drew Bagnell Scribe: Carl Doersch 1 1 Linear SVMs We begin by considering

More information

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

More information

Approximate Inference Part 1 of 2

Approximate Inference Part 1 of 2 Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory

More information

CS 7140: Advanced Machine Learning

CS 7140: Advanced Machine Learning Instructor CS 714: Advanced Machine Learning Lecture 3: Gaussian Processes (17 Jan, 218) Jan-Willem van de Meent (j.vandemeent@northeastern.edu) Scribes Mo Han (han.m@husky.neu.edu) Guillem Reus Muns (reusmuns.g@husky.neu.edu)

More information

Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011

Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011 Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011 Outline Ordinary Least Squares (OLS) Regression Generalized Linear Models

More information

Predictive Variance Reduction Search

Predictive Variance Reduction Search Predictive Variance Reduction Search Vu Nguyen, Sunil Gupta, Santu Rana, Cheng Li, Svetha Venkatesh Centre of Pattern Recognition and Data Analytics (PRaDA), Deakin University Email: v.nguyen@deakin.edu.au

More information

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

More information

An Introduction to Statistical and Probabilistic Linear Models

An Introduction to Statistical and Probabilistic Linear Models An Introduction to Statistical and Probabilistic Linear Models Maximilian Mozes Proseminar Data Mining Fakultät für Informatik Technische Universität München June 07, 2017 Introduction In statistical learning

More information

Modelling Transcriptional Regulation with Gaussian Processes

Modelling Transcriptional Regulation with Gaussian Processes Modelling Transcriptional Regulation with Gaussian Processes Neil Lawrence School of Computer Science University of Manchester Joint work with Magnus Rattray and Guido Sanguinetti 8th March 7 Outline Application

More information

The Variational Gaussian Approximation Revisited

The Variational Gaussian Approximation Revisited The Variational Gaussian Approximation Revisited Manfred Opper Cédric Archambeau March 16, 2009 Abstract The variational approximation of posterior distributions by multivariate Gaussians has been much

More information

Variational Model Selection for Sparse Gaussian Process Regression

Variational Model Selection for Sparse Gaussian Process Regression Variational Model Selection for Sparse Gaussian Process Regression Michalis K. Titsias School of Computer Science University of Manchester 7 September 2008 Outline Gaussian process regression and sparse

More information