Gaussian Processes. Recommended reading: Rasmussen & Williams: Chapters 1, 2, 4, 5; Deisenroth & Ng (2015) [3]. Data Analysis and Probabilistic Inference.
1 Data Analysis and Probabilistic Inference: Gaussian Processes. Recommended reading: Rasmussen & Williams: Chapters 1, 2, 4, 5; Deisenroth & Ng (2015) [3]. Marc Deisenroth, Department of Computing, Imperial College London. February 22, 2017
3-4 Problem Setting
[Figures: noisy observations and candidate functions, f(x) vs. x.]
Objective: for a set of observations y_i = f(x_i) + ε, ε ~ N(0, σ_ε²), find a distribution over functions p(f) that explains the data. This is a probabilistic regression problem.
5-7 Recap from CO-496: Bayesian Linear Regression
Linear regression model: f(x) = φ(x)ᵀ w, y = f(x) + ε, with prior w ~ N(0, Σ_p) and noise ε ~ N(0, σ_n²).
Integrating out the parameters when predicting leads to a distribution over functions:
p(f(x*) | x*, X, y) = ∫ p(f(x*) | x*, w) p(w | X, y) dw = N(μ(x*), σ²(x*))
μ(x*) = φ*ᵀ Σ_p Φ (K + σ_n² I)⁻¹ y
σ²(x*) = φ*ᵀ Σ_p φ* − φ*ᵀ Σ_p Φ (K + σ_n² I)⁻¹ Φᵀ Σ_p φ*
K = Φᵀ Σ_p Φ
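The predictive equations above translate directly into code. A minimal NumPy sketch (function and variable names are mine, not from the slides), checked on a noise-free line y = 2x with features φ(x) = [1, x]:

```python
import numpy as np

def blr_predict(Phi, y, phi_star, Sigma_p, sigma_n2):
    """Bayesian linear regression predictive mean and variance at one test input.

    Phi: (D, N) training feature matrix with columns phi(x_i);
    phi_star: (D,) test feature vector; Sigma_p: (D, D) prior covariance on w.
    """
    N = len(y)
    K = Phi.T @ Sigma_p @ Phi                                # K = Phi^T Sigma_p Phi
    G = np.linalg.inv(K + sigma_n2 * np.eye(N))              # (K + sigma_n^2 I)^{-1}
    cross = phi_star @ Sigma_p @ Phi                         # phi_*^T Sigma_p Phi
    mu = cross @ G @ y                                       # predictive mean
    var = phi_star @ Sigma_p @ phi_star - cross @ G @ cross  # predictive variance
    return mu, var

# noise-free data from y = 2x; predict at x* = 3 with phi(x) = [1, x]
X = np.array([-1.0, 0.0, 1.0])
Phi = np.vstack([np.ones_like(X), X])                        # (2, 3) feature matrix
mu, var = blr_predict(Phi, 2 * X, np.array([1.0, 3.0]), np.eye(2), 1e-6)
```

With essentially no noise, the predictive mean at x* = 3 recovers 2 · 3 = 6 and the predictive variance collapses towards zero.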
8-11 Sampling from the Prior over Functions
Consider a linear regression setting y = a + bx + ε, with p(a, b) = N(0, I) and ε ~ N(0, σ_n²).
[Figures: parameter samples (a, b) drawn from the prior, and the corresponding straight lines in function space, y vs. x.]
12-14 Sampling from the Posterior over Functions
Same linear regression setting y = a + bx + ε, p(a, b) = N(0, I), ε ~ N(0, σ_n²), but now (a, b) is sampled from the posterior given observed data.
[Figures: posterior parameter samples (a, b) and the corresponding functions, y vs. x.]
15 Fitting Nonlinear Functions
Fit nonlinear functions using (Bayesian) linear regression: a linear combination of nonlinear features.
Example: radial-basis-function (RBF) network
f(x) = Σ_{i=1}^n w_i φ_i(x), w_i ~ N(0, σ_p²),
where φ_i(x) = exp(−½ (x − μ_i)ᵀ(x − μ_i)) for given centers μ_i.
16 Illustration: Fitting a Radial-Basis-Function Network
φ_i(x) = exp(−½ (x − μ_i)ᵀ(x − μ_i))
[Figure: f(x) vs. x.] Place Gaussian-shaped basis functions φ_i at 25 input locations μ_i, linearly spaced in the interval [−5, 3].
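These basis functions can be written down directly. A small sketch (function name and evaluation grid are mine) building the design matrix for the 25 centers mentioned above:

```python
import numpy as np

def rbf_features(x, centers):
    """phi_i(x) = exp(-0.5 * (x - mu_i)^2) for scalar inputs x."""
    return np.exp(-0.5 * (x[:, None] - centers[None, :]) ** 2)

centers = np.linspace(-5, 3, 25)       # 25 centers, linearly spaced in [-5, 3]
x = np.linspace(-6, 4, 200)            # evaluation grid (my choice)
Phi = rbf_features(x, centers)         # (200, 25) design matrix
```

Each column of Phi is one bump of unit height centered at a μ_i; stacking them gives the feature matrix used in the Bayesian linear regression above.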
17 Samples from the RBF Prior
f(x) = Σ_{i=1}^n w_i φ_i(x), p(w) = N(0, I)
[Figure: prior function samples, f(x) vs. x.]
18 Samples from the RBF Posterior
f(x) = Σ_{i=1}^n w_i φ_i(x), p(w | X, y) = N(m_N, S_N)
[Figure: posterior function samples, f(x) vs. x.]
19 RBF Posterior
[Figure: posterior mean and uncertainty of the RBF model, f(x) vs. x.]
20 Limitations
Feature engineering; finite number of features.
[Figure: f(x) vs. x.] Above: without basis functions on the right-hand side of the input space, we cannot express any variability of the function there.
Ideally: add more (infinitely many) basis functions.
21 Approach
Instead of sampling parameters, which induce a distribution over functions, sample functions directly. Make assumptions on the distribution of functions.
Intuition: a function is an infinitely long vector of function values, so make assumptions on the distribution of function values.
22-24 Gaussian Process
We will place a distribution p(f) on functions f. Informally, a function can be considered an infinitely long vector of function values f = [f_1, f_2, f_3, ...].
A Gaussian process is a generalization of a multivariate Gaussian distribution to infinitely many variables.
Definition: A Gaussian process (GP) is a collection of random variables f_1, f_2, ..., any finite number of which is Gaussian distributed.
A Gaussian distribution is specified by a mean vector μ and a covariance matrix Σ. A Gaussian process is specified by a mean function m(·) and a covariance function (kernel) k(·, ·).
25 Covariance Function
The covariance function (kernel) is symmetric and positive semi-definite. It allows us to compute covariances between (unknown) function values by just looking at the corresponding inputs:
Cov[f(x_i), f(x_j)] = k(x_i, x_j)
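A quick numerical illustration of these two properties (the kernel choice, a squared-exponential, and all names are mine): a covariance matrix built from a kernel over arbitrary inputs comes out symmetric with non-negative eigenvalues.

```python
import numpy as np

def k(xi, xj, ell=1.0, sf2=1.0):
    # squared-exponential kernel, used here only as an example
    return sf2 * np.exp(-0.5 * (xi - xj) ** 2 / ell ** 2)

x = np.array([-2.0, 0.0, 0.5, 3.0])                    # arbitrary inputs
K = np.array([[k(a, b) for b in x] for a in x])        # K[i, j] = Cov[f(x_i), f(x_j)]

is_symmetric = bool(np.allclose(K, K.T))
min_eig = float(np.linalg.eigvalsh(K).min())           # >= 0 up to round-off
```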
26-31 GP Regression as a Bayesian Inference Problem
Objective: for a set of observations y_i = f(x_i) + ε, ε ~ N(0, σ_n²), find a (posterior) distribution over functions p(f | X, y) that explains the data.
Training data: X, y. Bayes' theorem yields
p(f | X, y) = p(y | f, X) p(f) / p(y | X)
Prior: p(f) = GP(m, k). Specify mean function m and kernel k.
Likelihood (noise model): p(y | f, X) = N(f(X), σ_n² I)
Marginal likelihood (evidence): p(y | X) = ∫ p(y | f(X)) p(f | X) df
Posterior: p(f | y, X) = GP(m_post, k_post)
32-34 Prior over Functions
Treat a function as a long vector of function values f = [f_1, f_2, ...] and look at a distribution over function values f_i = f(x_i).
Consider a finite number N of function values f and all other (infinitely many) function values f̃. Informally:
p(f, f̃) = N( [μ_f, μ_f̃], [[Σ_ff, Σ_ff̃], [Σ_f̃f, Σ_f̃f̃]] )
where Σ_f̃f̃ ∈ R^{m×m} and Σ_ff̃ ∈ R^{N×m}, with m → ∞, and Σ_ff^{(i,j)} = Cov[f(x_i), f(x_j)] = k(x_i, x_j).
Key property: the marginal remains finite:
p(f) = ∫ p(f, f̃) df̃ = N(μ_f, Σ_ff)
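The marginalization property can be checked empirically: sample from a larger joint Gaussian and discard coordinates; the remaining samples have the covariance of the corresponding sub-block. A sketch under my own toy kernel-built covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 5)
Sigma = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)   # kernel-built joint covariance
L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(5))       # tiny jitter for numerical safety
joint = (L @ rng.standard_normal((5, 200000))).T        # samples of (f, f_other)
f_samples = joint[:, :2]                                # keep N = 2 values, discard the rest
emp_cov = np.cov(f_samples.T)                           # approximates Sigma[:2, :2]
```

Discarding f_other changes nothing about the distribution of the kept values: their empirical covariance matches Sigma[:2, :2].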
35-36 Training and Test Marginal
In practice, we always have a finite number of training and test inputs x_train, x_test. Define f* := f_test, f := f_train.
Then we obtain the finite marginal
p(f, f*) = ∫ p(f, f*, f_other) df_other = N( [μ_f, μ_*], [[Σ_ff, Σ_f*], [Σ_*f, Σ_**]] )
37-40 GP Regression as a Bayesian Inference Problem (ctd.)
Posterior over functions (with training data X, y):
p(f | X, y) = p(y | f, X) p(f | X) / p(y | X)
Using the properties of Gaussians, we obtain (with K = k(X, X))
p(y | f, X) p(f | X) = N(y | f(X), σ_n² I) N(f(X) | m(X), K)
= Z · N( f(X) | m(X) + K(K + σ_n² I)⁻¹ (y − m(X)),  K − K(K + σ_n² I)⁻¹ K )
with posterior mean m(X) + K(K + σ_n² I)⁻¹ (y − m(X)) and posterior covariance K − K(K + σ_n² I)⁻¹ K.
Marginal likelihood:
Z = p(y | X) = ∫ p(y | f, X) p(f | X) df = N(y | m(X), K + σ_n² I)
41-43 GP Predictions (1)
y = f(x) + ε, ε ~ N(0, σ_n²)
Objective: find p(f(X*) | X, y) for training data X, y and test inputs X*.
GP prior: p(f | X) = N(m(X), K). Gaussian likelihood: p(y | f(X)) = N(f(X), σ_n² I).
With f ~ GP it follows that f, f* are jointly Gaussian distributed:
p(f, f* | X, X*) = N( [m(X), m(X*)], [[K, k(X, X*)], [k(X*, X), k(X*, X*)]] )
Due to the Gaussian likelihood, we also get (f* is unobserved)
p(y, f* | X, X*) = N( [m(X), m(X*)], [[K + σ_n² I, k(X, X*)], [k(X*, X), k(X*, X*)]] )
44-46 GP Predictions (2)
Prior: p(y, f* | X, X*) = N( [m(X), m(X*)], [[K + σ_n² I, k(X, X*)], [k(X*, X), k(X*, X*)]] )
The posterior predictive distribution p(f* | X, y, X*) at test inputs X* is obtained by Gaussian conditioning:
p(f* | X, y, X*) = N( E[f* | X, y, X*], V[f* | X, y, X*] )
E[f* | X, y, X*] = m_post(X*) = m(X*) [prior mean] + k(X*, X)(K + σ_n² I)⁻¹ (y − m(X))
V[f* | X, y, X*] = k_post(X*, X*) = k(X*, X*) [prior variance] − k(X*, X)(K + σ_n² I)⁻¹ k(X, X*)
From now on: set the prior mean function m ≡ 0.
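The conditioning equations translate into a few lines of code. A minimal sketch with zero prior mean and a squared-exponential kernel (kernel choice and names are mine):

```python
import numpy as np

def sqexp(A, B, ell=1.0, sf2=1.0):
    d = A[:, None] - B[None, :]
    return sf2 * np.exp(-0.5 * d ** 2 / ell ** 2)

def gp_predict(X, y, Xs, sigma_n2, kern=sqexp):
    """Posterior predictive mean/covariance at test inputs Xs (zero prior mean)."""
    K = kern(X, X) + sigma_n2 * np.eye(len(X))    # K + sigma_n^2 I
    Ks = kern(X, Xs)                              # k(X, X_*)
    Kss = kern(Xs, Xs)                            # k(X_*, X_*)
    alpha = np.linalg.solve(K, y)
    mean = Ks.T @ alpha                           # k(X_*, X)(K + sigma_n^2 I)^{-1} y
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)     # prior covariance minus explained part
    return mean, cov

X = np.array([-1.0, 0.0, 1.0])
mean, cov = gp_predict(X, np.sin(X), np.array([0.0]), 1e-6)
```

With near-zero noise, predicting at a training input reproduces the observed value, and the posterior variance there collapses to roughly the noise level.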
47-48 Illustration: Inference with Gaussian Processes
[Figures: prior belief about the function, f(x) vs. x.]
Predictive (marginal) mean and variance:
E[f(x*) | x*] = m(x*) = 0
V[f(x*) | x*] = σ²(x*) = k(x*, x*)
49-57 Illustration: Inference with Gaussian Processes
[Figures: posterior belief about the function as observations are added one by one, f(x) vs. x.]
Predictive (marginal) mean and variance:
E[f(x*) | x*, X, y] = m(x*) = k(X, x*)ᵀ (K + σ_ε² I)⁻¹ y
V[f(x*) | x*, X, y] = σ²(x*) = k(x*, x*) − k(X, x*)ᵀ (K + σ_ε² I)⁻¹ k(X, x*)
58 Covariance Function
A Gaussian process is fully specified by a mean function m and a kernel/covariance function k. The covariance function (kernel) is symmetric and positive semi-definite. It encodes high-level structural assumptions about the latent function f (e.g., smoothness, differentiability, periodicity).
59 Gaussian Covariance Function
k_Gauss(x_i, x_j) = σ_f² exp(−(x_i − x_j)ᵀ(x_i − x_j) / l²)
σ_f: amplitude of the latent function.
l: length scale; how far do we have to move in input space before the function value changes significantly? A smoothness parameter.
Assumption on the latent function: smooth (infinitely differentiable).
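The smoothness assumption and the role of l are easiest to see by drawing prior samples with this kernel. A sketch (the jitter term is a standard numerical trick for the Cholesky factorization, not part of the kernel definition):

```python
import numpy as np

def k_gauss(a, b, ell, sf2=1.0):
    d = a[:, None] - b[None, :]
    return sf2 * np.exp(-d ** 2 / ell ** 2)          # exp(-(xi - xj)^2 / l^2)

rng = np.random.default_rng(42)
x = np.linspace(-5.0, 5.0, 100)
samples = {}
for ell in (0.3, 1.0, 3.0):                          # short to long length-scales
    K = k_gauss(x, x, ell) + 1e-6 * np.eye(len(x))   # jitter for a stable Cholesky
    L = np.linalg.cholesky(K)
    samples[ell] = L @ rng.standard_normal(len(x))   # one prior sample path each
```

Plotting the three sample paths would show increasingly wiggly functions as l shrinks, exactly as on the length-scale slides that follow.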
60-63 Length-Scales
[Figures: GP fits to the same data with different length-scales (legend shows the data and a length-scale value, e.g. 0.3), f(x) vs. x.]
Length scales determine how wiggly the function is and how much information we can transfer to other function values.
64 Matérn Covariance Function
k_Mat,3/2(x_i, x_j) = σ_f² (1 + √3 ‖x_i − x_j‖ / l) exp(−√3 ‖x_i − x_j‖ / l)
σ_f: amplitude of the latent function.
l: length scale; how far do we have to move in input space before the function value changes significantly?
Assumption on the latent function: once differentiable.
65 Periodic Covariance Function
k_per(x_i, x_j) = σ_f² exp(−2 sin²(κ (x_i − x_j) / 2) / l²) = k_Gauss(u(x_i), u(x_j)), u(x) = [cos(κx), sin(κx)]
κ: periodicity parameter (period 2π/κ).
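The warped-input identity can be verified numerically. Note that the check below uses the exp(−‖·‖²/(2l²)) convention for the Gaussian kernel on u(x), under which the identity holds exactly (a sketch; all names are mine, and kernel conventions vary between texts):

```python
import numpy as np

def k_gauss_u(a, b, ell=1.0):
    # squared-exponential kernel in the exp(-||a - b||^2 / (2 l^2)) convention
    d = np.asarray(a, float) - np.asarray(b, float)
    return np.exp(-d @ d / (2 * ell ** 2))

def u(x, kappa):
    return np.array([np.cos(kappa * x), np.sin(kappa * x)])

def k_per(xi, xj, kappa, ell=1.0):
    return np.exp(-2.0 * np.sin(kappa * (xi - xj) / 2.0) ** 2 / ell ** 2)

kappa = 2.0
pairs = [(0.1, 1.3), (-2.0, 0.7), (0.0, np.pi)]
vals = [(k_per(a, b, kappa), k_gauss_u(u(a, kappa), u(b, kappa))) for a, b in pairs]
```

The identity follows from ‖u(x_i) − u(x_j)‖² = 2 − 2 cos(κ(x_i − x_j)) = 4 sin²(κ(x_i − x_j)/2), and shifting an input by the period 2π/κ leaves the kernel value unchanged.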
66-68 Meta-Parameters of a GP
The GP possesses a set of hyper-parameters: parameters of the mean function; hyper-parameters of the covariance function (e.g., length-scales and signal variance); likelihood parameters (e.g., noise variance σ_n²).
Train a GP to find a good set of hyper-parameters.
Model selection to find good mean and covariance functions (can also be automated: Automatic Statistician (Lloyd et al., 2014)).
69-72 Gaussian Process Training: Hyper-Parameters
[Figure: graphical model with x_i, f, y_i inside a plate of size N, plus hyper-parameters θ and noise σ_n.]
GP training: find good GP hyper-parameters θ (kernel and mean function parameters).
Place a prior p(θ) on the hyper-parameters. Posterior over hyper-parameters:
p(θ | X, y) = p(θ) p(y | X, θ) / p(y | X), p(y | X, θ) = ∫ p(y | f(X)) p(f | X, θ) df
Choose hyper-parameters θ* such that
θ* ∈ arg max_θ  log p(θ) + log p(y | X, θ)
This maximizes the marginal likelihood if p(θ) = U (uniform prior).
73-74 Training via Marginal Likelihood Maximization
GP training: maximize the evidence/marginal likelihood (the probability of the data given the hyper-parameters, where the unwieldy f has been integrated out). Also called maximum likelihood type-II.
Marginal likelihood:
p(y | X, θ) = ∫ p(y | f(X)) p(f | X, θ) df = ∫ N(y | f(X), σ_n² I) N(f(X) | 0, K) df = N(y | 0, K + σ_n² I)
Learning the GP hyper-parameters:
θ* ∈ arg max_θ log p(y | X, θ)
log p(y | X, θ) = −½ yᵀ K_θ⁻¹ y − ½ log |K_θ| + const, K_θ := K + σ_n² I
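The log-marginal likelihood above is typically evaluated via a Cholesky factorization of K_θ, which gives both the quadratic term and the log-determinant cheaply. A minimal sketch (names mine), sanity-checked against the closed form for K_θ = (1 + σ_n²) I:

```python
import numpy as np

def log_marglik(K, y, sigma_n2):
    """log N(y | 0, K_theta) with K_theta = K + sigma_n^2 I, via Cholesky."""
    N = len(y)
    L = np.linalg.cholesky(K + sigma_n2 * np.eye(N))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K_theta^{-1} y
    logdet = 2.0 * np.sum(np.log(np.diag(L)))             # log |K_theta|
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * N * np.log(2.0 * np.pi)

# sanity check: K = I, sigma_n^2 = 0.5, so K_theta = 1.5 I
y = np.array([0.5, -1.0])
lml = log_marglik(np.eye(2), y, 0.5)
```

Maximizing this quantity over θ (e.g. with a gradient-based optimizer, using the gradient from the next slide) is what "training a GP" means here.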
75-77 Training via Marginal Likelihood Maximization
Log-marginal likelihood:
log p(y | X, θ) = −½ yᵀ K_θ⁻¹ y − ½ log |K_θ| + const, K_θ := K + σ_n² I
Automatic trade-off between data fit and model complexity.
Gradient-based optimization of the hyper-parameters θ:
∂ log p(y | X, θ) / ∂θ_i = ½ yᵀ K_θ⁻¹ (∂K_θ/∂θ_i) K_θ⁻¹ y − ½ tr(K_θ⁻¹ ∂K_θ/∂θ_i)
= ½ tr( (α αᵀ − K_θ⁻¹) ∂K_θ/∂θ_i ), α := K_θ⁻¹ y
78 Example: Training Data
[Figure: training data, y vs. x.]
79 Example: Marginal Likelihood Contour
[Figure: contour plot of the log-marginal likelihood over log-noise and log-length-scales.]
80-81 Example: Exploring the Modes (1)-(2)
[Figures: GP fits corresponding to different modes of the marginal likelihood, y vs. x.]
82-90 Marginal Likelihood (1)-(9)
[Figures: log-marginal likelihood contours over log-noise and log-length-scales for different data set sizes N.]
91-98 Marginal Likelihood and Parameter Learning
The marginal likelihood is non-convex. In particular in the very-small-data regime, a GP can end up in three different modes when optimizing the hyper-parameters: overfitting (unlikely, but possible), underfitting (everything is considered noise), and a good fit.
Re-start the hyper-parameter optimization from random initializations to mitigate this problem.
With increasing data set size, the GP typically ends up in the good-fit mode. Overfitting (indicator: small length-scales and small noise variance) is very unlikely.
Ideally, we would integrate the hyper-parameters out. Why can we not do this easily?
99-100 Model Selection: Mean Function and Kernel
Assume we have a finite set of models M_i, each one specifying a mean function m_i and a kernel k_i. How do we find the best one?
Some options: BIC, AIC (see CO-496); compare marginal likelihood values (assuming a uniform prior on the set of models).
101-105 Example
[Figures: GP fits with a constant, linear, Matérn, and Gaussian kernel, each annotated with its log-marginal likelihood (LML), f(x) vs. x.]
Four different kernels (mean function fixed to m ≡ 0). MAP hyper-parameters for each kernel. Log-marginal likelihood values for each (optimized) model.
106 Application Areas
[Figure: pendulum state space, angular velocity in rad/s vs. angle in rad.]
Reinforcement learning and robotics: model value functions and/or dynamics with GPs.
Bayesian optimization (experimental design): model unknown utility functions with GPs.
Geostatistics: spatial modeling (e.g., landscapes, resources). Sensor networks. Time-series modeling and forecasting.
107 Limitations of Gaussian Processes
Computational and memory complexity, with training set size N: training scales in O(N³); prediction (variances) scales in O(N²); the memory requirement is O(ND + N²). Practical limit: N ≈ 10,000.
108 Tips and Tricks for Practitioners
To set initial hyper-parameters, use domain knowledge if possible. Standardize the input data and set initial length-scales l to ≈ 0.5. Standardize the targets y and set the initial signal variance to σ_f ≈ 1.
Often useful: set the initial noise level relatively high (e.g., σ_n ≈ 0.5 × the signal amplitude σ_f), even if you think your data have low noise. The optimization surface for the other parameters will be easier to move in.
When optimizing hyper-parameters, random restarts or other tricks to avoid local optima are advised.
Mitigate numerical instability (in the Cholesky decomposition of K + σ_n² I) by penalizing high signal-to-noise ratios σ_f/σ_n.
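The standardization and initialization advice above can be wrapped into a small helper (a sketch; the specific numbers follow the heuristics on this slide, and the function name is mine):

```python
import numpy as np

def standardize_and_init(X, y):
    """Standardize inputs/targets; return the suggested initial hyper-parameters."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    init = {
        "lengthscale": 0.5,   # l ~ 0.5 on standardized inputs
        "signal_std": 1.0,    # sigma_f ~ 1 on standardized targets
        "noise_std": 0.5,     # sigma_n ~ 0.5 * sigma_f: deliberately start noisy
    }
    return Xs, ys, init

rng = np.random.default_rng(3)
X = rng.normal(5.0, 2.0, size=(50, 2))
y = rng.normal(-10.0, 4.0, size=50)
Xs, ys, init = standardize_and_init(X, y)
```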
109 Appendix
110-112 The Gaussian Distribution
p(x | μ, Σ) = (2π)^(-D/2) |Σ|^(-1/2) exp( -(1/2) (x - μ)^T Σ^{-1} (x - μ) )
- Mean vector μ: average of the data
- Covariance matrix Σ: spread of the data
[Figures: 1-D Gaussian density p(x) with data, mean, and 95% confidence interval; 2-D Gaussian contours with data, mean, and 95% confidence bound]
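The density above can be evaluated directly from the formula; a minimal numpy sketch (the function name is my own):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    # p(x | mu, Sigma) = (2*pi)^(-D/2) * |Sigma|^(-1/2)
    #                    * exp(-0.5 * (x - mu)^T Sigma^{-1} (x - mu))
    D = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)  # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2 * np.pi) ** (-D / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * quad)
```

As a sanity check, for D = 1 with μ = 0, Σ = 1 this reduces to the standard normal density 1/√(2π) exp(-x²/2).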
113-115 Sampling from a Multivariate Gaussian
Objective: generate a random sample y ~ N(μ, Σ) from a D-dimensional joint Gaussian with covariance matrix Σ and mean vector μ. However, we only have access to a random number generator that can sample x from N(0, I)...
Exploit that affine transformations y = Ax + b of a Gaussian random variable x remain Gaussian:
- Mean: E_x[Ax + b] = A E_x[x] + b
- Covariance: V_x[Ax + b] = A V_x[x] A^T
1. Find conditions for A, b to match the mean of y
2. Find conditions for A, b to match the covariance of y
116-117 Sampling from a Multivariate Gaussian (2)
Objective: generate a random sample y ~ N(μ, Σ) from a D-dimensional joint Gaussian with covariance matrix Σ and mean vector μ.
x = randn(D,1);          % sample x ~ N(0, I)
y = chol(Σ)'*x + µ;      % scale x and add offset
Here chol(Σ) is the Cholesky factor L, such that L^T L = Σ.
Therefore, the mean and covariance of y are
E[y] = ȳ = E[L^T x + μ] = L^T E[x] + μ = μ
Cov[y] = E[(y - ȳ)(y - ȳ)^T] = E[L^T x x^T L] = L^T E[x x^T] L = L^T L = Σ
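The same recipe in numpy, as a sketch with my own function name. Note one convention difference: `np.linalg.cholesky` returns the lower-triangular L with L L^T = Σ, so the sample is y = L x + μ with no transpose, unlike MATLAB's `chol`, which returns the upper factor.

```python
import numpy as np

def sample_mvn(mu, Sigma, n_samples, seed=0):
    # y = L x + mu with x ~ N(0, I) and L L^T = Sigma
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Sigma)                     # lower-triangular factor
    x = rng.standard_normal((len(mu), n_samples))     # D x n_samples
    return (L @ x).T + mu                             # n_samples x D
```

With enough samples, the empirical mean and covariance recover μ and Σ, mirroring the derivation above.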
118-120 Conditional
p(x, y) = N( [μ_x; μ_y], [Σ_xx, Σ_xy; Σ_yx, Σ_yy] )
Given an observation y, the conditional is
p(x | y) = N(μ_{x|y}, Σ_{x|y})
μ_{x|y} = μ_x + Σ_xy Σ_yy^{-1} (y - μ_y)
Σ_{x|y} = Σ_xx - Σ_xy Σ_yy^{-1} Σ_yx
The conditional p(x | y) is also Gaussian. Computationally convenient.
[Figure: joint density p(x, y), an observation y, and the resulting conditional p(x | y)]
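The conditioning formulas translate directly to code; a minimal sketch (function name is my own, and Σ_yx = Σ_xy^T since the joint covariance is symmetric):

```python
import numpy as np

def condition_gaussian(mu_x, mu_y, S_xx, S_xy, S_yy, y_obs):
    # mu_{x|y} = mu_x + S_xy S_yy^{-1} (y - mu_y)
    # S_{x|y}  = S_xx - S_xy S_yy^{-1} S_yx
    gain = S_xy @ np.linalg.inv(S_yy)
    mu_cond = mu_x + gain @ (y_obs - mu_y)
    S_cond = S_xx - gain @ S_xy.T
    return mu_cond, S_cond
```

For example, with μ = 0, Σ_xx = Σ_yy = 2, Σ_xy = 1, observing y = 2 gives μ_{x|y} = 1 and Σ_{x|y} = 2 - 1/2 = 1.5, which can be verified by hand.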
121-122 Marginal
p(x, y) = N( [μ_x; μ_y], [Σ_xx, Σ_xy; Σ_yx, Σ_yy] )
Marginal distribution:
p(x) = ∫ p(x, y) dy = N(μ_x, Σ_xx)
The marginal of a joint Gaussian distribution is Gaussian. Intuitively: ignore (integrate out) everything you are not interested in.
[Figure: joint density p(x, y) and marginal p(x)]
123-125 The Gaussian Distribution in the Limit
Consider the joint Gaussian distribution p(x, x̄), where x ∈ R^D and x̄ ∈ R^k, k → ∞, are random variables. Then
p(x, x̄) = N( [μ_x; μ_x̄], [Σ_xx, Σ_xx̄; Σ_x̄x, Σ_x̄x̄] )
where Σ_x̄x̄ ∈ R^(k×k) and Σ_xx̄ ∈ R^(D×k), k → ∞.
However, the marginal remains finite:
p(x) = ∫ p(x, x̄) dx̄ = N(μ_x, Σ_xx)
where we integrate out an infinite number of random variables x̄_i.
126-130 Marginal and Conditional in the Limit
In practice, we consider finite training and test data x_train, x_test. Then x = {x_train, x_test, x_other} (x_other plays the role of x̄ from the previous slide):
p(x) = N( [μ_train; μ_test; μ_other],
          [Σ_train,       Σ_train,test,  Σ_train,other;
           Σ_test,train,  Σ_test,        Σ_test,other;
           Σ_other,train, Σ_other,test,  Σ_other] )
p(x_train, x_test) = ∫ p(x_train, x_test, x_other) dx_other
p(x_test | x_train) = N(μ*, Σ*)
μ* = μ_test + Σ_test,train Σ_train^{-1} (x_train - μ_train)
Σ* = Σ_test - Σ_test,train Σ_train^{-1} Σ_train,test
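Applying exactly these conditioning formulas, with a kernel supplying the covariance blocks, yields GP regression with observation noise folded into the training block. A self-contained sketch under stated assumptions (squared-exponential kernel, zero prior mean; all names are my own):

```python
import numpy as np

def se_kernel(A, B, ell=1.0, sf2=1.0):
    # Squared-exponential covariance between 1-D input vectors
    sq = (A[:, None] - B[None, :]) ** 2
    return sf2 * np.exp(-0.5 * sq / ell**2)

def gp_posterior(X, y, X_star, noise_var=1e-6):
    # mu*    = Sigma_test,train (Sigma_train + sigma_n^2 I)^{-1} y
    # Sigma* = Sigma_test - Sigma_test,train (Sigma_train + sigma_n^2 I)^{-1} Sigma_train,test
    K = se_kernel(X, X) + noise_var * np.eye(len(X))   # noisy training block
    K_star = se_kernel(X_star, X)                      # Sigma_test,train
    L = np.linalg.cholesky(K)                          # O(N^3) training cost
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)                   # O(N^2) per test point
    cov = se_kernel(X_star, X_star) - v.T @ v
    return mu, cov
```

With near-zero noise, the posterior mean interpolates the training targets and the posterior variance collapses at the training inputs, as expected from the conditional formulas.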
131-132 Gaussian Process Training: Hierarchical Inference
Level-1 inference (posterior on f):
p(f | X, y, θ) = p(y | X, f) p(f | X, θ) / p(y | X, θ)
p(y | X, θ) = ∫ p(y | f, X) p(f | X, θ) df
Level-2 inference (posterior on θ):
p(θ | X, y) = p(y | X, θ) p(θ) / p(y | X)
[Graphical model: θ → f → y_i with inputs x_i and noise σ_n, i = 1, ..., N]
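For a Gaussian likelihood the level-1 evidence is available in closed form, log p(y | X, θ) = -½ y^T (K + σ_n² I)^{-1} y - ½ log|K + σ_n² I| - (N/2) log 2π. A sketch comparing a numerically preferable Cholesky-based evaluation against the direct formula (function names are my own):

```python
import numpy as np

def log_marginal_likelihood(K, y, noise_var):
    # Cholesky-based evaluation of log p(y | X, theta)
    N = len(y)
    L = np.linalg.cholesky(K + noise_var * np.eye(N))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    logdet = 2.0 * np.log(np.diag(L)).sum()            # log|K + sigma_n^2 I|
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * N * np.log(2 * np.pi)

def log_marginal_likelihood_naive(K, y, noise_var):
    # Direct (numerically less stable) evaluation for comparison
    N = len(y)
    Ky = K + noise_var * np.eye(N)
    _, logdet = np.linalg.slogdet(Ky)
    return -0.5 * y @ np.linalg.solve(Ky, y) - 0.5 * logdet - 0.5 * N * np.log(2 * np.pi)
```

Level-2 inference then optimizes (or samples) this quantity, plus log p(θ), with respect to the hyper-parameters θ.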
133-135 GP as the Limit of an Infinite RBF Network
Consider the universal function approximator
f(x) = Σ_{i∈Z} lim_{N→∞} (1/N) Σ_{n=1}^{N} γ_n exp( -(x - (i + n/N))² / λ² ), x ∈ R, λ ∈ R⁺
with γ_n ~ N(0, 1) (random weights): Gaussian-shaped basis functions (with variance λ²/2) everywhere on the real axis. In the limit,
f(x) = Σ_{i∈Z} ∫_i^{i+1} γ(s) exp( -(x - s)²/λ² ) ds = ∫_{-∞}^{∞} γ(s) exp( -(x - s)²/λ² ) ds
Mean: E[f(x)] = 0
Covariance: Cov[f(x), f(x')] = θ_1² exp( -(x - x')²/(2λ²) )
A GP with mean 0 and Gaussian covariance function, for a suitable θ_1².
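The covariance claim can be checked numerically: Cov[f(x), f(x')] ∝ ∫ exp(-(x-s)²/λ²) exp(-(x'-s)²/λ²) ds, which by completing the square evaluates to λ √(π/2) exp(-(x-x')²/(2λ²)), i.e., the Gaussian covariance up to a constant absorbed into θ_1². A sketch comparing a numerical integral against that closed form (function names are my own):

```python
import numpy as np

def rbf_network_cov(x1, x2, lam=1.0):
    # Numerically integrate exp(-(x1-s)^2/lam^2) * exp(-(x2-s)^2/lam^2) over s
    s = np.linspace(-30.0, 30.0, 200001)
    ds = s[1] - s[0]
    integrand = np.exp(-(x1 - s) ** 2 / lam**2) * np.exp(-(x2 - s) ** 2 / lam**2)
    return integrand.sum() * ds  # endpoints are ~0, so a plain sum suffices

def se_closed_form(x1, x2, lam=1.0):
    # lam * sqrt(pi/2) * exp(-(x1 - x2)^2 / (2 * lam^2))
    return lam * np.sqrt(np.pi / 2) * np.exp(-((x1 - x2) ** 2) / (2 * lam**2))
```

The two agree to high precision, confirming the Gaussian-covariance form of the limit.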
136-137 References
[1] N. A. C. Cressie. Statistics for Spatial Data. Wiley-Interscience.
[2] M. P. Deisenroth and S. Mohamed. Expectation Propagation in Gaussian Process Dynamical Systems. In Advances in Neural Information Processing Systems.
[3] M. P. Deisenroth and J. W. Ng. Distributed Gaussian Processes. In Proceedings of the International Conference on Machine Learning.
[4] M. P. Deisenroth, C. E. Rasmussen, and J. Peters. Gaussian Process Dynamic Programming. Neurocomputing, 72(7-9), March.
[5] M. P. Deisenroth, R. Turner, M. Huber, U. D. Hanebeck, and C. E. Rasmussen. Robust Filtering and Smoothing with Gaussian Processes. IEEE Transactions on Automatic Control, 57(7).
[6] R. Frigola, F. Lindsten, T. B. Schön, and C. E. Rasmussen. Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems. Curran Associates, Inc.
[7] J. Kocijan, R. Murray-Smith, C. E. Rasmussen, and A. Girard. Gaussian Process Model Based Predictive Control. In Proceedings of the 2004 American Control Conference (ACC 2004), Boston, MA, USA, June-July.
[8] A. Krause, A. Singh, and C. Guestrin. Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies. Journal of Machine Learning Research, 9, February.
[9] J. R. Lloyd, D. Duvenaud, R. Grosse, J. B. Tenenbaum, and Z. Ghahramani. Automatic Construction and Natural-Language Description of Nonparametric Regression Models. In AAAI Conference on Artificial Intelligence.
[10] M. A. Osborne, S. J. Roberts, A. Rogers, S. D. Ramchurn, and N. R. Jennings. Towards Real-Time Information Processing of Sensor Network Data Using Computationally Efficient Multi-output Gaussian Processes. In Proceedings of the International Conference on Information Processing in Sensor Networks. IEEE Computer Society.
[11] J. Quiñonero-Candela and C. E. Rasmussen. A Unifying View of Sparse Approximate Gaussian Process Regression. Journal of Machine Learning Research, 6(2).
[12] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. The MIT Press, Cambridge, MA, USA.
[13] S. Roberts, M. A. Osborne, M. Ebden, S. Reece, N. Gibson, and S. Aigrain. Gaussian Processes for Time Series Modelling. Philosophical Transactions of the Royal Society (Part A), 371(1984), February.
Introduction to Gaussian Processes Iain Murray School of Informatics, University of Edinburgh The problem Learn scalar function of vector values f(x).5.5 f(x) y i.5.2.4.6.8 x f 5 5.5 x x 2.5 We have (possibly
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September
More informationOptimization of Gaussian Process Hyperparameters using Rprop
Optimization of Gaussian Process Hyperparameters using Rprop Manuel Blum and Martin Riedmiller University of Freiburg - Department of Computer Science Freiburg, Germany Abstract. Gaussian processes are
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory
More informationSupport Vector Machines and Bayes Regression
Statistical Techniques in Robotics (16-831, F11) Lecture #14 (Monday ctober 31th) Support Vector Machines and Bayes Regression Lecturer: Drew Bagnell Scribe: Carl Doersch 1 1 Linear SVMs We begin by considering
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory
More informationCS 7140: Advanced Machine Learning
Instructor CS 714: Advanced Machine Learning Lecture 3: Gaussian Processes (17 Jan, 218) Jan-Willem van de Meent (j.vandemeent@northeastern.edu) Scribes Mo Han (han.m@husky.neu.edu) Guillem Reus Muns (reusmuns.g@husky.neu.edu)
More informationCopula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011
Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011 Outline Ordinary Least Squares (OLS) Regression Generalized Linear Models
More informationPredictive Variance Reduction Search
Predictive Variance Reduction Search Vu Nguyen, Sunil Gupta, Santu Rana, Cheng Li, Svetha Venkatesh Centre of Pattern Recognition and Data Analytics (PRaDA), Deakin University Email: v.nguyen@deakin.edu.au
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationAn Introduction to Statistical and Probabilistic Linear Models
An Introduction to Statistical and Probabilistic Linear Models Maximilian Mozes Proseminar Data Mining Fakultät für Informatik Technische Universität München June 07, 2017 Introduction In statistical learning
More informationModelling Transcriptional Regulation with Gaussian Processes
Modelling Transcriptional Regulation with Gaussian Processes Neil Lawrence School of Computer Science University of Manchester Joint work with Magnus Rattray and Guido Sanguinetti 8th March 7 Outline Application
More informationThe Variational Gaussian Approximation Revisited
The Variational Gaussian Approximation Revisited Manfred Opper Cédric Archambeau March 16, 2009 Abstract The variational approximation of posterior distributions by multivariate Gaussians has been much
More informationVariational Model Selection for Sparse Gaussian Process Regression
Variational Model Selection for Sparse Gaussian Process Regression Michalis K. Titsias School of Computer Science University of Manchester 7 September 2008 Outline Gaussian process regression and sparse
More information