Gaussian processes for inference in stochastic differential equations
Slide 1: Gaussian processes for inference in stochastic differential equations

Manfred Opper, AI group, TU Berlin
November 6, 2017
Slides 2-5: Gaussian Process models in machine learning

Gaussian processes provide prior distributions over latent functions f(·) to be learnt from data.

[Figure: sample functions drawn from a GP prior]

f(·) ~ GP(m, K) with m(x) = E[f(x)] and K(x, x') = Cov[f(x), f(x')].

Well-known examples:
1. Regression: y_i = f(x_i) + ν_i
2. Classification (y_i ∈ {0, 1}): P[y_i = 1 | f(x_i)] = σ(f(x_i))
3. ...
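The regression example above can be sketched in a few lines of numpy. The RBF kernel, lengthscale, and noise level below are illustrative choices, not values from the talk:

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential covariance K(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)

def gp_regression(X, y, Xstar, noise_std=0.1):
    """Posterior mean and variance of f(x*) for the model y_i = f(x_i) + nu_i."""
    K = rbf_kernel(X, X) + noise_std ** 2 * np.eye(len(X))
    Ks = rbf_kernel(X, Xstar)
    alpha = np.linalg.solve(K, y)                       # K^{-1} y
    mean = Ks.T @ alpha
    # diagonal of K** - Ks^T K^{-1} Ks (prior variance is 1 for this kernel)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, var

rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)           # noisy observations
mean, var = gp_regression(X, y, np.array([2.5]))        # predict f at x* = 2.5
```

Inside the data range the posterior mean tracks the underlying sine function and the posterior variance stays small, which is the finite-dimensional reduction discussed on the next slides.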
Slides 6-7: Inference

We would like to predict f(x) given observations y := (y_1, ..., y_n) at x_1, ..., x_n. The GP prior is infinite dimensional!

For these simple models, inference reduces to an n-dimensional problem. With f_i := f(x_i),

p(f(x) | y) = ∫ p(f(x) | f_1, ..., f_n) p(f_1, ..., f_n | y) df_1 ... df_n.
Slides 8-9: A more complicated likelihood

Poisson process: the likelihood for a set of n points D = (x_1, x_2, ..., x_n) ⊂ T is

L(D | λ) = exp( −∫_T λ(x) dx ) ∏_{i=1}^n λ(x_i).

λ(x) is the unknown intensity or rate of the process.

Gaussian process modulated Poisson process model (Lloyd et al., 2015): λ(x) = f²(x), where f is a Gaussian process.
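Once λ is represented on a grid, the log of the likelihood above is easy to evaluate numerically. A minimal sketch; the grid, the stand-in for the GP sample f, and the event locations are illustrative:

```python
import numpy as np

def poisson_process_loglik(events, grid, lam_grid):
    """log L(D | lambda) = -integral of lambda over T + sum_i log lambda(x_i),
    with the integral approximated by the trapezoidal rule on a grid."""
    integral = np.sum(0.5 * (lam_grid[1:] + lam_grid[:-1]) * np.diff(grid))
    lam_at_events = np.interp(events, grid, lam_grid)
    return -integral + np.sum(np.log(lam_at_events))

grid = np.linspace(0.0, 10.0, 200)
f = 1.0 + 0.5 * np.sin(grid)        # a stand-in for one GP sample of f
lam = f ** 2                        # the link lambda(x) = f(x)^2 from the slide
events = np.array([1.0, 2.5, 7.0])
ll = poisson_process_loglik(events, grid, lam)
```

As a sanity check, a constant unit rate on T = [0, 10] gives log-likelihood −10 for any three events, since the integral term is 10 and each log λ(x_i) vanishes.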
Slide 10: Outline

- Stochastic differential equations
- Drift estimation for dense observations
- Drift estimation for sparse observations
- Sparse GP approximation
- Drift estimation from the empirical distribution
- Outlook
Slide 11: Stochastic differential equations

dX/dt = f(X) + white noise,

e.g. with a gradient drift f(x) = −dV(x)/dx for a potential V.
Slide 12: Prior process: stochastic differential equations (SDE)

Mathematicians prefer the Itô version for X_t ∈ R^d:

dX_t = f(X_t) dt + D^{1/2}(X_t) dW_t,

with drift f, diffusion D^{1/2}, and Wiener process W_t. It is the limit of the discrete-time process

X_{t+Δt} − X_t = f(X_t) Δt + D^{1/2}(X_t) √Δt ε_t,   ε_t i.i.d. standard Gaussian.
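The discrete-time limit above is exactly the Euler-Maruyama simulation scheme. A minimal scalar sketch; the Ornstein-Uhlenbeck drift f(x) = −x, the constant diffusion, and all step sizes are illustrative choices, not from the talk:

```python
import numpy as np

def euler_maruyama(f, D, x0, dt, n_steps, rng):
    """Simulate X_{t+dt} = X_t + f(X_t) dt + sqrt(D(X_t) dt) eps, eps ~ N(0,1)."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        x[k + 1] = x[k] + f(x[k]) * dt \
                   + np.sqrt(D(x[k]) * dt) * rng.standard_normal()
    return x

rng = np.random.default_rng(0)
# Ornstein-Uhlenbeck: f(x) = -x, D = 0.5; stationary variance is D/2 = 0.25
path = euler_maruyama(lambda x: -x, lambda x: 0.5,
                      x0=2.0, dt=0.01, n_steps=5000, rng=rng)
```

After the initial transient decays, the simulated path fluctuates around zero with variance close to the stationary value D/2.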
Slide 13: [Figure: sample path X(t) with observation points marked]
Slides 14-15: Nonparametric drift estimation

Infer the drift function f(·) under smoothness assumptions from observations of the process X.

Idea (see e.g. Papaspiliopoulos, Pokern, Roberts & Stuart, 2012): assume a Gaussian process prior f(·) ~ GP(0, K) with covariance kernel K(x, x').
Slides 16-18: Simple for observations dense in time

Euler discretization of the SDE (with unit diffusion):

X_{t+Δt} − X_t = f(X_t) Δt + √Δt ε_t,   for Δt → 0.

The likelihood (assume a densely observed path X_{0:T}) is Gaussian:

p(X_{0:T} | f) ∝ exp[ −(1/(2Δt)) Σ_t ||X_{t+Δt} − X_t||² ]
            × exp[ −(1/2) Σ_t ||f(X_t)||² Δt + Σ_t f(X_t)·(X_{t+Δt} − X_t) ].

The posterior process is also a GP! It solves the regression problem

f(x) ≈ E[ (X_{t+Δt} − X_t) / Δt | X_t = x ].

Works well for Δt → 0.
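The regression view above can be turned directly into code: treat the rescaled increments (X_{t+Δt} − X_t)/Δt as noisy observations of f(X_t) with noise variance D/Δt and run GP regression. A sketch on a simulated Ornstein-Uhlenbeck path with true drift f(x) = −x; kernel, lengthscale, and simulation settings are illustrative:

```python
import numpy as np

def gp_drift_estimate(path, dt, xstar, lengthscale=1.0, D=1.0):
    """GP regression on increments: targets y_t = (X_{t+dt} - X_t)/dt have
    conditional mean f(X_t) and noise variance D/dt."""
    X = path[:-1]
    y = np.diff(path) / dt
    K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / lengthscale ** 2)
    Ks = np.exp(-0.5 * (X[:, None] - xstar[None, :]) ** 2 / lengthscale ** 2)
    alpha = np.linalg.solve(K + (D / dt) * np.eye(len(X)), y)
    return Ks.T @ alpha

# simulate an Ornstein-Uhlenbeck path with true drift f(x) = -x and D = 1
rng = np.random.default_rng(1)
dt, n = 0.02, 2500
x = np.zeros(n + 1)
for k in range(n):
    x[k + 1] = x[k] - x[k] * dt + np.sqrt(dt) * rng.standard_normal()

fhat = gp_drift_estimate(x, dt, np.array([-1.0, 0.0, 1.0]))
```

Despite the very noisy targets (noise variance D/Δt = 50 per point), averaging over thousands of increments recovers the negative slope of the true drift.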
Slide 19: Example: double-well model

n = 6000 data points with Δt = 0.002; GP with a polynomial kernel of order 4.

[Figure: sample path X(t) and estimated drift f(x)]
Slide 20: Estimation of the diffusion

For the Euler-discretized SDE X_{t+Δt} − X_t = f(X_t) Δt + D^{1/2}(X_t) √Δt ε_t, the diffusion is

D(x) = lim_{Δt→0} (1/Δt) Var(X_{t+Δt} − X_t | X_t = x)
     = lim_{Δt→0} (1/Δt) E[ (X_{t+Δt} − X_t)² | X_t = x ].

Independent of the drift! Estimate D(x) with GPs by regression with data y_t = (X_{t+Δt} − X_t)² / Δt.
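For a constant diffusion the regression above collapses to a simple average of squared increments, which illustrates why the estimator is drift-independent: the drift contributes only at order Δt². A sketch with illustrative parameter values:

```python
import numpy as np

def estimate_diffusion(path, dt):
    """Constant-diffusion case: D ~ E[(X_{t+dt} - X_t)^2] / dt.
    The drift enters this quantity only at order dt, so it drops out as dt -> 0."""
    incr = np.diff(path)
    return np.mean(incr ** 2) / dt

# Ornstein-Uhlenbeck path with drift f(x) = -x and true diffusion D = 0.8
rng = np.random.default_rng(2)
dt, n, D_true = 0.005, 20000, 0.8
x = np.zeros(n + 1)
for k in range(n):
    x[k + 1] = x[k] - x[k] * dt + np.sqrt(D_true * dt) * rng.standard_normal()

D_hat = estimate_diffusion(x, dt)
```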
Slide 21: Ice core data

[Figure: δ¹⁸O record as a function of time t in kiloyears]
Slide 22: GP inference using RBF kernels

[Figure: estimated potential U(x) and diffusion D(x)]
Slide 23: For larger Δt ...

[Figure: sparsely observed path X(t) and the resulting drift estimate f(x)]

Problem: we can compute p(X_{0:T} | f) but NOT p(y | f)!
Slide 24: EM algorithm

Treat the unobserved path X_t, for times t between observations, as latent random variables (Batz, Ruttor & Opper, NIPS 2013).

EM algorithm:
1. E-step: compute the expected complete-data log-likelihood
   L(f, f_old) = E_{p_old}[ ln L(X_{0:T} | f) ],    (1)
   where p_old is the posterior p(X_{0:T} | y) computed with the previous estimate f_old of the drift.
2. M-step: recompute the drift function as
   f_new = arg max_f ( L(f, f_old) + ln P_0(f) ).    (2)
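The two steps can be sketched in a toy version. This is NOT the method of the talk: the E-step below imputes the path with a plain Brownian bridge that ignores the current drift estimate (a crude stand-in for the Ornstein-Uhlenbeck bridge used later), and the M-step fits a parametric linear drift f(x) = θx by least squares instead of a GP. All parameter values are illustrative:

```python
import numpy as np

def impute_bridge(x0, x1, tau, n_sub, D, rng):
    """Sample intermediate points between two observations from a Brownian
    bridge. Ignoring the drift between observations is the crude part."""
    dt = tau / n_sub
    pts = [x0]
    for j in range(n_sub - 1):
        remaining = tau - j * dt
        mean = pts[-1] + (x1 - pts[-1]) * dt / remaining
        var = D * dt * (remaining - dt) / remaining
        pts.append(mean + np.sqrt(max(var, 0.0)) * rng.standard_normal())
    return np.array(pts)

def em_linear_drift(obs, tau, n_sub=10, D=1.0, n_iter=3, seed=0):
    """EM sketch for f(x) = theta * x: E-step imputes a dense path,
    M-step refits theta by least squares on the imputed increments."""
    rng = np.random.default_rng(seed)
    dt = tau / n_sub
    theta = 0.0
    for _ in range(n_iter):
        # E-step (crude): impute a dense path between consecutive observations
        segs = [impute_bridge(a, b, tau, n_sub, D, rng)
                for a, b in zip(obs[:-1], obs[1:])]
        path = np.concatenate(segs + [obs[-1:]])
        # M-step: theta = sum(x dx) / (dt * sum(x^2))
        x, dx = path[:-1], np.diff(path)
        theta = np.sum(x * dx) / (dt * np.sum(x * x))
    return theta

# sparse observations of an OU process (exact transitions), true theta = -1
rng = np.random.default_rng(4)
tau, n_obs = 0.2, 400
obs = np.zeros(n_obs + 1)
obs[0] = np.sqrt(0.5) * rng.standard_normal()
for k in range(n_obs):
    obs[k + 1] = obs[k] * np.exp(-tau) \
                 + np.sqrt((1 - np.exp(-2 * tau)) / 2) * rng.standard_normal()

theta_hat = em_linear_drift(obs, tau)
```

Because the simplified E-step does not feed θ back into the imputation, the iterations here mostly re-average the imputation noise; the point of the sketch is only the overall impute-then-refit structure of the EM loop.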
Slide 25: Likelihood for a complete path

p(X_{0:T} | f) ∝ exp[ −(1/(2Δt)) Σ_t ||X_{t+Δt} − X_t||² ] · L(X_{0:T} | f)

with

L(X_{0:T} | f) = exp[ −(1/2) Σ_t ||f(X_t)||² Δt + Σ_t f(X_t)·(X_{t+Δt} − X_t) ].
Slide 26: The complete likelihood

In the limit Δt → 0,

E_p[ln L(X_{0:T} | f)] = −(1/2) Σ_t { E_p[||f(X_t)||²] Δt − 2 E_p[⟨f(X_t), X_{t+Δt} − X_t⟩] }
  → −(1/2) ∫_0^T { E_p[||f(X_t)||²] − 2 E_p[⟨f(X_t), g_t(X_t)⟩] } dt
  = −(1/2) ∫ ||f(x)||² A(x) dx + ∫ ⟨f(x), b(x)⟩ dx,    (3)

where

g_t(x) = lim_{Δt→0} (1/Δt) E_p[X_{t+Δt} − X_t | X_t = x],

q_t denotes the posterior marginal density of X_t, and

A(x) = ∫_0^T q_t(x) dt,   b(x) = ∫_0^T g_t(x) q_t(x) dt.
Slide 27: Problems

- The E-step requires posterior marginal densities q_t for diffusion processes with an arbitrary prior drift function f(x):
  p(X_{0:T} | y, f) ∝ p(X_{0:T} | f) ∏_{k=1}^n δ(y_k − X_{kτ}).    (4)
- The GP has to deal with an infinite amount of densely imputed data.
Slide 28: Approximate solutions

- Linearize the drift between consecutive observations (Ornstein-Uhlenbeck bridge). Hence, for t between two observations we consider the approximate process
  dX_t = [ f(y_k) − Γ_k (X_t − y_k) ] dt + D_k^{1/2} dW_t    (5)
  with Γ_k = −∇f(y_k) and D_k = D(y_k). For this process, the transition density is a multivariate Gaussian!
- Work with a sparse GP approximation.
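The Gaussian transition density of the linearized process (5) has closed-form moments. A sketch for the scalar case, assuming Γ > 0 and constant D (the function name and signature are my own):

```python
import numpy as np

def linearized_transition(x0, y, f_y, Gamma, D, t):
    """Mean and variance of X_t given X_0 = x0 under the linear SDE
    dX = [f(y) - Gamma (X - y)] dt + sqrt(D) dW, i.e. dX = (a - Gamma X) dt + ...
    with a = f(y) + Gamma * y (standard Ornstein-Uhlenbeck moments)."""
    a = f_y + Gamma * y
    decay = np.exp(-Gamma * t)
    mean = x0 * decay + (a / Gamma) * (1.0 - decay)
    var = D / (2.0 * Gamma) * (1.0 - decay ** 2)
    return mean, var

m, v = linearized_transition(x0=0.5, y=0.0, f_y=0.0, Gamma=2.0, D=1.0, t=0.3)
```

Two limits provide a quick sanity check: at t = 0 the mean is x0 with zero variance, and as t → ∞ the moments relax to the fixed point a/Γ with variance D/(2Γ).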
Slides 29-33: Variational sparse GP approximation

(L. Csató, 2002; M. Titsias, 2009)

Assume a measure over functions f of the form

dP(f) = (1/Z) dP_0(f) e^{−U(f)}.

Approximate it by

dQ(f) = (1/Z_s) dP_0(f) e^{−U_s(f_s)},

where U_s depends only on the sparse set f_s = {f(x)}_{x∈S} of dimension m.

Minimize the KL divergence

D(Q || P) = ∫ dQ(f) ln( dQ(f) / dP(f) ).

Integrating over dQ(f | f_s) = dP_0(f | f_s) yields the optimal

U_s(f_s) = E_0[ U(f) | f_s ],

which can be computed analytically for a Gaussian prior P_0 and a quadratic log-likelihood U.
Slide 34: Some details

We can show that

E_0[f(x) | f_s] = k_s(x)ᵀ K_s^{−1} f_s.    (6)

Hence, if U(f) = (1/2) ∫ f²(x) A(x) dx − ∫ f(x) b(x) dx, we get

U_s(f_s) = E_0[U(f) | f_s]
  = (1/2) f_sᵀ K_s^{−1} { ∫ k_s(x) A(x) k_s(x)ᵀ dx } K_s^{−1} f_s − f_sᵀ K_s^{−1} ∫ k_s(x) b(x) dx.
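Equation (6) is just the conditional mean of a zero-mean Gaussian prior given its values on the sparse set. A minimal sketch with an RBF kernel and a hand-picked sparse set S (all values are illustrative):

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)

def conditional_mean(x, S, f_s):
    """E_0[f(x) | f_s] = k_s(x)^T K_s^{-1} f_s for a zero-mean GP prior."""
    Ks = rbf(S, S) + 1e-10 * np.eye(len(S))   # small jitter for stability
    ks = rbf(S, x)
    return ks.T @ np.linalg.solve(Ks, f_s)

S = np.array([-1.0, 0.0, 1.0])                # sparse set of m = 3 points
f_s = np.array([0.5, 1.0, 0.5])               # function values on S
m = conditional_mean(np.array([0.0, 2.0]), S, f_s)
```

At a point of S the formula interpolates the given value exactly, while far from S the conditional mean reverts toward the prior mean zero.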
Slide 35: GP estimation after one iteration of EM

[Figure: estimated drift f(x), two panels]
Slide 36: [Figure: comparison of the MSE of the direct, EM, and Gibbs methods over different time intervals]
Slides 37-38: Example: a simple pendulum

dX = V dt,
dV = −( γV + (mgl sin X)/(ml²) ) dt + d^{1/2} dW_t.

n = 4000 data points (x, v) with Δt_obs = 0.3 and known diffusion constant d = 1. GP with a periodic kernel.

[Figure: estimated drift f(x, v = 0) as a function of x]
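The pendulum SDE can be simulated with the same Euler-Maruyama scheme as before, now in two dimensions (position X and velocity V, with noise only on V). The damping, gravity, and noise values below are illustrative, not those of the slide:

```python
import numpy as np

def simulate_pendulum(gamma=0.5, g_over_l=1.0, d=1.0, dt=0.01, n=2000, seed=0):
    """Euler-Maruyama for dX = V dt,
    dV = -(gamma V + (g/l) sin X) dt + sqrt(d) dW (noise acts on V only)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1)
    v = np.empty(n + 1)
    x[0], v[0] = 0.0, 0.0
    for k in range(n):
        x[k + 1] = x[k] + v[k] * dt
        v[k + 1] = v[k] + (-gamma * v[k] - g_over_l * np.sin(x[k])) * dt \
                   + np.sqrt(d * dt) * rng.standard_normal()
    return x, v

x, v = simulate_pendulum()
```

With positive damping the velocity stays bounded, fluctuating with a stationary variance of order d/(2γ).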
Slides 39-40: Drift estimation for SDEs using the empirical distribution

For X ∈ R^d consider the SDE

dX_t = f(X_t) dt + σ(X_t) dW_t.

Try to estimate the drift function f(·) given only the empirical distribution of (noise-free) observations,

p̂(x) = (1/n) Σ_{i=1}^n δ(x − x_i).

This is possible if the drift has the form

f(x) = g(x) + A(x) ∇ψ(x),

where A and g are known functions.
Slides 41-42: Generalised score functional

Define

ε[ψ] = ∫ { (1/2) ||∇ψ(x)||²_A + L_g ψ(x) } p(x) dx

with

L_g ψ(x) = g(x)·∇ψ(x) + (1/2) Σ_ij D_ij(x) ∂²ψ(x)/(∂x^(i) ∂x^(j)),

where ||u||²_A := uᵀ A(x) u and D(x) := σ(x) σ(x)ᵀ.

Setting δε[ψ]/δψ = 0 yields the stationary Fokker-Planck equation

L†_g p(x) − ∇·( A(x) ∇ψ(x) p(x) ) = 0,

where L†_g is the adjoint of L_g,

L†_g p(x) = −∇·( g(x) p(x) ) + (1/2) Σ_ij ∂²( D_ij(x) p(x) )/(∂x^(i) ∂x^(j)).

With g = 0 and constant A this reduces to score matching (Hyvärinen, 2005; Sriperumbudur et al., 2014) for density estimation.
Slides 43-44: Consider the pseudo log-likelihood

−Σ_{i=1}^n { (1/2) ||∇ψ(x_i)||²_A + L_g ψ(x_i) },

where x_i := X(t_i), i = 1, ..., n, is a sample from the stationary density p(·).

Combining this with a GP prior over ψ yields an estimator for the drift of the SDE if the drift is of the form

f(x) = g(x) + A(x) ∇ψ(x),

where A and g are given (Batz, Ruttor & Opper, 2016).
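A toy sketch of the g = 0 score-matching special case in one dimension, with a linear parametric model ψ'(x) = θx in place of a GP prior: the empirical objective Σ_i [ (1/2) θ² x_i² + θ ] (the second term is ψ''(x_i) = θ) is minimized in closed form at θ = −n / Σ_i x_i², which for Gaussian samples recovers the true score −x/σ². All settings are illustrative:

```python
import numpy as np

def score_matching_linear(samples):
    """Fit psi'(x) = theta * x by minimizing the empirical score objective
    sum_i [0.5 * (theta * x_i)^2 + theta]; closed form theta = -n / sum(x_i^2)."""
    return -len(samples) / np.sum(samples ** 2)

rng = np.random.default_rng(3)
sigma = 2.0
x = sigma * rng.standard_normal(50000)        # samples from N(0, sigma^2)
theta = score_matching_linear(x)              # true score is -x / sigma^2
```

With 50000 samples the estimate lands very close to −1/σ² = −0.25, without ever evaluating the density or its normalization constant, which is the appeal of the score functional.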
Slides 45-46: Langevin dynamics

Consider the 2nd-order SDE

dX_t = V_t dt,
dV_t = ( F(X_t) − λV_t ) dt + σ(X_t, V_t) dW_t,

and set A_xx = A_xv = 0, A_vv = I, f_x = v, g_v = −λv (known).

The condition

f_v(x, v) = −λv + ∇_v ψ(x, v) = −λv + F(x)

is fulfilled with ψ(x, v) = vᵀ F(x) and allows for arbitrary F(x). Use a GP prior over F.
Slide 47: Example 1: periodic model

F(x) = a sin x, D_v = (σ cos x)², with n = 2000 observations and time lag τ = 0.25.

[Figure: estimated force f(x)]
Slide 48: Example 2: non-conservative drift

F^(1)(x) = x^(1) ( 1 − (x^(1))² − (x^(2))² ) − x^(2),
F^(2)(x) = x^(2) ( 1 − (x^(1))² − (x^(2))² ) + x^(1),

with friction λ known but D unknown; n = 2000 observations.
Slides 49-51: The case A = D: likelihood for a densely observed path

ln L(X_{0:T} | g) = −(1/2) ∫ { ||f(X_t)||²_{D^{−1}} dt − 2 ⟨f(X_t), dX_t⟩ } + const,

with ⟨u, v⟩ := uᵀ D^{−1} v.

Assume f = g + D ∇ψ and apply the Itô formula:

ln L(X_{0:T} | g) = −(1/2) ∫_0^T { ∇ψᵀ D ∇ψ dt + 2 gᵀ ∇ψ dt − 2 ∇ψᵀ dX_t } + const
  = −∫_0^T { (1/2) ||∇ψ(X_t)||²_D + L_g ψ(X_t) } dt + ψ(X_T) − ψ(X_0) + const.

Up to the boundary terms, the time integral is the empirical version of the generalised score functional ε[ψ].
Slide 52: Outlook

- Beyond EM: full variational Bayesian approximation
- Estimation of the diffusion from sparse data
- Quality of the sparse GP approximation?
- Langevin dynamics with unobserved velocities?
Slide 53: Many thanks to my collaborators Andreas Ruttor and Philipp Batz. Funded by an EU STREP project.
Markov Chain Monte Carlo Algorithms for Gaussian Processes Michalis K. Titsias, Neil Lawrence and Magnus Rattray School of Computer Science University of Manchester June 8 Outline Gaussian Processes Sampling
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationDensity Modeling and Clustering Using Dirichlet Diffusion Trees
p. 1/3 Density Modeling and Clustering Using Dirichlet Diffusion Trees Radford M. Neal Bayesian Statistics 7, 2003, pp. 619-629. Presenter: Ivo D. Shterev p. 2/3 Outline Motivation. Data points generation.
More informationProbabilistic Reasoning in Deep Learning
Probabilistic Reasoning in Deep Learning Dr Konstantina Palla, PhD palla@stats.ox.ac.uk September 2017 Deep Learning Indaba, Johannesburgh Konstantina Palla 1 / 39 OVERVIEW OF THE TALK Basics of Bayesian
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationIntroduction to Gaussian Processes
Introduction to Gaussian Processes Neil D. Lawrence GPSS 10th June 2013 Book Rasmussen and Williams (2006) Outline The Gaussian Density Covariance from Basis Functions Basis Function Representations Constructing
More informationMa 221 Final Exam Solutions 5/14/13
Ma 221 Final Exam Solutions 5/14/13 1. Solve (a) (8 pts) Solution: The equation is separable. dy dx exy y 1 y0 0 y 1e y dy e x dx y 1e y dy e x dx ye y e y dy e x dx ye y e y e y e x c The last step comes
More informationA Short Introduction to Diffusion Processes and Ito Calculus
A Short Introduction to Diffusion Processes and Ito Calculus Cédric Archambeau University College, London Center for Computational Statistics and Machine Learning c.archambeau@cs.ucl.ac.uk January 24,
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationStochastic Modelling in Climate Science
Stochastic Modelling in Climate Science David Kelly Mathematics Department UNC Chapel Hill dtbkelly@gmail.com November 16, 2013 David Kelly (UNC) Stochastic Climate November 16, 2013 1 / 36 Why use stochastic
More informationPath integrals for classical Markov processes
Path integrals for classical Markov processes Hugo Touchette National Institute for Theoretical Physics (NITheP) Stellenbosch, South Africa Chris Engelbrecht Summer School on Non-Linear Phenomena in Field
More informationLinear Models for Regression
Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationManifold Monte Carlo Methods
Manifold Monte Carlo Methods Mark Girolami Department of Statistical Science University College London Joint work with Ben Calderhead Research Section Ordinary Meeting The Royal Statistical Society October
More informationIEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm
IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationAnnouncements. Proposals graded
Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:
More informationBrownian Motion. An Undergraduate Introduction to Financial Mathematics. J. Robert Buchanan. J. Robert Buchanan Brownian Motion
Brownian Motion An Undergraduate Introduction to Financial Mathematics J. Robert Buchanan 2010 Background We have already seen that the limiting behavior of a discrete random walk yields a derivation of
More informationMathematical Economics: Lecture 2
Mathematical Economics: Lecture 2 Yu Ren WISE, Xiamen University September 25, 2012 Outline 1 Number Line The number line, origin (Figure 2.1 Page 11) Number Line Interval (a, b) = {x R 1 : a < x < b}
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More informationLecture 1: Pragmatic Introduction to Stochastic Differential Equations
Lecture 1: Pragmatic Introduction to Stochastic Differential Equations Simo Särkkä Aalto University, Finland (visiting at Oxford University, UK) November 13, 2013 Simo Särkkä (Aalto) Lecture 1: Pragmatic
More informationFully Bayesian Deep Gaussian Processes for Uncertainty Quantification
Fully Bayesian Deep Gaussian Processes for Uncertainty Quantification N. Zabaras 1 S. Atkinson 1 Center for Informatics and Computational Science Department of Aerospace and Mechanical Engineering University
More informationCS-E4830 Kernel Methods in Machine Learning
CS-E483 Kernel Methods in Machine Learning Lecture : Gaussian processes Markus Heinonen 3. November, 26 Today s topic Gaussian processes: Kernel as a covariance function C. E. Rasmussen & C. K. I. Williams,
More informationProbabilistic Models for Learning Data Representations. Andreas Damianou
Probabilistic Models for Learning Data Representations Andreas Damianou Department of Computer Science, University of Sheffield, UK IBM Research, Nairobi, Kenya, 23/06/2015 Sheffield SITraN Outline Part
More informationEfficient Bayesian Inference of Sigmoidal Gaussian Cox Processes
Journal of Machine Learning Research 19 (2018) 1-34 Submitted 12/17; Revised 10/18; Published 11/18 Efficient Bayesian Inference of Sigmoidal Gaussian Cox Processes Christian Donner Manfred Opper Artificial
More informationThe Automatic Statistician
The Automatic Statistician Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ James Lloyd, David Duvenaud (Cambridge) and
More informationGaussian Process Regression Networks
Gaussian Process Regression Networks Andrew Gordon Wilson agw38@camacuk mlgengcamacuk/andrew University of Cambridge Joint work with David A Knowles and Zoubin Ghahramani June 27, 2012 ICML, Edinburgh
More informationLarge-scale Collaborative Prediction Using a Nonparametric Random Effects Model
Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model Kai Yu Joint work with John Lafferty and Shenghuo Zhu NEC Laboratories America, Carnegie Mellon University First Prev Page
More informationBayesian inference for stochastic differential mixed effects models - initial steps
Bayesian inference for stochastic differential ixed effects odels - initial steps Gavin Whitaker 2nd May 2012 Supervisors: RJB and AG Outline Mixed Effects Stochastic Differential Equations (SDEs) Bayesian
More informationLikelihood NIPS July 30, Gaussian Process Regression with Student-t. Likelihood. Jarno Vanhatalo, Pasi Jylanki and Aki Vehtari NIPS-2009
with with July 30, 2010 with 1 2 3 Representation Representation for Distribution Inference for the Augmented Model 4 Approximate Laplacian Approximation Introduction to Laplacian Approximation Laplacian
More informationStatistical Mechanics of Active Matter
Statistical Mechanics of Active Matter Umberto Marini Bettolo Marconi University of Camerino, Italy Naples, 24 May,2017 Umberto Marini Bettolo Marconi (2017) Statistical Mechanics of Active Matter 2017
More informationSimulation of conditional diffusions via forward-reverse stochastic representations
Weierstrass Institute for Applied Analysis and Stochastics Simulation of conditional diffusions via forward-reverse stochastic representations Christian Bayer and John Schoenmakers Numerical methods for
More informationVariable sigma Gaussian processes: An expectation propagation perspective
Variable sigma Gaussian processes: An expectation propagation perspective Yuan (Alan) Qi Ahmed H. Abdel-Gawad CS & Statistics Departments, Purdue University ECE Department, Purdue University alanqi@cs.purdue.edu
More informationKolmogorov Equations and Markov Processes
Kolmogorov Equations and Markov Processes May 3, 013 1 Transition measures and functions Consider a stochastic process {X(t)} t 0 whose state space is a product of intervals contained in R n. We define
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More information