Variational inference in the conjugate-exponential family


1 Variational inference in the conjugate-exponential family
Matthew J. Beal
Work with Zoubin Ghahramani
Gatsby Computational Neuroscience Unit
August 2000

2 Abstract
Variational inference in the conjugate-exponential family
When learning models from data there is always the problem of over-fitting and poor generalisation. In a Bayesian approach this can be resolved by averaging over all possible settings of the model parameters. However, for most interesting problems, averaging over models leads to intractable integrals, so approximations are necessary. Variational methods are becoming a widespread tool for approximate inference and learning in many graphical models: they are deterministic, generally fast, and monotonically increase an objective function which transparently incorporates a model complexity cost. I'll provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models, and show how well-known algorithms (e.g. belief propagation) can be readily incorporated in the variational updates. Some examples to illustrate the ideas: determining the most probable number of clusters and their intrinsic latent-space dimensionality in a Mixture of Factor Analysers model; and recovering the hidden state-space dimensionality in a Linear Dynamical System model. This is work with Zoubin Ghahramani.

3 Outline
• Briefly review variational Bayesian learning.
• Concentrate on conjugate-exponential models.
• Variational Expectation-Maximisation (VEM).
• Examples in Mixture of Factor Analysers (MFA) and Linear Dynamical Systems (LDS).

4 Bayesian approach
Motivation
• Maximum likelihood (ML) does not penalise complex models, which can a priori model a larger range of data sets.
• Bayes does not suffer from overfitting if we integrate out all the parameters.
• We should be able to do principled model comparison.

Method
• Express distributions over all parameters in the model.
• Calculate the evidence

P(Y|M) = \int d\theta \, P(Y|\theta, M) \, P(\theta|M).

[Figure: the evidence P(Data) plotted against data sets Y for models that are too simple, "just right", and too complex; adapted from D.J.C. MacKay.]

Problem
• These integrals are computationally intractable; a toy illustration of the evidence integral follows below.
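The evidence integral above is exactly what becomes intractable for interesting models. As a toy illustration (my own, not from the talk), the following sketch assumes a Beta-Bernoulli model, where the integral is still tractable, and compares the exact evidence with a naive Monte Carlo estimate over the prior:

    import numpy as np
    from scipy.special import betaln

    rng = np.random.default_rng(0)
    Y = rng.binomial(1, 0.7, size=20)        # toy coin-flip data set
    a, b = 1.0, 1.0                          # Beta prior hyperparameters (assumed)

    # exact log evidence: the Beta-Binomial marginal likelihood
    n1, n0 = Y.sum(), len(Y) - Y.sum()
    log_evidence_exact = betaln(a + n1, b + n0) - betaln(a, b)

    # Monte Carlo estimate: average the likelihood over samples from the prior
    theta = rng.beta(a, b, size=100_000)
    lik = theta[:, None] ** Y * (1 - theta[:, None]) ** (1 - Y)
    log_evidence_mc = np.log(lik.prod(axis=1).mean())

    print(log_evidence_exact, log_evidence_mc)   # the two should roughly agree

For the models in this talk no such closed form (or cheap estimate) exists, which is what motivates the variational approach.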

5 Practical approaches
• Laplace approximations: appeal to the Central Limit Theorem, making a Gaussian approximation about the maximum a posteriori parameter estimate.
• Large sample approximations: e.g. BIC.
• Markov chain Monte Carlo (MCMC): guaranteed to converge in the limit; many samples required for accurate results; hard to assess convergence.
• Variational approximations.

6 The variational method
Let the hidden states be X and the parameters \theta. Then

\ln P(Y) = \ln \int dX \, d\theta \, P(Y, X, \theta) \ge \int dX \, d\theta \, Q(X, \theta) \ln \frac{P(Y, X, \theta)}{Q(X, \theta)}.

Approximate the intractable distribution over hidden states X and parameters \theta with a simpler, factorised distribution:

\ln P(Y) = \ln \int dX \, d\theta \, P(Y, X, \theta) \ge \int dX \, d\theta \, Q_X(X) Q_\theta(\theta) \ln \frac{P(Y, X, \theta)}{Q_X(X) Q_\theta(\theta)} = F(Q, Y).

Maximisation of this lower bound leads to EM-like updates:

E step:  Q^*_X(X) \propto \exp \langle \ln P(X, Y|\theta) \rangle_{Q_\theta(\theta)}
M step:  Q^*_\theta(\theta) \propto P(\theta) \exp \langle \ln P(X, Y|\theta) \rangle_{Q_X(X)}

This is equivalent to minimising the KL divergence between the approximating and true posteriors (a numerical check follows below).
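As a numerical sanity check (a minimal sketch on an invented discrete toy model, not part of the talk), the identity F(Q, Y) = ln P(Y) − KL(Q_X Q_θ ‖ P(X, θ|Y)) can be verified directly, which is why maximising the bound minimises the KL divergence:

    import numpy as np

    rng = np.random.default_rng(1)
    # joint P(Y, X, theta) for one fixed observed Y, X in {0,1}, theta on a 3-point grid
    P_joint = rng.dirichlet(np.ones(6)).reshape(2, 3)   # axes: (X, theta)
    logPY = np.log(P_joint.sum())                       # ln P(Y)

    # an arbitrary factorised approximation Q_X(X) Q_theta(theta)
    Qx = np.array([0.4, 0.6])
    Qt = np.array([0.2, 0.5, 0.3])
    Q = np.outer(Qx, Qt)

    F = np.sum(Q * (np.log(P_joint) - np.log(Q)))       # the lower bound F(Q, Y)
    post = P_joint / P_joint.sum()                      # true posterior P(X, theta | Y)
    KL = np.sum(Q * (np.log(Q) - np.log(post)))

    assert np.isclose(F, logPY - KL)                    # so F <= ln P(Y), tight iff KL = 0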

7 A bit more explanation

F(Q, Y) = \int dX \, d\theta \, Q_X(X) Q_\theta(\theta) \ln \frac{P(X, Y, \theta)}{Q_X(X) Q_\theta(\theta)}
        = \int d\theta \, Q_\theta(\theta) \left[ \ln \frac{P(\theta)}{Q_\theta(\theta)} + \int dX \, Q_X(X) \ln \frac{P(X, Y|\theta)}{Q_X(X)} \right].

e.g. setting the functional derivative with respect to Q_\theta(\theta) to zero,

\frac{\partial F(Q, Y)}{\partial Q_\theta(\theta)} \propto \ln P(\theta) - 1 - \ln Q_\theta(\theta) + \langle \ln P(X, Y|\theta) \rangle_{Q_X(X)},

which gives the updates

E step:  Q^*_X(X) \propto \exp \langle \ln P(X, Y|\theta) \rangle_{Q_\theta(\theta)}
M step:  Q^*_\theta(\theta) \propto P(\theta) \exp \langle \ln P(X, Y|\theta) \rangle_{Q_X(X)}

Question: what distributions can we make use of in our models?

8 Conjugate-exponential models
Consider variational Bayesian learning in models that satisfy:

Condition (1). The complete-data likelihood is in the exponential family:

P(x, y | \theta) = f(x, y) \, g(\theta) \exp\{ \phi(\theta)^\top u(x, y) \}

where \phi(\theta) is the vector of natural parameters, and u, f and g are the functions that define the exponential family.

Condition (2). The parameter prior is conjugate to the complete-data likelihood:

P(\theta | \eta, \nu) = h(\eta, \nu) \, g(\theta)^{\eta} \exp\{ \phi(\theta)^\top \nu \}

where \eta and \nu are hyperparameters of the prior.

We call models that satisfy conditions (1) and (2) conjugate-exponential.
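To make the two conditions concrete, here is a minimal sketch (a toy example of mine, not one of the models used later) writing the Bernoulli likelihood in the exponential-family form of Condition (1), with natural parameter φ(θ) = ln(θ/(1−θ)), and checking that the conjugate prior of Condition (2) is a Beta distribution:

    import numpy as np
    from scipy.stats import beta

    def bernoulli_expfam(y, theta):
        # P(y|theta) = f(y) g(theta) exp{ phi(theta) * u(y) }
        f = 1.0
        g = 1.0 - theta                     # g(theta)
        phi = np.log(theta / (1 - theta))   # natural parameter phi(theta)
        u = y                               # sufficient statistic u(y)
        return f * g * np.exp(phi * u)

    theta = 0.3
    assert np.isclose(bernoulli_expfam(1, theta), theta)
    assert np.isclose(bernoulli_expfam(0, theta), 1 - theta)

    # Condition (2): P(theta | eta, nu) is proportional to g(theta)^eta exp{ phi(theta) * nu },
    # which here is Beta(nu + 1, eta - nu + 1); check proportionality on a grid.
    eta, nu = 5.0, 2.0
    grid = np.linspace(0.01, 0.99, 99)
    unnorm = (1 - grid) ** eta * np.exp(np.log(grid / (1 - grid)) * nu)
    ratio = unnorm / beta.pdf(grid, nu + 1, eta - nu + 1)
    assert np.allclose(ratio, ratio[0])     # constant ratio: same distribution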

9 Exponential family models
Some models that are in the exponential family:
• Gaussian mixtures,
• factor analysis, probabilistic PCA,
• hidden Markov models and factorial HMMs,
• linear dynamical systems and switching state-space models,
• Boltzmann machines,
• discrete-variable belief networks.
Other as yet undreamt-of models can combine Gaussian, Gamma, Poisson, Dirichlet, Wishart, Multinomial and others.
Some which are not in the exponential family:
• sigmoid belief networks,
• independent components analysis.
Make approximations/changes to move these into the exponential family...

10 Theoretical results
Theorem 1. Given an iid data set Y = (y_1, \ldots, y_n), if the model satisfies conditions (1) and (2), then at the maxima of F(Q, Y) (minima of KL(Q \| P)):

(a) Q_\theta(\theta) is conjugate and of the form

Q_\theta(\theta) = h(\tilde\eta, \tilde\nu) \, g(\theta)^{\tilde\eta} \exp\{ \phi(\theta)^\top \tilde\nu \}

where \tilde\eta = \eta + n, \tilde\nu = \nu + \sum_{i=1}^{n} \bar u(x_i, y_i), and \bar u(x_i, y_i) = \langle u(x_i, y_i) \rangle_Q, using \langle \cdot \rangle_Q to denote expectation under Q.

(b) Q_X(X) = \prod_{i=1}^{n} Q_{x_i}(x_i), and Q_{x_i}(x_i) is of the same form as the known-parameter posterior:

Q_{x_i}(x_i) \propto f(x_i, y_i) \exp\{ \bar\phi(\theta)^\top u(x_i, y_i) \} = P(x_i | y_i, \bar\phi(\theta))

where \bar\phi(\theta) = \langle \phi(\theta) \rangle_{Q_\theta}.

KEY points: (a) the approximate parameter posterior is of the same form as the prior; (b) the approximate hidden-variable posterior is of the same form as the hidden-variable posterior for known parameters.
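Continuing the Bernoulli/Beta sketch from above (with fully observed data, so the expected sufficient statistics are just the observations themselves), part (a) of the theorem amounts to nothing more than adding counts to the prior hyperparameters:

    import numpy as np

    eta, nu = 5.0, 2.0                  # prior hyperparameters, as in the sketch above
    y = np.array([1, 0, 1, 1, 0, 1])    # toy observations
    eta_tilde = eta + len(y)            # eta~ = eta + n
    nu_tilde = nu + y.sum()             # nu~  = nu + sum_i <u(x_i, y_i)> (here exact)
    # the parameter posterior keeps the prior's form: Beta(nu~ + 1, eta~ - nu~ + 1)
    print(eta_tilde, nu_tilde)

With hidden variables the only change is that the sufficient statistics are averaged under Q_{x_i}(x_i) before being added.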

11 Variational EM algorithm
Since Q_\theta(\theta) and Q_{x_i}(x_i) are coupled, (a) and (b) do not provide an analytic solution to the minimisation problem. Instead, alternate:

VE step: compute the expected sufficient statistics \bar t(Y) = \sum_i \bar u(x_i, y_i) under the hidden-variable distributions Q_{x_i}(x_i).
VM step: compute the expected natural parameters \bar\phi(\theta) under the parameter distribution given by \tilde\eta and \tilde\nu.

Properties:
• The VE step has the same complexity as the corresponding E step.
• Reduces to the EM algorithm if Q_\theta(\theta) = \delta(\theta - \theta^*); the M step then involves re-estimation of \theta^*.
• F increases monotonically, and incorporates a model complexity penalty.
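The following sketch runs this VE/VM alternation on a toy model of my own choosing (a mixture of Bernoulli products with Dirichlet and Beta priors), picked because both steps are analytic; it is meant only to show the shape of the algorithm, not the MFA or LDS models treated next:

    import numpy as np
    from scipy.special import digamma

    rng = np.random.default_rng(2)
    # toy binary data: two clusters with different "on" probabilities
    Y = np.vstack([rng.binomial(1, 0.8, (50, 5)), rng.binomial(1, 0.2, (50, 5))])
    N, D = Y.shape
    K = 3                                      # deliberately one component too many

    alpha0, a0, b0 = 1.0, 1.0, 1.0             # Dirichlet / Beta prior hyperparameters
    alpha = np.full(K, alpha0)
    a = a0 + rng.random((K, D))                # break symmetry in the initialisation
    b = b0 + rng.random((K, D))

    for _ in range(100):
        # VE step: responsibilities (expected sufficient statistics of the indicators)
        ElnPi = digamma(alpha) - digamma(alpha.sum())
        ElnTh = digamma(a) - digamma(a + b)
        ElnTh1 = digamma(b) - digamma(a + b)
        logR = ElnPi + Y @ ElnTh.T + (1 - Y) @ ElnTh1.T
        R = np.exp(logR - logR.max(axis=1, keepdims=True))
        R /= R.sum(axis=1, keepdims=True)

        # VM step: add expected sufficient statistics to the prior hyperparameters
        alpha = alpha0 + R.sum(axis=0)
        a = a0 + R.T @ Y
        b = b0 + R.T @ (1 - Y)

    print(np.round(alpha, 1))   # superfluous components shrink back towards alpha0

Replacing Q_θ(θ) by a point mass at the current parameter estimate turns the loop above into ordinary EM.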

12 Belief networks
Corollary 1: Conjugate-exponential belief networks. Let M be a conjugate-exponential model with hidden and visible variables z = (x, y) that satisfy a belief network factorisation; that is, each variable z_j has parents z_{p_j} and

P(z | \theta) = \prod_j P(z_j | z_{p_j}, \theta).

Then the approximating joint distribution for M satisfies the same belief network factorisation:

Q_z(z) = \prod_j Q(z_j | z_{p_j}, \tilde\theta)

where the conditional distributions have exactly the same form as those in the original model, but with natural parameters \phi(\tilde\theta) = \bar\phi(\theta).

Furthermore, with the modified parameters \tilde\theta, the expectations under the approximating posterior Q_x(x) \propto Q_z(z) required for the VE step can be obtained by applying the belief propagation algorithm if the network is singly connected, and the junction tree algorithm if the network is multiply connected.

13 Markov networks
Theorem 2: Markov networks. Let M be a model with hidden and visible variables z = (x, y) that satisfy a Markov network factorisation; that is, the joint density can be written as a product of clique potentials \psi_j,

P(z | \theta) = g(\theta) \prod_j \psi_j(C_j, \theta),

where each clique C_j is a subset of the variables in z. Then the approximating joint distribution for M satisfies the same Markov network factorisation:

Q_z(z) = \tilde g \prod_j \bar\psi_j(C_j)

where \bar\psi_j(C_j) = \exp\{ \langle \ln \psi_j(C_j, \theta) \rangle_{Q_\theta} \} are new clique potentials obtained by averaging over Q_\theta(\theta), and \tilde g is a normalisation constant. Furthermore, the expectations under the approximating posterior Q_x(x) required for the VE step can be obtained by applying the junction tree algorithm.

Corollary 2: Conjugate-exponential Markov networks. Let M be a conjugate-exponential Markov network over the variables in z. Then the approximating joint distribution for M is given by Q_z(z) = \tilde g \prod_j \psi_j(C_j, \tilde\theta), where the clique potentials have exactly the same form as those in the original model, but with natural parameters \phi(\tilde\theta) = \bar\phi(\theta).

14 Mixture of factor analysers
A p-dimensional vector y is generated from k unobserved independent Gaussian sources x, then corrupted with Gaussian noise r with diagonal covariance matrix \Psi:

y = \Lambda x + r.

The marginal density of y is Gaussian with zero mean and covariance \Lambda \Lambda^\top + \Psi.

[Figure: graphical model with hyperparameters a, b and \nu^1, \ldots, \nu^S; loading matrices \Lambda^1, \ldots, \Lambda^S; mixing proportions \pi with hyperparameter \alpha; indicator s_n; noise covariance \Psi; latent x_n and observation y_n inside a plate n = 1, \ldots, N.]

• The complete-data likelihood is in the exponential family.
• Choose conjugate priors: \nu \sim Gamma, \Lambda \sim Gaussian, \pi \sim Dirichlet.

VE: posteriors over hidden states calculated as usual.
VM: posteriors over parameters have the same form as the priors.
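A small sketch of the generative process just described (toy sizes and parameters assumed), with one loading matrix per component and shared diagonal sensor noise:

    import numpy as np

    rng = np.random.default_rng(3)
    p, ks = 10, [2, 3]                        # observed dim, per-component latent dims
    pi = np.array([0.5, 0.5])                 # mixing proportions
    Lambdas = [rng.normal(size=(p, k)) for k in ks]
    Psi = 0.1 * np.ones(p)                    # diagonal sensor noise variances

    def sample(n):
        ys = []
        for _ in range(n):
            s = rng.choice(len(pi), p=pi)             # pick an analyser
            x = rng.normal(size=ks[s])                # latent Gaussian sources
            y = Lambdas[s] @ x + rng.normal(scale=np.sqrt(Psi))
            ys.append(y)
        return np.array(ys)

    Y = sample(500)   # within component s, cov(y) = Lambda_s Lambda_s^T + diag(Psi)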

15 Synthetic example
True data: 6 Gaussian clusters with dimensions ( ) embedded in 10 dimensions.

Inferred structure:
• Finds the clusters and their dimensionalities.
• Model complexity reduces in line with lack of data support.

[Table: the number of data points against the inferred intrinsic dimensionalities per cluster.]

16 Linear dynamical systems

[Figure: graphical model with hidden states X_1, X_2, X_3, \ldots, X_T and observations Y_1, Y_2, Y_3, \ldots, Y_T.]

• A sequence of D-dimensional real-valued observed vectors y_{1:T}.
• Assume that at each time step t, y_t was generated from a k-dimensional real-valued hidden state variable x_t, and that the sequence x_{1:T} defines a first-order Markov process.
• The joint probability is given by

P(x_{1:T}, y_{1:T}) = P(x_1) P(y_1 | x_1) \prod_{t=2}^{T} P(x_t | x_{t-1}) P(y_t | x_t).

• If the transition and output functions are linear and time-invariant, and the noise distributions are Gaussian, this is a linear-Gaussian state-space model:

x_t = A x_{t-1} + w_t, \qquad y_t = C x_t + r_t.
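A minimal simulation of this linear-Gaussian state-space model (toy A, C and noise covariances assumed), of the kind used to generate data for the experiments below:

    import numpy as np

    rng = np.random.default_rng(4)
    k, D, T = 3, 10, 200
    A = 0.9 * np.linalg.qr(rng.normal(size=(k, k)))[0]   # stable dynamics (spectral radius 0.9)
    C = rng.normal(size=(D, k))                          # output matrix
    Q, R = 0.1 * np.eye(k), 0.5 * np.eye(D)              # state / output noise covariances

    x = np.zeros((T, k))
    y = np.zeros((T, D))
    x[0] = rng.multivariate_normal(np.zeros(k), np.eye(k))
    y[0] = C @ x[0] + rng.multivariate_normal(np.zeros(D), R)
    for t in range(1, T):
        x[t] = A @ x[t - 1] + rng.multivariate_normal(np.zeros(k), Q)
        y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(D), R)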

17 Model selection using ARD

[Figure: Hinton diagrams of the dynamics matrix A and the output matrix C learned by EM, underlying the structure to be recovered.]

• Gaussian priors on the entries in the columns of these matrices:

P(a_i | \alpha) = \mathcal{N}(0, \mathrm{diag}(\alpha)^{-1}), \qquad P(c_i | \beta) = \mathcal{N}(0, \mathrm{diag}(\rho_i \beta)^{-1}).

• \alpha and \beta are vectors of hyperparameters, to be optimised during learning. Some \alpha_j, \beta_j \to \infty.
• Empty columns signify extinct hidden dimensions, for modelling the hidden dynamics and for modelling the output covariance structure (a pruning sketch follows below).
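A minimal sketch of the pruning mechanism, using my reading of the 1/α = ⟨a · a⟩ fixed point quoted on a later slide (the new precision for column j is taken as the inverse of the average expected squared entry of that column); this is an illustration of the idea, not the authors' code:

    import numpy as np

    def update_ard_precisions(E_A, E_AtA):
        # E_A: posterior mean of the k x k dynamics matrix A
        # E_AtA: posterior expectation <A^T A>; its diagonal holds the expected
        #        sum of squares of the entries in each column of A
        k = E_A.shape[0]
        return k / np.diag(E_AtA)        # alpha_j -> infinity as column j -> 0

    # toy usage: a matrix whose third column carries (almost) no signal
    E_A = np.diag([0.9, 0.7, 1e-4])
    E_AtA = E_A.T @ E_A + 1e-6 * np.eye(3)    # pretend the posterior covariance is tiny
    print(update_ard_precisions(E_A, E_AtA))  # last precision is enormous => column pruned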

18 LDS variational approximation
Parameters of the model: \theta = (A, C, \rho); hidden states x_{1:T}.

\ln P(Y) = \ln \int dA \, dC \, d\rho \, dx_{1:T} \; P(A, C, \rho, x_{1:T}, y_{1:T}).

Lower bound (Jensen's inequality):

\ln P(Y) \ge F(Q, Y) = \int dA \, dC \, d\rho \, dx_{1:T} \; Q(A, C, \rho, x_{1:T}) \ln \frac{P(A, C, \rho, x_{1:T}, y_{1:T})}{Q(A, C, \rho, x_{1:T})}.

Factored approximation to the true posterior:

Q(A, C, \rho, x_{1:T}) = Q_A(A) \, Q_{C\rho}(C, \rho) \, Q_x(x_{1:T}).

This factorisation falls out from the initial assumptions and the conditional independencies in the model.

Priors:
• Sensor precisions \rho are given conjugate Gamma priors.
• Transition and output matrices are given conjugate zero-mean Gaussian priors (ARD).

19 Optimal Q(\theta) distributions: LDS
The complete-data likelihood is of conjugate-exponential form, so the maximisations are iterative but analytic.

Dynamics matrix, Q_A(A): a_i \sim \mathcal{N}(\bar a_i, \Sigma^A_i), with \bar a_i = S^\top \Sigma^A_i and \Sigma^A_i = (\mathrm{diag}(\alpha) + W)^{-1}.

Noise covariance, Q_\rho(\rho): \rho_i \sim \mathrm{Gamma}(\tilde a, \tilde b), with \tilde a = a + T/2 and \tilde b = b + g_i/2.

Output matrix, Q_C(C | \rho): c_i \sim \mathcal{N}(\bar c_i, \Sigma^C_i), with \bar c_i = \rho_i U_i \Sigma^C_i and \Sigma^C_i = (\mathrm{diag}(\beta) + W')^{-1} / \rho_i.

Hidden states, Q_x(x_{1:T}): x_t \sim \mathcal{N}(\omega^{(T)}_t, \Psi^{(T)}_t), with \{\omega^{(T)}_t, \Psi^{(T)}_t\} = \mathrm{KalmanSmoother}(\bar Q_{AC\rho}, y_{1:T}).

Hyperparameters \alpha, \beta: 1/\alpha = \langle a \cdot a \rangle_{Q_A}, \qquad 1/\beta = \langle c \cdot c \rangle_{Q_{C\rho}},

where
S = \sum_{t=2}^{T} \langle x_{t-1} x_t^\top \rangle, \quad W = \sum_{t=1}^{T-1} \langle x_t x_t^\top \rangle, \quad W' = W + \langle x_T x_T^\top \rangle,
U_i = \sum_{t=1}^{T} y_{ti} \langle x_t^\top \rangle, \quad g_i = \sum_{t=1}^{T} y_{ti}^2 - U_i (\mathrm{diag}(\beta) + W')^{-1} U_i^\top.

20 Method overview
1. Randomly initialise the approximate posteriors.
2. Update each posterior according to the variational E and M steps: Q_A(A) \to Q_{C\rho}(C | \rho) \to Q_\rho(\rho) \to Q_x(x_{1:T}).
3. Update the hyperparameters \alpha and \beta.
4. Repeat steps 2-3 until convergence.

21 Variational E step for LDS

[Figure: the LDS graphical model, hidden states X_1, X_2, X_3, \ldots, X_T and observations Y_1, Y_2, Y_3, \ldots, Y_T.]

• This is a conjugate-exponential, singly connected belief network,
• so by Corollary 1 we can use belief propagation, which for LDSs is the Kalman smoother:
  Forward pass: infer the state x_t given past observations.
  Backward pass: infer the state x_t given future observations.
• Inference using an ensemble of parameters is possible, using just a single setting of the parameters. The required averages are \langle \rho_i c_i \rangle, \langle \rho_i c_i c_i^\top \rangle, \langle A \rangle, \langle A^\top A \rangle.
• The result is a recursive algorithm very similar to the Kalman smoothing propagation (a forward-pass sketch follows below).
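A sketch of the forward (filtering) half of this propagation, following the recursions quoted on the final slide as I read them (assumptions: unit state noise covariance, an N(0, I) prior on x_1, and precomputed expectations EA = ⟨A⟩, EAtA = ⟨A^⊤A⟩, ECtRinvC = ⟨C^⊤R^{-1}C⟩, ECtRinv = ⟨C^⊤R^{-1}⟩). Only these averages of the parameter ensemble enter, i.e. a single effective parameter setting:

    import numpy as np

    def variational_kalman_forward(Y, EA, EAtA, ECtRinvC, ECtRinv):
        # Y: (T, D) observations; returns filtered means (T, k) and covariances (T, k, k)
        T, k = Y.shape[0], EA.shape[0]
        mus, Sigmas = np.zeros((T, k)), np.zeros((T, k, k))
        I = np.eye(k)
        # t = 1: combine the N(0, I) state prior with the averaged output potential
        Sigmas[0] = np.linalg.inv(I + ECtRinvC)
        mus[0] = Sigmas[0] @ (ECtRinv @ Y[0])
        for t in range(1, T):
            Sigma_star = np.linalg.inv(np.linalg.inv(Sigmas[t - 1]) + EAtA)
            Sigmas[t] = np.linalg.inv(I + ECtRinvC - EA @ Sigma_star @ EA.T)
            mus[t] = Sigmas[t] @ (ECtRinv @ Y[t]
                                  + EA @ Sigma_star @ np.linalg.solve(Sigmas[t - 1], mus[t - 1]))
        return mus, Sigmas

A matching backward pass over these quantities yields the smoothed means, covariances and cross-covariances needed for the VM step.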

22 Synthetic experiments
Generated a variety of state-space models and examined the recovered hidden structure. The observed data are:
• 10-dimensional outputs,
• 200 time steps.

• True model: 3-dimensional static state-space. [Figure: Hinton diagrams of the inferred A and C matrices and the recovered graphical structure.]
• True model: 3-dimensional dynamical state-space. [Figure: inferred structure.]
• True model: 4-dimensional state-space, of which 3 dimensions are dynamical. [Figure: inferred structure.]

23 Degradation with data support
True model:
• 6-dimensional state-space,
• 10-dimensional output.
The complexity of the recovered structure decreases with decreasing data support (from 400 down to 10 time steps).

[Figure: inferred model structures for successively smaller data sets.]

24 Summary
• Bayesian learning avoids overfitting and can be used to learn model structure.
• Tractable learning using variational approximations.
• Conjugate-exponential families of models.
• Variational EM and the integration of existing algorithms.
• Examples: learning structure in MFA and LDS models.

Issues & extensions
• Quality of the variational approximations?
• How best to bring a model into the exponential family?
• Extensions to other models, e.g. switching state-space models [Ueda], are straightforward.
• Problems integrating existing algorithms into the VE step.
• Implementation on a general graphical model.

25 Optimal Q(\theta) distributions: MFA

Q(x_n | s_n = s) = \mathcal{N}(\bar x_{n,s}, \Sigma^s):
(\Sigma^s)^{-1} = I + \langle \Lambda^{s\top} \Psi^{-1} \Lambda^s \rangle_{Q(\Lambda^s)}, \qquad \bar x_{n,s} = \Sigma^s \bar\Lambda^{s\top} \Psi^{-1} y_n

Q(\Lambda^s_q) = \mathcal{N}(\bar\Lambda^s_q, \Sigma^{q,s}) for each row q of \Lambda^s:
\bar\Lambda^s_q = \left[ \Psi^{-1}_{qq} \sum_{n=1}^{N} Q(s_n) \, y_{nq} \, \bar x_{n,s}^\top \right] \Sigma^{q,s}, \qquad
(\Sigma^{q,s})^{-1} = \Psi^{-1}_{qq} \sum_{n=1}^{N} Q(s_n) \langle x_n x_n^\top \rangle_{Q(x_n|s_n)} + \mathrm{diag}\,\langle \nu^s \rangle_{Q(\nu^s)}

Q(\nu^s_l) = \mathrm{Gamma}(a^s_l, b^s_l):
a^s_l = a + \frac{p}{2}, \qquad b^s_l = b + \frac{1}{2} \sum_{q=1}^{p} \langle \Lambda^{s\,2}_{ql} \rangle_{Q(\Lambda^s)}

Q(\pi) = \mathrm{Dirichlet}(\omega u):
\omega u_s = \frac{\alpha}{S} + \sum_{n=1}^{N} Q(s_n)

\ln Q(s_n) = [\psi(\omega u_s) - \psi(\omega)] + \tfrac{1}{2} \ln |\Sigma^s| + \langle \ln P(y_n | x_n, s_n, \Lambda^s, \Psi) \rangle_{Q(x_n|s_n) Q(\Lambda^s)} + c

Note that the optimal distributions Q(\Lambda^s) have block-diagonal covariance structure, so even though each \Lambda^s is a p \times q matrix, its covariance has only O(pq^2) parameters.

Differentiating F with respect to the parameters a and b of the precision prior gives the fixed-point equations \psi(a) = \langle \ln \nu \rangle + \ln b and b = a / \langle \nu \rangle. Similarly, the fixed point for the hyperparameter of the Dirichlet prior is \psi(\alpha) - \psi(\alpha/S) + \sum_s [\psi(\omega u_s) - \psi(\omega)] / S = 0.

26 Sampling from variational approximations
Sampling \theta_m \sim Q(\theta) gives us estimates of:

• The exact predictive density:

P(y | Y) = \int d\theta \, P(y|\theta) P(\theta|Y) = \int d\theta \, Q(\theta) P(y|\theta) \frac{P(\theta|Y)}{Q(\theta)} \approx \sum_{m=1}^{M} P(y | \theta_m) \, \omega_m

with weights \omega_m = \frac{1}{\Omega} \frac{P(\theta_m, Y)}{Q(\theta_m)}, where \Omega is chosen such that \sum_m \omega_m = 1.

• The true evidence:

P(Y | M) = \int d\theta \, Q(\theta) \frac{P(\theta, Y)}{Q(\theta)} \approx \langle \Omega \omega \rangle

• The KL divergence:

KL(Q(\theta) \| P(\theta | Y)) = \ln \langle \omega \rangle - \langle \ln \omega \rangle.

(An importance-sampling sketch follows below.)
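A sketch of these importance-sampling estimates on a toy conjugate model (my example: a Gaussian likelihood with unknown mean, a broad Gaussian prior, and a stand-in Q(θ) playing the role of the variational posterior):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(5)
    Y = rng.normal(1.0, 1.0, size=30)            # data; unit-variance Gaussian likelihood
    Q = norm(loc=Y.mean(), scale=0.3)            # stand-in for the variational posterior over the mean

    M = 20_000
    theta = Q.rvs(M, random_state=rng)
    log_joint = norm.logpdf(theta, 0, 10) + norm.logpdf(Y[:, None], theta, 1).sum(axis=0)
    log_w = log_joint - Q.logpdf(theta)          # unnormalised log importance weights

    # evidence P(Y) is the average unnormalised weight (log-sum-exp for stability)
    log_evidence = np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()
    # KL(Q || P(theta|Y)) = ln<w> - <ln w>
    KL = log_evidence - log_w.mean()
    print(log_evidence, KL)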

27 Variational Bayes & Ensemble Learning
• multilayer perceptrons (Hinton & van Camp, 1993)
• mixture of experts (Waterhouse, MacKay & Robinson, 1996)
• hidden Markov models (MacKay, 1995)
• other work by Jaakkola, Barber, Bishop, Tipping, etc.

Examples of VB Learning Model Structure
Model learning has been treated with variational Bayesian techniques for:
• mixtures of factor analysers (Ghahramani & Beal, 1999)
• mixtures of Gaussians (Attias, 1999)
• independent components analysis (Attias, 1999; Miskin & MacKay, 2000; Valpola, 2000)
• principal components analysis (Bishop, 1999)
• linear dynamical systems (Ghahramani & Beal, 2000)
• mixture of experts (Ueda & Ghahramani, 2000)
• hidden Markov models (Ueda & Ghahramani, in prep.)
(compiled by Zoubin)

28 Evolution of F, the true evidence and the KL divergence

[Figure: two panels plotted against iteration number. One panel, "Train and test fractional error", shows the train-error and test-error curves; the other shows F against the true evidence on the training data, with curves labelled ln p(Y), F(\pi, \Lambda) and F(\theta).]

29 Required expectations for linear-Gaussian state-space models

Defining G' = \mathrm{diag}\!\left( \frac{a + T/2}{b + g_i/2} \right) (the expected sensor precisions), the required averages are

\langle C^\top R^{-1} C \rangle = (\mathrm{diag}(\beta) + W')^{-1} \left[ p \, I + U G' U^\top (\mathrm{diag}(\beta) + W')^{-1} \right]
\langle C^\top R^{-1} \rangle = (\mathrm{diag}(\beta) + W')^{-1} U G'
\langle A \rangle = S^\top (\mathrm{diag}(\alpha) + W)^{-1}
\langle A^\top A \rangle = k \, (\mathrm{diag}(\alpha) + W)^{-1} + \langle A \rangle^\top \langle A \rangle

where S = \sum_{t=2}^{T} \langle x_{t-1} x_t^\top \rangle, W = \sum_{t=1}^{T-1} \langle x_t x_t^\top \rangle, U_i = \sum_{t=1}^{T} y_{ti} \langle x_t^\top \rangle and W' = W + \langle x_T x_T^\top \rangle.

Kalman smoothing propagation

Forward recursion:
\Sigma^*_t = \left( \Sigma_t^{-1} + \langle A^\top A \rangle \right)^{-1}
\mu_t = \Sigma_t \left( \langle C^\top R^{-1} \rangle y_t + \langle A \rangle \Sigma^*_{t-1} \Sigma_{t-1}^{-1} \mu_{t-1} \right)
\Sigma_t = \left( I + \langle C^\top R^{-1} C \rangle - \langle A \rangle \Sigma^*_{t-1} \langle A \rangle^\top \right)^{-1}

Backward recursion:
\omega_t = \Psi_t \left[ \Sigma_t^{-1} \mu_t + \langle A \rangle^\top \left( \Psi_{t+1}^{-1} + \langle A \rangle \Sigma^*_t \langle A \rangle^\top \right)^{-1} \left( \Psi_{t+1}^{-1} \omega_{t+1} - \langle A \rangle \Sigma^*_t \Sigma_t^{-1} \mu_t \right) \right]
\Psi_t = \left[ (\Sigma^*_t)^{-1} - \langle A \rangle^\top \left( \Psi_{t+1}^{-1} + \langle A \rangle \Sigma^*_t \langle A \rangle^\top \right)^{-1} \langle A \rangle \right]^{-1}
\mathrm{cov}(x_t, x_{t+1}) = \Psi_t \langle A \rangle^\top \left( \Psi_{t+1}^{-1} + \langle A \rangle \Sigma^*_t \langle A \rangle^\top \right)^{-1}
