Variational inference
1 Simon Leglaive Télécom ParisTech, CNRS LTCI, Université Paris Saclay November 18, 2016, Télécom ParisTech, Paris, France.
2 Outline
- Introduction
  - Probabilistic model
  - Problem
  - Log-likelihood decomposition
  - EM algorithm
- Approximate inference
  - Variational approximation
  - Mean Field approximation
  - Optimal variational distribution under the MF approximation
  - Relation with Gibbs sampling
- Audio source separation example
  - Mixing model
  - Source model
  - Source separation problem
4 Probabilistic model
Let us consider a probabilistic model where
- x denotes the set of observed random variables;
- z denotes the set of latent/hidden random variables;
- θ denotes the set of deterministic parameters.
For example, in audio source separation: the sources are latent variables, the observed variables are the mixtures, and θ gathers the source and mixing parameters (e.g. NMF parameters and mixing filters).
Note: in a Bayesian framework the latent variables include the parameters.
5 Problem
1. Definition of the model: how are the data generated from the latent unobserved variables?
2. Inference: what are the values of the latent variables that generated the data?

We are naturally interested in computing the posterior p(z|x; θ).
- Maximum likelihood estimation of the parameters: θ* = arg max_θ p(x; θ);
- Minimum Mean Square Error (MMSE) estimation of the latent variables: ẑ = E_{z|x;θ}[z].
If we can compute the posterior distribution → Expectation-Maximization (EM) algorithm.
6 Log-likelihood decomposition
Let q be a probability density function (pdf) over z. The log-likelihood can then be decomposed as:

    ln p(x; θ) = L(q; θ) + KL(q ‖ p(z|x; θ)),    (1)

where L(q; θ) is the variational free energy and KL(· ‖ ·) is the Kullback-Leibler divergence:

    L(q; θ) = ⟨ln( p(x, z; θ) / q(z) )⟩_q = ⟨ln p(x, z; θ)⟩_q + H(q);    (2)
    KL(q ‖ p(z|x; θ)) = −⟨ln( p(z|x; θ) / q(z) )⟩_q;    (3)

with ⟨f(z)⟩_q = ∫ f(z) q(z) dz, H(q) = −⟨ln q(z)⟩_q the entropy of q, and ⟨ln p(x, z; θ)⟩_q denoted E(q; θ).

As KL(· ‖ ·) ≥ 0, L(q; θ) lower-bounds the log-likelihood.

Note: variational free energy = evidence lower bound (ELBO) in Bayesian settings.
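Decomposition (1) is easy to verify numerically on a toy model. The sketch below (assuming a single discrete latent variable and an arbitrary unnormalized joint; all names are illustrative) checks that ln p(x; θ) = L(q; θ) + KL(q ‖ p(z|x; θ)) holds for any q:

```python
# Numerical check of decomposition (1) on a toy discrete model:
# one latent z taking K values, one fixed observation x.
import numpy as np

rng = np.random.default_rng(0)
K = 4
joint = rng.random(K)            # p(x, z; theta) as a function of z
p_x = joint.sum()                # p(x; theta) = sum_z p(x, z; theta)
posterior = joint / p_x          # p(z | x; theta)

q = rng.random(K); q /= q.sum()  # an arbitrary variational pdf over z

L = np.sum(q * np.log(joint / q))       # free energy <ln p(x,z;theta)/q(z)>_q
KL = np.sum(q * np.log(q / posterior))  # KL(q || p(z|x; theta))

assert np.isclose(np.log(p_x), L + KL)  # eq. (1)
```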
7 EM algorithm
Maximize L(q; θ) w.r.t. q at the E-step and w.r.t. θ at the M-step.

E-step: from (1),

    q*(z) = arg max_q L(q; θ_old) = p(z|x; θ_old).

Then from (2),

    L(q*; θ) = ⟨ln p(x, z; θ)⟩_{p(z|x; θ_old)} + H( p(z|x; θ_old) ),

where the first term is the expected complete-data log-likelihood, i.e. Q(θ, θ_old) in the standard EM formulation, and the entropy term is constant w.r.t. θ.

M-step:

    θ_new = arg max_θ Q(θ, θ_old).
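To make the two steps concrete, here is a minimal EM sketch for a model not taken from the slides, a 1-D mixture of two Gaussians, where both the E-step posterior (responsibilities) and the M-step maximizer of Q(θ, θ_old) are available in closed form:

```python
# Minimal EM for a 1-D two-component Gaussian mixture (illustrative example).
import numpy as np

def em_gmm(x, n_iter=100):
    # crude initialization of theta = (pi_k, mu_k, var_k)
    pis = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] = p(z_n = k | x_n; theta_old)
        log_r = np.log(pis) - 0.5 * (np.log(2 * np.pi * var)
                                     + (x[:, None] - mu) ** 2 / var)
        r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: theta_new = argmax_theta Q(theta, theta_old)
        Nk = r.sum(axis=0)
        pis, mu = Nk / len(x), (r * x[:, None]).sum(axis=0) / Nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return pis, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])
print(em_gmm(x))  # recovers pis ~ (0.5, 0.5), mu ~ (-2, 3), var ~ (1, 1)
```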
8 What if we cannot compute the posterior distribution?
10 Approximate inference
- Stochastic methods, based on sampling: Markov chain Monte Carlo (MCMC) methods, etc. Computationally expensive, but converge to the true posterior.
- Deterministic methods, based on optimization: variational methods, etc. Computationally cheaper, but not exact.
[Figure: the true posterior, approached by Monte Carlo samples on one side and by a variational approximation on the other.]
11 Variational approximation
We want to find q ∈ F (in a variational family) which approximates p(z|x; θ). We take the KL divergence as a measure of fit, but we cannot minimize it directly. However, from (1) we have:

    KL(q ‖ p(z|x; θ)) = ln p(x; θ) − L(q; θ).

Variational EM algorithm:
- E-step: q* = arg min_{q ∈ F} KL(q ‖ p(z|x; θ_old)) = arg max_{q ∈ F} L(q; θ_old);
- M-step: θ_new = arg max_θ L(q*; θ).
12 Mean Field (MF) approximation
F: the set of pdfs over z that factorize as q(z) = ∏_j q_j(z_j).
[Figure: contours of the true posterior vs. its mean-field approximation.]
- We drop the posterior dependencies between the latent variables;
- Generally the true posterior does not belong to this variational family;
- More general than it seems: latent variables can be grouped, and only the distribution across groups factorizes.
We want to optimize L(q; θ_old) for this factorized distribution.
13 Optimization under the MF approximation
We can show that (see the appendix for calculus details):

    L(q; θ) = −KL(q_j ‖ p̃(x, z_j; θ)) + Σ_{i≠j} H(q_i),    (4)

where ln p̃(x, z_j; θ) = ⟨ln p(x, z; θ)⟩_{∏_{i≠j} q_i}.

Coordinate ascent inference, optimizing w.r.t. q_j with {q_i}_{i≠j} fixed:

    q_j*(z_j) = arg max_{q_j} L(q; θ_old) = arg min_{q_j} KL(q_j ‖ p̃(x, z_j; θ_old)),

that is,

    ln q_j*(z_j) = ⟨ln p(x, z; θ_old)⟩_{∏_{i≠j} q_i} + constant.

We hope to recognize a standard distribution, or we normalize. The solutions are coupled, so we initialize and then update cyclically.
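The classic toy illustration of this coordinate ascent (Bishop, PRML Ch. 10) approximates a correlated bivariate Gaussian by q(z1) q(z2); each update below is the closed-form solution of ln q_j* = ⟨ln p⟩ + constant. A minimal sketch with illustrative values:

```python
# Coordinate ascent under the MF approximation for a toy target:
# a bivariate Gaussian p(z) = N(mu, Sigma), approximated by q(z1) q(z2).
import numpy as np

mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
Lam = np.linalg.inv(Sigma)  # precision matrix

m = np.zeros(2)             # variational means, arbitrary initialization
for _ in range(50):         # cyclic updates of the coupled solutions
    # ln q1*(z1) = <ln p(z)>_{q2} + const  =>  q1* = N(m[0], 1 / Lam[0, 0])
    m[0] = mu[0] - (Lam[0, 1] / Lam[0, 0]) * (m[1] - mu[1])
    m[1] = mu[1] - (Lam[1, 0] / Lam[1, 1]) * (m[0] - mu[0])

print(m)  # converges to mu; each q_j variance 1/Lam[j, j] <= Sigma[j, j]
```

The well-known behavior is visible here: the mean-field means are exact for this target, but the factorized posterior underestimates the marginal variances of the correlated true posterior.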
14 Relation with Gibbs sampling
Consider a Bayesian setting without deterministic parameters θ.

Variational Bayesian inference:

    q_j*(z_j) ∝ exp( ⟨ln p(x, z)⟩_{∏_{i≠j} q_i} ).

But p(x, z) = p(z_j | x, z_{\j}) p(x, z_{\j}), where z_{\j} denotes z except z_j, so

    q_j*(z_j) ∝ exp( ⟨ln p(z_j | x, z_{\j})⟩_{∏_{i≠j} q_i} ).

Gibbs sampling: we want to sample from p(z|x) by successively sampling z_j from p(z_j | x, z_{\j}).

→ Hybrid approaches alternate between sampling and optimization.
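A Gibbs sampler on the same toy bivariate Gaussian makes the parallel concrete: the same full conditional appears, but we sample from it instead of taking an expectation (again a sketch with illustrative values):

```python
# Gibbs sampling on the toy bivariate Gaussian target used above:
# successively sample z_j from the full conditional p(z_j | z_{\j}).
import numpy as np

mu = np.array([1.0, -1.0])
Lam = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))  # precision matrix

rng = np.random.default_rng(1)
z, samples = np.zeros(2), []
for _ in range(5000):
    for j in (0, 1):
        k = 1 - j
        cond_mean = mu[j] - (Lam[j, k] / Lam[j, j]) * (z[k] - mu[k])
        z[j] = rng.normal(cond_mean, np.sqrt(1.0 / Lam[j, j]))
    samples.append(z.copy())

print(np.mean(samples, axis=0))  # ~ mu, and sample covariance ~ Sigma
```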
16 Audio source separation example: mixing model

    x_i(t) = Σ_{j=1}^J y_ij(t) + b_i(t),  with  y_ij(t) = [a_ij ∗ s_j](t)  and  s_j(t) = Σ_{f=0}^{F−1} Σ_{n=0}^{N−1} s_{j,fn} ψ_fn(t),

where
- ψ_fn(t) is a Modified Discrete Cosine Transform (MDCT) atom;
- a_ij(t) is the mixing filter from source j to microphone i;
- b_i(t) ~ N(0, σ_i²) is sensor noise.
17 Source model
Non-negative Matrix Factorization (NMF) of the short-term power spectral density of each source:

    v_{j,fn} = [W_j H_j]_{fn},

where W_j contains spectral templates and H_j temporal activations.
[Figure: an amplitude spectrogram (frequency in kHz, amplitude in dB) factorized into spectral templates and temporal activations.]
18 Source separation problem
- Observed variables: x = {x_i(t)}_{i,t};
- Latent variables: s = {s_{j,fn}}_{j,f,n};
- Parameters: θ = { {W_j, H_j}_j, {a_ij(t)}_{i,j,t}, {σ_i²}_i }.

Minimum Mean Square Error estimation of the sources: ŝ = E_{s|x;θ}[s].
Maximum likelihood estimation of the parameters: θ* = arg max_θ p(x; θ).

p(s|x; θ) is Gaussian, but parametrized by a full covariance matrix of too high a dimension to be handled in practice → VEM algorithm.
19 Mean field approximation

    q(s) = ∏_{j=1}^J ∏_{f=0}^{F−1} ∏_{n=0}^{N−1} q_{jfn}(s_{j,fn}).

Source estimate, with m_{j,fn} = ⟨s_{j,fn}⟩_q:

    ŝ_j(t) = Σ_{f=0}^{F−1} Σ_{n=0}^{N−1} m_{j,fn} ψ_fn(t);
    ŷ_ij(t) = [a_ij ∗ ŝ_j](t) = Σ_{f=0}^{F−1} Σ_{n=0}^{N−1} m_{j,fn} g_{ij,fn}(t),  with  g_{ij,fn}(t) = [a_ij ∗ ψ_fn](t).
20 Complete-data log-likelihood

    ln p(x, s; θ) = ln p(x|s; θ) + ln p(s; θ)
      = c − (1/2) Σ_{i=1}^I Σ_{t=0}^{T−1} [ ln(σ_i²) + (1/σ_i²) ( x_i(t) − Σ_{j=1}^J y_ij(t) )² ]
          − (1/2) Σ_{j=1}^J Σ_{f=0}^{F−1} Σ_{n=0}^{N−1} [ ln(v_{j,fn}) + s_{j,fn}² / v_{j,fn} ],

where c is a constant, and we recall that y_ij(t) = Σ_{f=0}^{F−1} Σ_{n=0}^{N−1} s_{j,fn} g_{ij,fn}(t) and v_{j,fn} = [W_j H_j]_{fn}.
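As a sanity check, this quantity is straightforward to evaluate in code. The sketch below assumes dense arrays x[i, t], s[j, f, n], g[i, j, f, n, t], v[j, f, n] and sigma2[i]; these shapes are illustration-only assumptions, not from the slides:

```python
# Complete-data log-likelihood, up to the additive constant c.
import numpy as np

def complete_data_loglik(x, s, g, v, sigma2):
    I, T = x.shape
    y = np.einsum('jfn,ijfnt->ijt', s, g)  # y_ij(t) = sum_fn s_jfn g_ijfn(t)
    resid = x - y.sum(axis=1)              # x_i(t) - sum_j y_ij(t)
    ll_x = -0.5 * np.sum(T * np.log(sigma2)
                         + (resid ** 2).sum(axis=1) / sigma2)
    ll_s = -0.5 * np.sum(np.log(v) + s ** 2 / v)
    return ll_x + ll_s
```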
21 E-step
Under the MF approximation, q_{jfn}*(s_{j,fn}) = arg max_{q_{jfn}} L(q; θ_old) satisfies:

    ln q_{jfn}*(s_{j,fn}) = ⟨ln p(x, s; θ_old)⟩_{∏_{(j',f',n') ≠ (j,f,n)} q_{j'f'n'}} + constant.

We develop this expression, omitting all the terms that do not depend on s_{j,fn}, and hope to recognize a standard distribution.
22 E-step
After computation we find that q_{jfn}*(s_{j,fn}) = N(s_{j,fn}; m_{j,fn}, γ_{j,fn}), where:

    γ_{j,fn} = ( 1/v_{j,fn} + Σ_{i=1}^I (1/σ_i²) Σ_{t=0}^{T−1} g_{ij,fn}²(t) )^{−1};
    d_{j,fn} = −m_{j,fn}/v_{j,fn} + Σ_{i=1}^I (1/σ_i²) Σ_{t=0}^{T−1} g_{ij,fn}(t) ( x_i(t) − Σ_{j'=1}^J ŷ_{ij'}(t) );
    m_{j,fn} ← m_{j,fn} + γ_{j,fn} d_{j,fn}.

Note that the parameters m_{j,fn} have to be updated in turn.

Note: we can show that d_{j,fn} = (∇L(q*; θ))_{m_{j,fn}}. For the sake of computational efficiency, we can thus use a preconditioned conjugate gradient method instead of this coordinate-wise update.
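The sketch below mirrors these three formulas for one coordinate (j, f, n); the array shapes and the bookkeeping that keeps ŷ consistent with the updated mean are assumptions for illustration:

```python
# One coordinate update of q*_{jfn} = N(m_{j,fn}, gamma_{j,fn}).
# Assumed shapes: x, y_hat (I, T); g (I, J, F, N, T); v, m (J, F, N); sigma2 (I,).
import numpy as np

def e_step_coordinate(x, g, v, sigma2, m, y_hat, j, f, n):
    w = 1.0 / sigma2                         # noise precisions, (I,)
    gjfn = g[:, j, f, n, :]                  # g_{ij,fn}(t), (I, T)
    gamma = 1.0 / (1.0 / v[j, f, n] + np.sum(w[:, None] * gjfn ** 2))
    resid = x - y_hat                        # x_i(t) - sum_j' y_hat_ij'(t)
    d = -m[j, f, n] / v[j, f, n] + np.sum(w[:, None] * gjfn * resid)
    m_new = m[j, f, n] + gamma * d
    # keep y_hat consistent with the updated mean before the next coordinate
    y_hat += (m_new - m[j, f, n]) * gjfn
    m[j, f, n] = m_new
    return gamma
```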
23 M-step (NMF parameters example)
We now want to maximize L(q*; θ) w.r.t. the NMF parameters, under a non-negativity constraint. Up to additive constants,

    L(q*; θ) = ⟨ln p(x, s; θ)⟩_{q*} + const
      = −(1/2) Σ_{j=1}^J Σ_{f=0}^{F−1} Σ_{n=0}^{N−1} [ ln([W_j H_j]_{fn}) + ( m_{j,fn}² + γ_{j,fn} ) / [W_j H_j]_{fn} ] + const
      = −(1/2) Σ_{j=1}^J Σ_{f=0}^{F−1} Σ_{n=0}^{N−1} d_IS( m_{j,fn}² + γ_{j,fn}, [W_j H_j]_{fn} ) + const,

where the Itakura-Saito (IS) divergence is given by d_IS(x, y) = x/y − ln(x/y) − 1.

→ Compute an NMF on P̂_j = [ m_{j,fn}² + γ_{j,fn} ]_{fn} ∈ R_+^{F×N} with the IS divergence. It can be done with the standard multiplicative update rules.
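A sketch of the resulting M-step computation for one source j: IS-NMF on P̂_j with the standard multiplicative update rules (the β = 0 case of β-divergence NMF; rank, iteration count and initialization are illustrative choices):

```python
# IS-NMF via multiplicative updates, applied to P = m^2 + gamma for one source.
import numpy as np

def is_nmf(P, K=10, n_iter=200, eps=1e-12):
    F, N = P.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((F, K)) + eps, rng.random((K, N)) + eps
    for _ in range(n_iter):
        V = W @ H
        W *= ((P / (V ** 2 + eps)) @ H.T) / ((1.0 / (V + eps)) @ H.T)
        V = W @ H
        H *= (W.T @ (P / (V ** 2 + eps))) / (W.T @ (1.0 / (V + eps)))
    return W, H
```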
24 Further readings
- D. M. Blei, A. Kucukelbir and J. D. McAuliffe, "Variational Inference: A Review for Statisticians," arXiv:1601.00670 [stat.CO], 2016.
- D. G. Tzikas, A. C. Likas and N. P. Galatsanos, "The variational approximation for Bayesian inference," IEEE Signal Processing Magazine, 25(6):131-146, 2008.
25 Calculus details for the E-step under the mean-field approximation
From [Tzikas et al., 2008]:

    L(q; θ) = ∫ ∏_i q_i · ln( p(x, z; θ) / ∏_i q_i ) ∏_i dz_i
            = ∫ ∏_i q_i [ ln p(x, z; θ) − Σ_i ln q_i ] ∏_i dz_i
            = ∫ ∏_i q_i ln p(x, z; θ) ∏_i dz_i − Σ_i ∫ ∏_k q_k ln q_i ∏_k dz_k
            = ∫ ∏_i q_i ln p(x, z; θ) ∏_i dz_i − Σ_i ∫ q_i ln q_i dz_i    (since ∫ q_k dz_k = 1)
            = ∫ q_j [ ∫ ln p(x, z; θ) ∏_{i≠j} q_i dz_i ] dz_j − ∫ q_j ln q_j dz_j − Σ_{i≠j} ∫ q_i ln q_i dz_i.
26 Define

    ln p̃(x, z_j; θ) = ∫ ln p(x, z; θ) ∏_{i≠j} q_i dz_i = ⟨ln p(x, z; θ)⟩_{∏_{i≠j} q_i}.

It follows that

    L(q; θ) = ∫ q_j ln p̃(x, z_j; θ) dz_j − ∫ q_j ln q_j dz_j − Σ_{i≠j} ∫ q_i ln q_i dz_i
            = −∫ q_j ln( q_j / p̃(x, z_j; θ) ) dz_j − Σ_{i≠j} ∫ q_i ln q_i dz_i
            = −KL(q_j ‖ p̃(x, z_j; θ)) + Σ_{i≠j} H(q_i),

which is decomposition (4).