Variational Bayes Approximation

Rice University STAT 631 / ELEC 639: Graphical Models
Instructor: Dr. Volkan Cevher
Scribe: David Kahle
Reviewers: Konstantinos Tsianos and Tahira Saleem

1. Background

These lecture notes are the seventh in a series of lecture notes taken from a course offered in the Fall of 2008 at Rice University entitled Graphical Models. The course was written and instructed by Dr. Volkan Cevher in the Department of Electrical and Computer Engineering. This particular set of notes was taken by David Kahle on September 16, 2008.

2. Introduction

In the last lecture we became particularly interested in making inference on the graph displayed in Figure 1. Here $Z \in \mathbb{R}^m$ and $X \in \mathbb{R}^n$ are random vectors.

[Figure 1. Simple observed directed graph: $Z \to X$.]

For the purposes of this lecture, all random vectors will be assumed to exhibit densities with respect to the Lebesgue measure, denoted $p$ and subscripted by the corresponding random vector. For example, the random vector $Z$ exhibits the density $p_Z(z)$. The joint density of the random vectors $X$ and $Z$ is written $p_{X,Z}(x,z)$.

The graphical model presented above is simple in order to emphasize the fact that we wish to make inference regarding the vector $Z$ provided we have information on the vector $X$. To that end, recall the crucial formula derived from the definition of conditional density,
\[
p_{X,Z}(x,z) = p_{Z|X}(z|x)\, p_X(x), \tag{1}
\]
which is sometimes referred to as the product rule. As we well know, our primary interest lies in the factor $p_{Z|X}(z|x)$. The rest of this lecture is concerned with characterizing this density using the method of variational Bayes (VB) approximation. This is the second example of a deterministic scheme for approximating the conditional density for inferential procedures (the first being the Laplace approximation).
3. Motivation and the Kullback-Leibler Divergence

In this section we derive a pseudo distance metric on probability distributions known as the Kullback-Leibler divergence. The idea is that without some kind of measure of difference between probability measures, we have no benchmark for comparing the accuracy of different approximations. Our result will be the Kullback-Leibler divergence; it will soon become clear why it is called a divergence instead of a distance.

The divergence is derived as follows. Beginning with (1) and taking logs, we obtain
\[
\log p_{X,Z}(x,z) = \log p_{Z|X}(z|x) + \log p_X(x), \tag{2}
\]
which rearranges to
\[
\log p_X(x) = \log p_{X,Z}(x,z) - \log p_{Z|X}(z|x). \tag{3}
\]
Now, recall the property of logs which states $\log a - \log b = \log \frac{a}{c} - \log \frac{b}{c}$ (provided everything is defined appropriately). Appealing to this fact, we introduce another density, $q_Z(z)$, which is an arbitrary probability density (also with respect to the Lebesgue measure). Then (3) grants
\[
\log p_X(x) = \log \frac{p_{X,Z}(x,z)}{q_Z(z)} - \log \frac{p_{Z|X}(z|x)}{q_Z(z)}, \tag{4}
\]
and multiplying both sides by $q_Z(z)$ we obtain
\[
q_Z(z) \log p_X(x) = q_Z(z) \log \frac{p_{X,Z}(x,z)}{q_Z(z)} - q_Z(z) \log \frac{p_{Z|X}(z|x)}{q_Z(z)}. \tag{5}
\]
By integrating with respect to $z$, we have
\[
\int_{\mathbb{R}^m} q_Z(z) \log p_X(x)\, dz = \int_{\mathbb{R}^m} q_Z(z) \log \frac{p_{X,Z}(x,z)}{q_Z(z)}\, dz - \int_{\mathbb{R}^m} q_Z(z) \log \frac{p_{Z|X}(z|x)}{q_Z(z)}\, dz. \tag{6}
\]
This is the key equation for our search. To summarize it we note that the left hand side of (6) is simply $\log p_X(x)$ and use shorthand notation for the two terms on the right hand side (with the second including the negative sign). The shorthand notation is defined as
\[
\mathcal{L}(q_Z) := \int_{\mathbb{R}^m} q_Z(z) \log \frac{p_{X,Z}(x,z)}{q_Z(z)}\, dz = \mathbb{E}_{q_Z}\!\left[ \log \frac{p_{X,Z}(X,Z)}{q_Z(Z)} \right],
\]
\[
\mathrm{KL}\left( q_Z \,\|\, p_{Z|X} \right) := -\int_{\mathbb{R}^m} q_Z(z) \log \frac{p_{Z|X}(z|x)}{q_Z(z)}\, dz = \int_{\mathbb{R}^m} q_Z(z) \log \frac{q_Z(z)}{p_{Z|X}(z|x)}\, dz = \mathbb{E}_{q_Z}\!\left[ \log \frac{q_Z(Z)}{p_{Z|X}(Z|X)} \right],
\]
where $\mathbb{E}_{\Pi}$ represents the expected value operator with respect to the probability measure (or equivalent) $\Pi$. Thus, the fundamental relationship is concisely written
\[
\log p_X(x) = \mathcal{L}(q_Z) + \mathrm{KL}\left( q_Z \,\|\, p_{Z|X} \right). \tag{7}
\]
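To make the decomposition (7) concrete, the following small check (a minimal sketch; the discrete joint table for $p_{X,Z}(x,\cdot)$ and the density `q` are illustrative values invented for the example) verifies numerically that $\mathcal{L}(q_Z)$ and $\mathrm{KL}(q_Z \,\|\, p_{Z|X})$ sum to $\log p_X(x)$:

```python
# Numerical check of log p(x) = L(q) + KL(q || p_{Z|X}) for a toy discrete latent Z.
import numpy as np

# Joint p_{X,Z}(x, z) for one fixed observation x and z in {0, 1, 2} (made-up values).
p_xz = np.array([0.10, 0.25, 0.15])
p_x = p_xz.sum()                          # evidence p_X(x)
p_z_given_x = p_xz / p_x                  # exact posterior p_{Z|X}(z | x)

q = np.array([0.5, 0.3, 0.2])             # an arbitrary approximating density q_Z

L = np.sum(q * np.log(p_xz / q))          # L(q_Z) = E_q[log p(x,Z) - log q(Z)]
KL = np.sum(q * np.log(q / p_z_given_x))  # KL(q_Z || p_{Z|X})

print(L + KL, np.log(p_x))                # the two numbers agree, as in (7)
```

Any other valid `q` gives the same sum; only the split between the two terms changes.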
The functional $\mathrm{KL}(q_Z \| p_{Z|X})$ is known as the Kullback-Leibler divergence of $p_{Z|X}$ from $q_Z$ and is the pseudo metric which we are seeking. A few properties of KL are that for all valid $p_{Z|X}$ and $q_Z$,

1. $\mathrm{KL}(q_Z \| p_{Z|X}) \geq 0$, and
2. $\mathrm{KL}(q_Z \| p_{Z|X}) = 0$ if and only if $q_Z = p_{Z|X}$ a.e.

Proof of both of these facts is provided in Lemma 3.1 of [1].¹ However, note that the Kullback-Leibler divergence is not symmetric, and thus not a true distance metric.

¹ Note that what we are labeling divergence and what Kullback and Leibler label divergence are different functionals. Our definition of KL divergence is consistent with the literature, despite Kullback and Leibler defining it differently.

To conclude this section it is instructive to see the meaning of the KL divergence with an example. In Figure 2 we plot an attempt to approximate a normal distribution $p = \mathcal{N}(2, 0.5)$ with three different log-normals $q_1, q_2, q_3$. Observe that the approximation that visually seems more accurate also has the smallest KL divergence. Moreover, it is worth noting that the reversed divergence $\mathrm{KL}(p \| q)$ generally takes a different value than $\mathrm{KL}(q \| p)$; for instance, $\mathrm{KL}(q_2 \| p)$ and $\mathrm{KL}(p \| q_2)$ need not agree.

[Figure 2. Three different approximations. The better the approximation, the smaller the KL divergence.]
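The comparison behind Figure 2 can be reproduced numerically. The sketch below computes $\mathrm{KL}(q_i \| p)$ by quadrature for a $\mathcal{N}(2, 0.5)$ target (the second argument is read as the variance here) and a few log-normal candidates; the specific log-normal parameters are illustrative guesses, not the ones used in the original figure.

```python
# Compare KL(q_i || p) for a normal target and several log-normal approximations.
import numpy as np
from scipy import stats
from scipy.integrate import quad

p = stats.norm(loc=2.0, scale=np.sqrt(0.5))   # target p = N(2, 0.5), variance 0.5
candidates = {
    "q1": stats.lognorm(s=0.10, scale=2.0),   # s = std of log, scale = exp(mean of log)
    "q2": stats.lognorm(s=0.35, scale=2.0),
    "q3": stats.lognorm(s=0.60, scale=2.0),
}

def kl(q, p, lo=1e-6, hi=20.0):
    # KL(q || p) = integral of q(z) log(q(z)/p(z)) dz, computed by quadrature
    integrand = lambda z: q.pdf(z) * (q.logpdf(z) - p.logpdf(z))
    return quad(integrand, lo, hi)[0]

for name, q in candidates.items():
    print(name, kl(q, p))   # the visually closest candidate has the smallest KL
```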
4. Variational Bayes Approximation

Recall that the choice of $q_Z(z)$ is entirely arbitrary so long as it is a probability density with respect to the Lebesgue measure. The idea in variational Bayes (VB) approximation is to select a simple approximation $q_Z$ to the complicated conditional density $p_{Z|X}$. The Kullback-Leibler divergence gives us a measure of the discrepancy between $q_Z$ and $p_{Z|X}$; so the goal is to find an approximation $q_Z$ which minimizes $\mathrm{KL}(q_Z \| p_{Z|X})$.

Further consideration of (7) in light of our new task will prove beneficial. Note that the left hand side does not vary with $z$; moreover, in our graph we are considering an experiment where we observe $x$, so the quantity is fixed.² It is generally referred to as the log marginal likelihood or the log evidence. Since it does not vary with $q_Z$, it is clear that the functionals $\mathcal{L}$ and $\mathrm{KL}$ are inversely related. Therefore, a minimization of KL amounts to a maximization of $\mathcal{L}$, i.e.,
\[
q_Z^* := \operatorname*{arg\,min}_{q_Z \in \mathcal{Q}} \mathrm{KL}\left( q_Z \,\|\, p_{Z|X} \right) = \operatorname*{arg\,max}_{q_Z \in \mathcal{Q}} \mathcal{L}(q_Z), \tag{8}
\]
where $q_Z^*$ is our approximation of interest and $\mathcal{Q}$ denotes any set of valid probability densities. We will refer to $q_Z^*$ as the $\mathcal{Q}$-VB approximation. It is also sometimes useful to note that by the first property of KL, $\log p_X(x) \geq \mathcal{L}(q_Z)$, and thus $e^{\mathcal{L}(q_Z)}$ provides a lower bound for the marginal density of $X$.

Finding $q_Z^*$ when $\mathcal{Q}$ is the set of all probability densities is in general a difficult task. To make our analysis more tractable we can impose an independence structure on the random vector $Z$; that is, we will only consider $q_Z$'s which come from the set
\[
\mathcal{Q} := \left\{ q_Z(z) : q_Z(z) = \prod_{i=1}^m q_{Z_i}(z_i) \right\}, \tag{9}
\]
where $q_{Z_i}(z_i)$ is the probability density of $Z_i$, the $i$th element of $Z$. Sometimes even this task proves difficult and more restrictions are imposed to make $\mathcal{Q}$ even smaller. Such techniques are referred to as restricted variational Bayes (R-VB) techniques. For example, we could add the additional requirement that each $q_{Z_i}(z_i)$ is in the exponential family. For the rest of these notes, we will take $\mathcal{Q}$ as defined in (9) for our $\mathcal{Q}$ in (8).

To find (8), we begin by looking at $\mathcal{L}(q_Z)$. In particular, it will be beneficial to look at the $j$th factor $q_{Z_j}$. For this reason, our mathematics is aimed at separating out the factors which depend on $q_{Z_j}$. Presented in the equations which follow is the derivation in terms of expectations. An equivalent form of the first part of the derivation in terms of integrals is provided in the Appendix.

² In this set of notes the upper case Roman characters such as $X$ will be used to denote random vectors while the lower case Roman characters such as $x$ (outside a density or integral equation) will denote a single observation of $X$, as is common in the statistics literature.
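As a quick illustration of the factorized family in (9), a mean-field $q_Z$ is completely specified by its one-dimensional factors, and its joint density is simply their product. The factors below are arbitrary placeholders chosen only to show the structure.

```python
# A minimal sketch of one member of the mean-field family Q in (9): the joint
# density of q_Z is the product of one-dimensional factor densities.
import numpy as np
from scipy import stats

# Illustrative factors q_{Z_1}, q_{Z_2}, q_{Z_3}; any valid 1-D densities work here.
factors = [stats.norm(0.0, 1.0), stats.gamma(a=2.0), stats.beta(a=2.0, b=3.0)]

def q_logpdf(z):
    """log q_Z(z) = sum_i log q_{Z_i}(z_i), by the independence assumption in (9)."""
    return sum(f.logpdf(z_i) for f, z_i in zip(factors, z))

print(np.exp(q_logpdf([0.3, 1.5, 0.4])))   # q_Z(z) at one point z in R^3
```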
So, from (9) and Fubini's theorem³ we have
\[
\begin{aligned}
\mathcal{L}(q_Z) &= \mathbb{E}_{q_Z}\!\left[ \log \frac{p_{X,Z}(X,Z)}{q_Z(Z)} \right] \\
&= \mathbb{E}_{q_Z}\!\left[ \log p_{X,Z}(X,Z) - \log q_Z(Z) \right] \\
&= \mathbb{E}_{q_Z}\!\left[ \log p_{X,Z}(X,Z) - \sum_{k=1}^m \log q_{Z_k}(Z_k) \right] \\
&= \mathbb{E}_{q_Z}\!\left[ \log p_{X,Z}(X,Z) \right] - \sum_{k=1}^m \mathbb{E}_{q_{Z_k}}\!\left[ \log q_{Z_k}(Z_k) \right] \\
&= \mathbb{E}_{q_{Z_j}}\!\left[ \mathbb{E}_{\mathcal{Q}_{\setminus j}}\!\left[ \log p_{X,Z}(X,Z) \right] \right] - \mathbb{E}_{q_{Z_j}}\!\left[ \log q_{Z_j}(Z_j) \right] - \sum_{k \neq j} \mathbb{E}_{q_{Z_k}}\!\left[ \log q_{Z_k}(Z_k) \right] \\
&= \mathbb{E}_{q_{Z_j}}\!\left[ \log \exp\!\left\{ \mathbb{E}_{\mathcal{Q}_{\setminus j}}\!\left[ \log p_{X,Z}(X,Z) \right] \right\} - \log q_{Z_j}(Z_j) \right] - \sum_{k \neq j} \mathbb{E}_{q_{Z_k}}\!\left[ \log q_{Z_k}(Z_k) \right] \\
&= \mathbb{E}_{q_{Z_j}}\!\left[ \log \frac{\exp\!\left\{ \mathbb{E}_{\mathcal{Q}_{\setminus j}}\!\left[ \log p_{X,Z}(X,Z) \right] \right\}}{q_{Z_j}(Z_j)} \right] - \sum_{k \neq j} \mathbb{E}_{q_{Z_k}}\!\left[ \log q_{Z_k}(Z_k) \right] \\
&= -\mathrm{KL}\!\left( q_{Z_j} \,\Big\|\, \exp\!\left\{ \mathbb{E}_{\mathcal{Q}_{\setminus j}}\!\left[ \log p_{X,Z}(X,Z) \right] \right\} \right) - \sum_{k \neq j} \mathbb{E}_{q_{Z_k}}\!\left[ \log q_{Z_k}(Z_k) \right],
\end{aligned}
\]
where $\mathbb{E}_{\mathcal{Q}_{\setminus j}}$ denotes the expectation with respect to $\prod_{i \neq j} q_{Z_i}$.

Now, we know from KL's first property that it is never negative. Thus, to maximize $\mathcal{L}(q_Z)$ we need to minimize the KL term in the last equation. From KL's second property, we know that it is minimized precisely when the two terms are equivalent a.e. This gives the nice formula we know must hold for the probability density $q_{Z_j}^*$, the $j$th factor of $q_Z^*$:
\[
q_{Z_j}^*(z_j) \propto \exp\!\left\{ \mathbb{E}_{\mathcal{Q}_{\setminus j}}\!\left[ \log p_{X,Z}(X,Z) \right] \right\}, \tag{10}
\]
where here $Z = (Z_1, \ldots, Z_{j-1}, z_j, Z_{j+1}, \ldots, Z_m)$.

³ The theorem states that if $\int_{A \times B} |f(x,y)|\, d(x,y) < \infty$, then $\int_A \left( \int_B f(x,y)\, dy \right) dx = \int_B \left( \int_A f(x,y)\, dx \right) dy = \int_{A \times B} f(x,y)\, d(x,y)$.
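Update (10) characterizes each optimal factor in terms of the others, so in practice the factors are refreshed one at a time and cycled until they stop changing. A schematic of that coordinate-wise loop is sketched below; `update_factor` is a hypothetical placeholder for the model-specific evaluation of $\exp\{\mathbb{E}_{\mathcal{Q}_{\setminus j}}[\log p_{X,Z}(X,Z)]\}$, and the concrete case is worked out in the next section.

```python
# Schematic coordinate-wise refinement based on (10). `factors` holds the current
# q_{Z_1}, ..., q_{Z_m}; `update_factor(j, factors)` must return the new q_{Z_j}
# obtained from (10) using the other factors (model-specific, hypothetical here).
def mean_field_vb(factors, update_factor, n_sweeps=100):
    for _ in range(n_sweeps):
        for j in range(len(factors)):
            factors[j] = update_factor(j, factors)   # apply (10) to the jth factor
    return factors
```

Stopping after a fixed number of sweeps is a simplification; in practice one monitors $\mathcal{L}(q_Z)$, which never decreases under these updates, and stops when it plateaus.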
5. Example: A Univariate Gaussian

To understand the aforementioned approximation procedure, we show the analytic computations in the case of a univariate Gaussian distribution. Assume we have a data set $\mathcal{D} = \{x_1, \ldots, x_N\}$ drawn from a distribution with unknown parameters $\mu, \tau$. Given the data, the likelihood function is:
\[
p(\mathcal{D} \mid \mu, \tau) = \left( \frac{\tau}{2\pi} \right)^{N/2} \exp\!\left\{ -\frac{\tau}{2} \sum_{n=1}^N (x_n - \mu)^2 \right\}.
\]
We may have some idea about the distribution of the unknown parameters $\mu, \tau$. To simplify the analysis here, we introduce the following conjugate priors:
\[
p(\mu \mid \tau) = \mathcal{N}\!\left( \mu \mid \mu_0, (\lambda_0 \tau)^{-1} \right), \qquad p(\tau) = \mathrm{Gamma}(\tau \mid a_0, b_0).
\]
Recall that we are interested in estimating the posterior distribution over the unknown variables, which we approximate by $q(\mu, \tau)$. According to the mean field approximation we assume it takes a factorized form (which is not how the true posterior factorizes):
\[
q(\mu, \tau) = q_\mu(\mu)\, q_\tau(\tau).
\]
To find the optimal value for the factor $q_\mu(\mu)$ we apply the formula that we derived above:
\[
\begin{aligned}
\log q_\mu^*(\mu) &= \mathbb{E}_\tau\!\left[ \log p(\mathcal{D}, \mu, \tau) \right] + \text{const} \\
&= \mathbb{E}_\tau\!\left[ \log p(\mathcal{D} \mid \mu, \tau) + \log p(\mu \mid \tau) + \log p(\tau) \right] + \text{const} \\
&= \mathbb{E}_\tau\!\left[ \log p(\mathcal{D} \mid \mu, \tau) + \log p(\mu \mid \tau) \right] + \text{const} \\
&= -\frac{\mathbb{E}_\tau[\tau]}{2} \left\{ \lambda_0 (\mu - \mu_0)^2 + \sum_{n=1}^N (x_n - \mu)^2 \right\} + \text{const}.
\end{aligned}
\]
The next step is to complete the square over $\mu$ to obtain the form of a Gaussian $\mathcal{N}(\mu \mid \mu_N, \lambda_N^{-1})$ for $q_\mu(\mu)$, where:
\[
\mu_N = \frac{\lambda_0 \mu_0 + N \bar{x}}{\lambda_0 + N}, \qquad \lambda_N = (\lambda_0 + N)\, \mathbb{E}_\tau[\tau].
\]
A similar analysis for $q_\tau(\tau)$ shows that it follows a gamma distribution $\mathrm{Gamma}(\tau \mid a_N, b_N)$ where:
\[
a_N = a_0 + \frac{N+1}{2}, \qquad b_N = b_0 + \frac{1}{2}\, \mathbb{E}_\mu\!\left[ \sum_{n=1}^N (x_n - \mu)^2 + \lambda_0 (\mu - \mu_0)^2 \right].
\]
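The two updates are coupled: $\lambda_N$ needs $\mathbb{E}_\tau[\tau]$, and $b_N$ needs moments of $\mu$ under $q_\mu$, so in practice they are iterated. The sketch below implements exactly these formulas on synthetic data; the prior hyperparameters, the data-generating values, and the fixed number of sweeps are illustrative choices.

```python
# Iterating the closed-form variational updates for the univariate Gaussian example.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=50)      # synthetic data D = {x_1, ..., x_N}
N, xbar = x.size, x.mean()

mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0           # prior hyperparameters (illustrative)

E_tau = a0 / b0                                  # initial guess for E_tau[tau]
for _ in range(100):
    # q_mu(mu) = N(mu | mu_N, lambda_N^{-1})
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau

    # Moments under q_mu needed for q_tau: E[mu] = mu_N, E[mu^2] = mu_N^2 + 1/lambda_N
    E_mu, E_mu2 = mu_N, mu_N**2 + 1.0 / lam_N

    # q_tau(tau) = Gamma(tau | a_N, b_N)
    a_N = a0 + (N + 1) / 2.0
    b_N = b0 + 0.5 * (np.sum(x**2 - 2 * x * E_mu + E_mu2)
                      + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))
    E_tau = a_N / b_N                            # feeds back into lambda_N next sweep

print(mu_N, 1.0 / lam_N, a_N / b_N)              # E[mu], var of q_mu, and E[tau]
```

After a few sweeps the estimates stop changing, which is the fixed point of the two coupled updates.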
The same analysis, together with more sophisticated examples, can be found in chapter 10 of [?]. It is important to notice that we can use the above derived formulas to iteratively compute more refined estimates of the model parameters. Observe that the parameters of $q_\mu(\mu)$ depend on the mean value of $\tau$ and vice versa. To conclude this section we refer the reader to two excellent references on variational methods [?, ?].

6. Appendix
\[
\begin{aligned}
\mathcal{L}(q_Z) &= \int_{\mathbb{R}^m} q_Z(z) \log \frac{p_{X,Z}(x,z)}{q_Z(z)}\, dz \\
&= \int \cdots \int \prod_{i=1}^m q_{Z_i}(z_i) \log \frac{p_{X,Z}(x,z)}{\prod_{k=1}^m q_{Z_k}(z_k)}\, dz_1 \cdots dz_m \\
&= \int \cdots \int \prod_{i=1}^m q_{Z_i}(z_i) \left( \log p_{X,Z}(x,z) - \sum_{k=1}^m \log q_{Z_k}(z_k) \right) dz_1 \cdots dz_m \\
&= \int \cdots \int \prod_{i=1}^m q_{Z_i}(z_i) \log p_{X,Z}(x,z)\, dz_1 \cdots dz_m - \int \cdots \int \prod_{i=1}^m q_{Z_i}(z_i) \sum_{k=1}^m \log q_{Z_k}(z_k)\, dz_1 \cdots dz_m \\
&= \int q_{Z_j}(z_j) \left( \int \prod_{i \neq j} q_{Z_i}(z_i) \log p_{X,Z}(x,z)\, dz_{\setminus j} \right) dz_j - \sum_{k=1}^m \int \log q_{Z_k}(z_k)\, q_{Z_k}(z_k)\, dz_k \\
&= \int q_{Z_j}(z_j) \left( \int \prod_{i \neq j} q_{Z_i}(z_i) \log p_{X,Z}(x,z)\, dz_{\setminus j} \right) dz_j - \int \log q_{Z_j}(z_j)\, q_{Z_j}(z_j)\, dz_j - \sum_{k \neq j} \int \log q_{Z_k}(z_k)\, q_{Z_k}(z_k)\, dz_k.
\end{aligned}
\]

References

[1] S. Kullback and R. A. Leibler, On information and sufficiency, Annals of Mathematical Statistics 22 (1951), no. 1, 79-86.