Approximate Bayesian inference

Size: px
Start display at page:

Download "Approximate Bayesian inference"

Transcription

1 Approximate Bayesian inference Variational and Monte Carlo methods Christian A. Naesseth

2 1 Exchange rate data Month

3 Image data 2

4 1 Bayesian inference 2 Variational inference 3 Stochastic gradient variational inference 4 Rejection sampling variational inference 5 Variational sequential Monte Carlo

5 4 Bayesian inference y - data x - latent (unknown) variables p(x, y) - probabilistic (generative) model Posterior distribution: p(x y) = p(x, y) p(y) Expectations: E p(x y) [ϕ(x)] = ϕ(x)p(x y)dx

6 5 Examples of probabilistic generative models Exchangeable data models T p(x, y 1:T ) = µ(x) g(y t x) t=1 Applications: Regression, classification,... State space models T T p(x 1:T, y 1:T ) µ(x 1 ) f(x t x t 1 ) g(y t x t ) t=2 t=1 Applications: Filtering, smoothing, forecasting,...

7 6 Approximate Bayesian inference Because p(x y) is typically intractable we approximate: Variational methods - Turn integration into optimization E p(x y) [ϕ(x)] E q(x ; λ ) [ϕ(x)] Monte Carlo methods - Random sampling E p(x y) [ϕ(x)] 1 N N ϕ(x i ), x i p(x y) i=1

8 1 Bayesian inference 2 Variational inference 3 Stochastic gradient variational inference 4 Rejection sampling variational inference 5 Variational sequential Monte Carlo

9 8 Variational inference q(x ; λ) λ p(x y) KL(q(x ; λ) p(x y)) λ init

10 9 Evidence lower bound KL(q(x ; λ) p(x y)) = log p(y) E q [log p(x, y) log q(x ; λ)] log p(y) E q [log p(x, y) log q(x ; λ)] }{{} L(λ) Maximizing the ELBO L(λ) minimizes KL divergence between q(x ; λ) and p(x y) Coordinate ascent when conditionally conjugate Stochastic optimization using noisy λ L(λ) estimated with Monte Carlo methods

11 10 Bayesian logistic regression Data are pairs (y t, z t ) z t is a covariate y t is binary label θ is regression coefficient The generative model is p(θ, y 1:T z 1:T ) = N (θ ; 0, 1) T Bernoulli (y t ; σ(θz t )), t=1 where σ( ) is the logistic function mapping reals to (0, 1).

12 11 VI for Bayesian logistic regression Let s consider a single data pair (y, z) Variational approximation q(θ ; λ) = N (θ ; µ, σ 2 ) The ELBO is given by L(µ, σ 2 [ ) = E q log p(θ) + log p(y θ, z) log q(θ ; µ, σ 2 ) ]

13 12 VI for Bayesian logistic regression L(µ, σ 2 [ ) = E q log p(θ) + log p(y θ, z) log q(θ ; µ, σ 2 ) ] = 1 2 (µ2 + σ 2 ) log σ2 + E q [log p(y θ, z)] + const. = 1 2 (µ2 + σ 2 ) log σ2 + E q [yzθ log (1 + exp(zθ))] = 1 2 (µ2 + σ 2 ) log σ2 + yzµ E q [log (1 + exp(zθ))] We are stuck because of the expectation. What can we do?

14 1 Bayesian inference 2 Variational inference 3 Stochastic gradient variational inference 4 Rejection sampling variational inference 5 Variational sequential Monte Carlo

15 14 Stochastic Optimization Gradient-based optimization: L(λ) = 0 Stochastic approximation: [Robbins and Monro, 1951] [ ] with E L(λ k ) = L(λ k ). λ k+1 = λ k + α k L(λ k ),

16 15 VI for Bayesian logistic regression L(µ, σ 2 ) = 1 2 (µ2 + σ 2 ) log σ2 + yzµ E q [log (1 + exp(zθ))] E q [log (1 + exp(zθ))] = q(θ ; µ, σ 2 ) log (1 + exp(zθ)) dθ [ = E q log q(θ ; µ, σ 2 ) log (1 + exp(zθ)) ] 1 N log q(θ i ; µ, σ 2 ) log ( 1 + exp(zθ i ) ) N L(µ, σ 2 ), i=1 with θ i q(θ ; µ, σ 2 ). Unbiased MC estimator of the gradient!

17 16 Stochastic gradient variational inference Monte Carlo estimates of gradients to optimize the ELBO. Score function: [Paisley et al., 2012, Ranganath et al., 2014] L(λ) = E q [log p(x, y) log q(x ; λ)] = E q [ log q(x ; λ) (log p(x, y) log q(x ; λ))] Reparameterization: [Kingma & Welling 2014, Rezende et al, 2014] L(λ) = E q [log p(x, y) log q(x ; λ)] = E s(ε) [log p(h(ε, λ), y) log q(h(ε, λ) ; λ)] = E s(ε) [ x (log p(x, y) log q(x ; λ)) λ h(ε, λ) ] e.g. x = h(ε, µ, σ) = µ + σε, s(ε) = N (0, 1).

18 17 Reparameterization vs. score function ELBO Time [s] ADVI BBVI G REP RSVI

19 18 Why doesn t reparameterization always work? All random variables we simulate on our computers are ultimately transformations of uniforms. Why can t we do reparameterization gradients for everything? Often transformations h( ) used in practice are not differentiable.

20 1 Bayesian inference 2 Variational inference 3 Stochastic gradient variational inference 4 Rejection sampling variational inference 5 Variational sequential Monte Carlo

21 20 Reparameterizing gamma Gamma(λ, 1): Common distribution in VI Used to sample beta, Dirichlet, Student s t, chi-squared,... Simulation: Propose h(ε, λ) = Accept reject ( λ 1 ) ( ε 9λ 3 ) 3, ε N (0, 1) u a(ε, λ)? u U[0, 1] Tricky to reparameterize because of the accept reject step! A simple method for generating gamma variables, Marsaglia & Tsang, 2000

22 21 Our strategy: partial reparameterization Idea: Use ε π(ε ; λ) (only weakly dependent on λ) s.t. x = h(ε, λ) q(x ; λ) Then: L(λ) = E π(ε ; λ) [log p(h(ε, λ), y) log q(h(ε, λ) ; λ)] = E π(ε ; λ) [ x (log p(x, y) log q(x ; λ)) λ h(ε, λ) ] }{{} reparameterization + E π(ε ; λ) [(log p(h(ε, λ), y) log q(h(ε, λ) ; λ)) λ log π(ε ; λ)] }{{} correction Intuition: Interpolating between score function gradients and reparameterization gradients.

23 22 Rejection sampling and reparameterization Idea: Use h(ε, λ) from the proposal Let π(ε ; λ) be the distribution of the accepted ε Rejection sampling: Propose x = h(ε, λ), ε s(ε) Accept if u a(ε, λ) = Then: 1 q(h(ε,λ) ; λ) dh (ε,λ) dε π(ε ; λ) = π(ε, u ; λ)du = M λ s(ε) 0 = q(h(ε, λ) ; λ) dh (ε, λ) dε M λ s(ε), u U[0, 1] 1 0 1[u a(ε, λ)]du

24 23 Sparse gamma deep exponential family zn;l;k z KL w k,k Gamma(0.1, 0.1) ( ) x l 0.1 n,k Gamma 0.1, k wl k,k x l+1 n,k ( ) w`+1,k w`,k zn;`c1;k zn;`;k K`C1 K` z y n,d Poisson k w 0 k,dx 1 n,k w1,k zn;1;k K1 w0,i xn;i V x Deep exponential families, Ranganath et al., 2015 N

25 24 Sparse gamma DEF Olivetti faces ELBO Time [s] ADVI BBVI G REP RSVI [Kucukelbir et al., 2015, Ranganath et al., 2014, Ruiz et al., 2016]

26 25 Variational Monte Carlo methods Monte Carlo methods can enable variational inference for more complex (non-conjugate) models p(x, y) But what if we can use Monte Carlo methods to also define more flexible q(x ; λ)? I will illustrate how we can do this using sequential Monte Carlo for the state space model: T T p(x 1:T, y 1:T ) µ(x 1 ) f(x t x t 1 ) g(y t x t ) t=2 t=1

27 1 Bayesian inference 2 Variational inference 3 Stochastic gradient variational inference 4 Rejection sampling variational inference 5 Variational sequential Monte Carlo

28 27 Sequential Monte Carlo SMC Approximation: p(x 1:t 1 y 1:t 1 ) = Update: N i=1 a i t 1 Discrete ( w j t 1/ l wl t 1 x i t r(x t x ai t 1 t 1 ; λ) w i t = f(xi t xa i t 1 t 1 )g(y t xi t ) r(x i t xai t 1 t 1 ; λ) wt 1 i δ x i l wl 1:t 1 t 1 ) particle particle time SIS time

29 28 Variational Sequential Monte Carlo Key idea: q(x 1:T ; λ) defined by running SMC and sample p(x 1:T y 1:T ) Density p(x y) r(x;λ ) q(x;λ ) x

30 29 Variational Sequential Monte Carlo Generative Process VSMC 1. Run SMC to get {( x i 1:T, )} N wi T i=1 2. b T Discrete ( wt/ i ) l wl T 3. return x 1:T x b T 1:T particle N x 1:T p(x 1:T y 1:T ) = wi T δ x i i=1 l wl 1:T T time

31 30 Distribution of VSMC The distribution q is the marginal of x b T 1:T over all random variables, x 1:N 1:T, a1:n 1:T 1, b T, generated by VSMC ] q(x 1:T y 1:T ; λ) = p(x 1:T, y 1:T ) E [ p(y 1:T ) 1 x1:t, where p(y 1:T ) T t=1 1 N N i=1 wi t.

32 31 Surrogate ELBO We can not directly use the standard ELBO, L(λ), because evaluating q pointwise is intractable. Consider instead: [ T ( L(λ) E log t=1 1 N N i=1 w i t )] = E [log p(y 1:T )] Theorem 1 (Surrogate ELBO). log p(y 1:T ) L(λ) L(λ). Proof. (Idea) Use the definition of L(λ) for q(x 1:T y 1:T ; λ) and then Jensen s inequality.

33 32 Stochastic Optimization We optimize the surrogate ELBO L(λ) using stochastic gradient ascent, L(λ) [ = E [log p(y 1:T )] = E log p(y 1:T ) + log p(y 1:T ) log φ ]. Variance reduction using the reparameterization trick, Rao-Blackwellization, and control variates.

34 33 Theoretical Results For some c(λ) <, With N = bt, ( ) KL q(x 1:T ; λ) p(x 1:T y 1:T ) ( ) KL q(x 1:T ; λ) p(x 1:T y 1:T ) T σ 2 (λ), 2b for σ 2 (λ) <. c(λ) N. [ E log p(y 1:T ) ] p(y 1:T )

35 34 Experiments - Stochastic volatility Data: 10 years of monthly returns for the exchange rate of 22 currencies wrt to USD N = 4 N = 8 N = 16 x t = µ + φ(x t 1 µ) + v t, v t N (0, Q) ( xt ) y t = β exp e t, e t N (0, I) 2 Method ELBO Structured VI IWAE VSMC IWAE VSMC IWAE VSMC ELBO VSMC IWAE N r(x t x t 1 ; λ, θ) f(x t x t 1 ; θ) N (x t ; µ t, Σ t )

36 35 Experiments - Stochastic recurrent neural networks Data: 105 motor cortex neurons simultaneously recorded in a macaque monkey 1080 x t = µ θ (x t 1 ) + exp ( σ θ (x t 1 )/2 ) v t, y t Poisson (exp (η θ (x t ))), ELBO Iterations (10 3 ) IWAE d x = 3 IWAE d x = 5 IWAE d x = 10 VSMC d x = 3 VSMC d x = 5 VSMC d x = 10 Proposal: r(x t x t 1, y t ) f(x t x t 1 ; θ) ( ) N x t ; µ λ (y t ), e 2σ λ(y t ), µ λ ( ), σ λ ( ) are inference networks.

37 36 Collaborators Scott Linderman Francisco Ruiz Rajesh Ranganath David Blei Work done while visiting Columbia University Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms, Naesseth et al., 2017 Variational Sequential Monte Carlo, Naesseth et al., 2017

38 37 Take-home messages Stochastic optimization opens up for interesting new approximate Bayesian inference methods Monte Carlo methods more flexible variational approximations with guarantees Variational methods adaptation of Monte Carlo methods based on a global objective Contact: christian.a.naesseth@liu.se Website: users.isy.liu.se/en/rt/chran60/index.html

39 38 Subsampling and Variational EM Data subsampling y 1:T = {y j 1:T }n j=1 L(λ) = n j=1 ] E [log p(y j1:t ) At each iteration sample a mini-batch of data. Variational EM log p θ (y 1:T ) L(λ, θ) = E [log p θ (y 1:T )] Maximize L(λ, θ) jointly with respect to λ, θ

Stochastic Backpropagation, Variational Inference, and Semi-Supervised Learning

Stochastic Backpropagation, Variational Inference, and Semi-Supervised Learning Stochastic Backpropagation, Variational Inference, and Semi-Supervised Learning Diederik (Durk) Kingma Danilo J. Rezende (*) Max Welling Shakir Mohamed (**) Stochastic Gradient Variational Inference Bayesian

More information

Variance Reduction in Black-box Variational Inference by Adaptive Importance Sampling

Variance Reduction in Black-box Variational Inference by Adaptive Importance Sampling Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence IJCAI-18 Variance Reduction in Black-box Variational Inference by Adaptive Importance Sampling Ximing Li, Changchun

More information

Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms

Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms Christian A. Naesseth Francisco J. R. Ruiz Scott W. Linderman David M. Blei Linköping University Columbia University University

More information

Probabilistic Graphical Models for Image Analysis - Lecture 4

Probabilistic Graphical Models for Image Analysis - Lecture 4 Probabilistic Graphical Models for Image Analysis - Lecture 4 Stefan Bauer 12 October 2018 Max Planck ETH Center for Learning Systems Overview 1. Repetition 2. α-divergence 3. Variational Inference 4.

More information

Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms

Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms Christian A. Naesseth Francisco J. R. Ruiz Scott W. Linderman David M. Blei Linköping University Columbia University University

More information

Lecture 13 : Variational Inference: Mean Field Approximation

Lecture 13 : Variational Inference: Mean Field Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1

More information

Deep Variational Inference. FLARE Reading Group Presentation Wesley Tansey 9/28/2016

Deep Variational Inference. FLARE Reading Group Presentation Wesley Tansey 9/28/2016 Deep Variational Inference FLARE Reading Group Presentation Wesley Tansey 9/28/2016 What is Variational Inference? What is Variational Inference? Want to estimate some distribution, p*(x) p*(x) What is

More information

Sequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007

Sequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007 Sequential Monte Carlo and Particle Filtering Frank Wood Gatsby, November 2007 Importance Sampling Recall: Let s say that we want to compute some expectation (integral) E p [f] = p(x)f(x)dx and we remember

More information

Variational Sequential Monte Carlo

Variational Sequential Monte Carlo Variational Sequential Monte Carlo Christian A. Naesseth Scott W. Linderman Rajesh Ranganath David M. Blei Linköping University Columbia University New York University Columbia University Abstract Many

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

Variational Inference via Stochastic Backpropagation

Variational Inference via Stochastic Backpropagation Variational Inference via Stochastic Backpropagation Kai Fan February 27, 2016 Preliminaries Stochastic Backpropagation Variational Auto-Encoding Related Work Summary Outline Preliminaries Stochastic Backpropagation

More information

Natural Gradients via the Variational Predictive Distribution

Natural Gradients via the Variational Predictive Distribution Natural Gradients via the Variational Predictive Distribution Da Tang Columbia University datang@cs.columbia.edu Rajesh Ranganath New York University rajeshr@cims.nyu.edu Abstract Variational inference

More information

arxiv: v1 [stat.ml] 18 Oct 2016

arxiv: v1 [stat.ml] 18 Oct 2016 Christian A. Naesseth Francisco J. R. Ruiz Scott W. Linderman David M. Blei Linköping University Columbia University University of Cambridge arxiv:1610.05683v1 [stat.ml] 18 Oct 2016 Abstract Variational

More information

Local Expectation Gradients for Doubly Stochastic. Variational Inference

Local Expectation Gradients for Doubly Stochastic. Variational Inference Local Expectation Gradients for Doubly Stochastic Variational Inference arxiv:1503.01494v1 [stat.ml] 4 Mar 2015 Michalis K. Titsias Athens University of Economics and Business, 76, Patission Str. GR10434,

More information

Variational Inference in TensorFlow. Danijar Hafner Stanford CS University College London, Google Brain

Variational Inference in TensorFlow. Danijar Hafner Stanford CS University College London, Google Brain Variational Inference in TensorFlow Danijar Hafner Stanford CS 20 2018-02-16 University College London, Google Brain Outline Variational Inference Tensorflow Distributions VAE in TensorFlow Variational

More information

arxiv: v1 [stat.ml] 2 Mar 2016

arxiv: v1 [stat.ml] 2 Mar 2016 Automatic Differentiation Variational Inference Alp Kucukelbir Data Science Institute, Department of Computer Science Columbia University arxiv:1603.00788v1 [stat.ml] 2 Mar 2016 Dustin Tran Department

More information

Particle Filtering a brief introductory tutorial. Frank Wood Gatsby, August 2007

Particle Filtering a brief introductory tutorial. Frank Wood Gatsby, August 2007 Particle Filtering a brief introductory tutorial Frank Wood Gatsby, August 2007 Problem: Target Tracking A ballistic projectile has been launched in our direction and may or may not land near enough to

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Deep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem

Deep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Auto-Encoding Variational Bayes

Auto-Encoding Variational Bayes Auto-Encoding Variational Bayes Diederik P Kingma, Max Welling June 18, 2018 Diederik P Kingma, Max Welling Auto-Encoding Variational Bayes June 18, 2018 1 / 39 Outline 1 Introduction 2 Variational Lower

More information

Advances in Variational Inference

Advances in Variational Inference 1 Advances in Variational Inference Cheng Zhang Judith Bütepage Hedvig Kjellström Stephan Mandt arxiv:1711.05597v1 [cs.lg] 15 Nov 2017 Abstract Many modern unsupervised or semi-supervised machine learning

More information

Notes on pseudo-marginal methods, variational Bayes and ABC

Notes on pseudo-marginal methods, variational Bayes and ABC Notes on pseudo-marginal methods, variational Bayes and ABC Christian Andersson Naesseth October 3, 2016 The Pseudo-Marginal Framework Assume we are interested in sampling from the posterior distribution

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should

More information

Combine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning

Combine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning Combine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning Michalis K. Titsias Department of Informatics Athens University of Economics and Business

More information

Expectation Propagation Algorithm

Expectation Propagation Algorithm Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,

More information

Ch 4. Linear Models for Classification

Ch 4. Linear Models for Classification Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,

More information

Part 1: Expectation Propagation

Part 1: Expectation Propagation Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

Bayesian Machine Learning

Bayesian Machine Learning Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Evaluating the Variance of

Evaluating the Variance of Evaluating the Variance of Likelihood-Ratio Gradient Estimators Seiya Tokui 2 Issei Sato 2 3 Preferred Networks 2 The University of Tokyo 3 RIKEN ICML 27 @ Sydney Task: Gradient estimation for stochastic

More information

Variational Autoencoders

Variational Autoencoders Variational Autoencoders Recap: Story so far A classification MLP actually comprises two components A feature extraction network that converts the inputs into linearly separable features Or nearly linearly

More information

Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems

Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems Scott W. Linderman Matthew J. Johnson Andrew C. Miller Columbia University Harvard and Google Brain Harvard University Ryan

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

Automatic Differentiation Variational Inference

Automatic Differentiation Variational Inference Journal of Machine Learning Research X (2016) 1-XX Submitted 01/16; Published X/16 Automatic Differentiation Variational Inference Alp Kucukelbir Data Science Institute and Department of Computer Science

More information

arxiv: v2 [stat.ml] 15 Aug 2017

arxiv: v2 [stat.ml] 15 Aug 2017 Worshop trac - ICLR 207 REINTERPRETING IMPORTANCE-WEIGHTED AUTOENCODERS Chris Cremer, Quaid Morris & David Duvenaud Department of Computer Science University of Toronto {ccremer,duvenaud}@cs.toronto.edu

More information

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Qiang Liu and Dilin Wang NIPS 2016 Discussion by Yunchen Pu March 17, 2017 March 17, 2017 1 / 8 Introduction Let x R d

More information

The Generalized Reparameterization Gradient

The Generalized Reparameterization Gradient The Generalized Reparameterization Gradient Francisco J. R. Ruiz University of Cambridge Columbia University Michalis K. Titsias Athens University of Economics and Business David M. Blei Columbia University

More information

Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu

Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data

More information

On Markov chain Monte Carlo methods for tall data

On Markov chain Monte Carlo methods for tall data On Markov chain Monte Carlo methods for tall data Remi Bardenet, Arnaud Doucet, Chris Holmes Paper review by: David Carlson October 29, 2016 Introduction Many data sets in machine learning and computational

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Linear Classification

Linear Classification Linear Classification Lili MOU moull12@sei.pku.edu.cn http://sei.pku.edu.cn/ moull12 23 April 2015 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative

More information

Probabilistic & Bayesian deep learning. Andreas Damianou

Probabilistic & Bayesian deep learning. Andreas Damianou Probabilistic & Bayesian deep learning Andreas Damianou Amazon Research Cambridge, UK Talk at University of Sheffield, 19 March 2019 In this talk Not in this talk: CRFs, Boltzmann machines,... In this

More information

Automatic Differentiation Variational Inference

Automatic Differentiation Variational Inference Journal of Machine Learning Research X (2016) 1-XX Submitted X/16; Published X/16 Automatic Differentiation Variational Inference Alp Kucukelbir Data Science Institute, Department of Computer Science Columbia

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation

More information

Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data

Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data Yarin Gal Yutian Chen Zoubin Ghahramani yg279@cam.ac.uk Distribution Estimation Distribution estimation of categorical

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

An introduction to Variational calculus in Machine Learning

An introduction to Variational calculus in Machine Learning n introduction to Variational calculus in Machine Learning nders Meng February 2004 1 Introduction The intention of this note is not to give a full understanding of calculus of variations since this area

More information

COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017

COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017 COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University SOFT CLUSTERING VS HARD CLUSTERING

More information

Note 1: Varitional Methods for Latent Dirichlet Allocation

Note 1: Varitional Methods for Latent Dirichlet Allocation Technical Note Series Spring 2013 Note 1: Varitional Methods for Latent Dirichlet Allocation Version 1.0 Wayne Xin Zhao batmanfly@gmail.com Disclaimer: The focus of this note was to reorganie the content

More information

Integrated Non-Factorized Variational Inference

Integrated Non-Factorized Variational Inference Integrated Non-Factorized Variational Inference Shaobo Han, Xuejun Liao and Lawrence Carin Duke University February 27, 2014 S. Han et al. Integrated Non-Factorized Variational Inference February 27, 2014

More information

Variational Scoring of Graphical Model Structures

Variational Scoring of Graphical Model Structures Variational Scoring of Graphical Model Structures Matthew J. Beal Work with Zoubin Ghahramani & Carl Rasmussen, Toronto. 15th September 2003 Overview Bayesian model selection Approximations using Variational

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

Variational Autoencoder

Variational Autoencoder Variational Autoencoder Göker Erdo gan August 8, 2017 The variational autoencoder (VA) [1] is a nonlinear latent variable model with an efficient gradient-based training procedure based on variational

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

An Overview of Edward: A Probabilistic Programming System. Dustin Tran Columbia University

An Overview of Edward: A Probabilistic Programming System. Dustin Tran Columbia University An Overview of Edward: A Probabilistic Programming System Dustin Tran Columbia University Alp Kucukelbir Eugene Brevdo Andrew Gelman Adji Dieng Maja Rudolph David Blei Dawen Liang Matt Hoffman Kevin Murphy

More information

The connection of dropout and Bayesian statistics

The connection of dropout and Bayesian statistics The connection of dropout and Bayesian statistics Interpretation of dropout as approximate Bayesian modelling of NN http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf Dropout Geoffrey Hinton Google, University

More information

Stochastic Variational Inference

Stochastic Variational Inference Stochastic Variational Inference David M. Blei Princeton University (DRAFT: DO NOT CITE) December 8, 2011 We derive a stochastic optimization algorithm for mean field variational inference, which we call

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

REINTERPRETING IMPORTANCE-WEIGHTED AUTOENCODERS

REINTERPRETING IMPORTANCE-WEIGHTED AUTOENCODERS Worshop trac - ICLR 207 REINTERPRETING IMPORTANCE-WEIGHTED AUTOENCODERS Chris Cremer, Quaid Morris & David Duvenaud Department of Computer Science University of Toronto {ccremer,duvenaud}@cs.toronto.edu

More information

Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling

Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling 1 / 27 Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 27 Monte Carlo Integration The big question : Evaluate E p(z) [f(z)]

More information

Statistical Machine Learning Lectures 4: Variational Bayes

Statistical Machine Learning Lectures 4: Variational Bayes 1 / 29 Statistical Machine Learning Lectures 4: Variational Bayes Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 29 Synonyms Variational Bayes Variational Inference Variational Bayesian Inference

More information

an introduction to bayesian inference

an introduction to bayesian inference with an application to network analysis http://jakehofman.com january 13, 2010 motivation would like models that: provide predictive and explanatory power are complex enough to describe observed phenomena

More information

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS

More information

Bayesian Deep Learning

Bayesian Deep Learning Bayesian Deep Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp June 06, 2018 Mohammad Emtiyaz Khan 2018 1 What will you learn? Why is Bayesian inference

More information

Week 3: The EM algorithm

Week 3: The EM algorithm Week 3: The EM algorithm Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Term 1, Autumn 2005 Mixtures of Gaussians Data: Y = {y 1... y N } Latent

More information

Latent state estimation using control theory

Latent state estimation using control theory Latent state estimation using control theory Bert Kappen SNN Donders Institute, Radboud University, Nijmegen Gatsby Unit, UCL London August 3, 7 with Hans Christian Ruiz Bert Kappen Smoothing problem Given

More information

Interpretable Latent Variable Models

Interpretable Latent Variable Models Interpretable Latent Variable Models Fernando Perez-Cruz Bell Labs (Nokia) Department of Signal Theory and Communications, University Carlos III in Madrid 1 / 24 Outline 1 Introduction to Machine Learning

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

More information

Sparse Stochastic Inference for Latent Dirichlet Allocation

Sparse Stochastic Inference for Latent Dirichlet Allocation Sparse Stochastic Inference for Latent Dirichlet Allocation David Mimno 1, Matthew D. Hoffman 2, David M. Blei 1 1 Dept. of Computer Science, Princeton U. 2 Dept. of Statistics, Columbia U. Presentation

More information

Generative v. Discriminative classifiers Intuition

Generative v. Discriminative classifiers Intuition Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target

More information

Recurrent Latent Variable Networks for Session-Based Recommendation

Recurrent Latent Variable Networks for Session-Based Recommendation Recurrent Latent Variable Networks for Session-Based Recommendation Panayiotis Christodoulou Cyprus University of Technology paa.christodoulou@edu.cut.ac.cy 27/8/2017 Panayiotis Christodoulou (C.U.T.)

More information

Overdispersed Black-Box Variational Inference

Overdispersed Black-Box Variational Inference Overdispersed Black-Box Variational Inference Francisco J. R. Ruiz Data Science Institute Dept. of Computer Science Columbia University Michalis K. Titsias Dept. of Informatics Athens University of Economics

More information

CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection

CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection (non-examinable material) Matthew J. Beal February 27, 2004 www.variational-bayes.org Bayesian Model Selection

More information

Quasi-Monte Carlo Flows

Quasi-Monte Carlo Flows Quasi-Monte Carlo Flows Florian Wenzel TU Kaiserslautern Germany wenzelfl@hu-berlin.de Alexander Buchholz ENSAE-CREST, Paris France alexander.buchholz@ensae.fr Stephan Mandt Univ. of California, Irvine

More information

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Siwei Guo: s9guo@eng.ucsd.edu Anwesan Pal:

More information

17 : Optimization and Monte Carlo Methods

17 : Optimization and Monte Carlo Methods 10-708: Probabilistic Graphical Models Spring 2017 17 : Optimization and Monte Carlo Methods Lecturer: Avinava Dubey Scribes: Neil Spencer, YJ Choe 1 Recap 1.1 Monte Carlo Monte Carlo methods such as rejection

More information

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters Exercises Tutorial at ICASSP 216 Learning Nonlinear Dynamical Models Using Particle Filters Andreas Svensson, Johan Dahlin and Thomas B. Schön March 18, 216 Good luck! 1 [Bootstrap particle filter for

More information

Black-box α-divergence Minimization

Black-box α-divergence Minimization Black-box α-divergence Minimization José Miguel Hernández-Lobato, Yingzhen Li, Daniel Hernández-Lobato, Thang Bui, Richard Turner, Harvard University, University of Cambridge, Universidad Autónoma de Madrid.

More information

Auxiliary Particle Methods

Auxiliary Particle Methods Auxiliary Particle Methods Perspectives & Applications Adam M. Johansen 1 adam.johansen@bristol.ac.uk Oxford University Man Institute 29th May 2008 1 Collaborators include: Arnaud Doucet, Nick Whiteley

More information

Approximate Inference Part 1 of 2

Approximate Inference Part 1 of 2 Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory

More information

Lecture 6: Graphical Models: Learning

Lecture 6: Graphical Models: Learning Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)

More information

Pattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM

Pattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures

More information

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two

More information

Faster Stochastic Variational Inference using Proximal-Gradient Methods with General Divergence Functions

Faster Stochastic Variational Inference using Proximal-Gradient Methods with General Divergence Functions Faster Stochastic Variational Inference using Proximal-Gradient Methods with General Divergence Functions Mohammad Emtiyaz Khan, Reza Babanezhad, Wu Lin, Mark Schmidt, Masashi Sugiyama Conference on Uncertainty

More information

Approximate Inference Part 1 of 2

Approximate Inference Part 1 of 2 Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

Automatic Variational Inference in Stan

Automatic Variational Inference in Stan Automatic Variational Inference in Stan Alp Kucukelbir Columbia University alp@cs.columbia.edu Andrew Gelman Columbia University gelman@stat.columbia.edu Rajesh Ranganath Princeton University rajeshr@cs.princeton.edu

More information

Controlled sequential Monte Carlo

Controlled sequential Monte Carlo Controlled sequential Monte Carlo Jeremy Heng, Department of Statistics, Harvard University Joint work with Adrian Bishop (UTS, CSIRO), George Deligiannidis & Arnaud Doucet (Oxford) Bayesian Computation

More information

Introduction to Bayesian inference

Introduction to Bayesian inference Introduction to Bayesian inference Thomas Alexander Brouwer University of Cambridge tab43@cam.ac.uk 17 November 2015 Probabilistic models Describe how data was generated using probability distributions

More information

ECE521 lecture 4: 19 January Optimization, MLE, regularization

ECE521 lecture 4: 19 January Optimization, MLE, regularization ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Expectation Maximization Mark Schmidt University of British Columbia Winter 2018 Last Time: Learning with MAR Values We discussed learning with missing at random values in data:

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

G8325: Variational Bayes

G8325: Variational Bayes G8325: Variational Bayes Vincent Dorie Columbia University Wednesday, November 2nd, 2011 bridge Variational University Bayes Press 2003. On-screen viewing permitted. Printing not permitted. http://www.c

More information

Probabilistic Graphical Models & Applications

Probabilistic Graphical Models & Applications Probabilistic Graphical Models & Applications Learning of Graphical Models Bjoern Andres and Bernt Schiele Max Planck Institute for Informatics The slides of today s lecture are authored by and shown with

More information

Variational Inference for Monte Carlo Objectives

Variational Inference for Monte Carlo Objectives Variational Inference for Monte Carlo Objectives Andriy Mnih, Danilo Rezende June 21, 2016 1 / 18 Introduction Variational training of directed generative models has been widely adopted. Results depend

More information