Approximate Bayesian inference
|
|
- Marylou Holland
- 5 years ago
- Views:
Transcription
1 Approximate Bayesian inference Variational and Monte Carlo methods Christian A. Naesseth
2 1 Exchange rate data Month
3 Image data 2
4 1 Bayesian inference 2 Variational inference 3 Stochastic gradient variational inference 4 Rejection sampling variational inference 5 Variational sequential Monte Carlo
5 4 Bayesian inference y - data x - latent (unknown) variables p(x, y) - probabilistic (generative) model Posterior distribution: p(x y) = p(x, y) p(y) Expectations: E p(x y) [ϕ(x)] = ϕ(x)p(x y)dx
6 5 Examples of probabilistic generative models Exchangeable data models T p(x, y 1:T ) = µ(x) g(y t x) t=1 Applications: Regression, classification,... State space models T T p(x 1:T, y 1:T ) µ(x 1 ) f(x t x t 1 ) g(y t x t ) t=2 t=1 Applications: Filtering, smoothing, forecasting,...
7 6 Approximate Bayesian inference Because p(x y) is typically intractable we approximate: Variational methods - Turn integration into optimization E p(x y) [ϕ(x)] E q(x ; λ ) [ϕ(x)] Monte Carlo methods - Random sampling E p(x y) [ϕ(x)] 1 N N ϕ(x i ), x i p(x y) i=1
8 1 Bayesian inference 2 Variational inference 3 Stochastic gradient variational inference 4 Rejection sampling variational inference 5 Variational sequential Monte Carlo
9 8 Variational inference q(x ; λ) λ p(x y) KL(q(x ; λ) p(x y)) λ init
10 9 Evidence lower bound KL(q(x ; λ) p(x y)) = log p(y) E q [log p(x, y) log q(x ; λ)] log p(y) E q [log p(x, y) log q(x ; λ)] }{{} L(λ) Maximizing the ELBO L(λ) minimizes KL divergence between q(x ; λ) and p(x y) Coordinate ascent when conditionally conjugate Stochastic optimization using noisy λ L(λ) estimated with Monte Carlo methods
11 10 Bayesian logistic regression Data are pairs (y t, z t ) z t is a covariate y t is binary label θ is regression coefficient The generative model is p(θ, y 1:T z 1:T ) = N (θ ; 0, 1) T Bernoulli (y t ; σ(θz t )), t=1 where σ( ) is the logistic function mapping reals to (0, 1).
12 11 VI for Bayesian logistic regression Let s consider a single data pair (y, z) Variational approximation q(θ ; λ) = N (θ ; µ, σ 2 ) The ELBO is given by L(µ, σ 2 [ ) = E q log p(θ) + log p(y θ, z) log q(θ ; µ, σ 2 ) ]
13 12 VI for Bayesian logistic regression L(µ, σ 2 [ ) = E q log p(θ) + log p(y θ, z) log q(θ ; µ, σ 2 ) ] = 1 2 (µ2 + σ 2 ) log σ2 + E q [log p(y θ, z)] + const. = 1 2 (µ2 + σ 2 ) log σ2 + E q [yzθ log (1 + exp(zθ))] = 1 2 (µ2 + σ 2 ) log σ2 + yzµ E q [log (1 + exp(zθ))] We are stuck because of the expectation. What can we do?
14 1 Bayesian inference 2 Variational inference 3 Stochastic gradient variational inference 4 Rejection sampling variational inference 5 Variational sequential Monte Carlo
15 14 Stochastic Optimization Gradient-based optimization: L(λ) = 0 Stochastic approximation: [Robbins and Monro, 1951] [ ] with E L(λ k ) = L(λ k ). λ k+1 = λ k + α k L(λ k ),
16 15 VI for Bayesian logistic regression L(µ, σ 2 ) = 1 2 (µ2 + σ 2 ) log σ2 + yzµ E q [log (1 + exp(zθ))] E q [log (1 + exp(zθ))] = q(θ ; µ, σ 2 ) log (1 + exp(zθ)) dθ [ = E q log q(θ ; µ, σ 2 ) log (1 + exp(zθ)) ] 1 N log q(θ i ; µ, σ 2 ) log ( 1 + exp(zθ i ) ) N L(µ, σ 2 ), i=1 with θ i q(θ ; µ, σ 2 ). Unbiased MC estimator of the gradient!
17 16 Stochastic gradient variational inference Monte Carlo estimates of gradients to optimize the ELBO. Score function: [Paisley et al., 2012, Ranganath et al., 2014] L(λ) = E q [log p(x, y) log q(x ; λ)] = E q [ log q(x ; λ) (log p(x, y) log q(x ; λ))] Reparameterization: [Kingma & Welling 2014, Rezende et al, 2014] L(λ) = E q [log p(x, y) log q(x ; λ)] = E s(ε) [log p(h(ε, λ), y) log q(h(ε, λ) ; λ)] = E s(ε) [ x (log p(x, y) log q(x ; λ)) λ h(ε, λ) ] e.g. x = h(ε, µ, σ) = µ + σε, s(ε) = N (0, 1).
18 17 Reparameterization vs. score function ELBO Time [s] ADVI BBVI G REP RSVI
19 18 Why doesn t reparameterization always work? All random variables we simulate on our computers are ultimately transformations of uniforms. Why can t we do reparameterization gradients for everything? Often transformations h( ) used in practice are not differentiable.
20 1 Bayesian inference 2 Variational inference 3 Stochastic gradient variational inference 4 Rejection sampling variational inference 5 Variational sequential Monte Carlo
21 20 Reparameterizing gamma Gamma(λ, 1): Common distribution in VI Used to sample beta, Dirichlet, Student s t, chi-squared,... Simulation: Propose h(ε, λ) = Accept reject ( λ 1 ) ( ε 9λ 3 ) 3, ε N (0, 1) u a(ε, λ)? u U[0, 1] Tricky to reparameterize because of the accept reject step! A simple method for generating gamma variables, Marsaglia & Tsang, 2000
22 21 Our strategy: partial reparameterization Idea: Use ε π(ε ; λ) (only weakly dependent on λ) s.t. x = h(ε, λ) q(x ; λ) Then: L(λ) = E π(ε ; λ) [log p(h(ε, λ), y) log q(h(ε, λ) ; λ)] = E π(ε ; λ) [ x (log p(x, y) log q(x ; λ)) λ h(ε, λ) ] }{{} reparameterization + E π(ε ; λ) [(log p(h(ε, λ), y) log q(h(ε, λ) ; λ)) λ log π(ε ; λ)] }{{} correction Intuition: Interpolating between score function gradients and reparameterization gradients.
23 22 Rejection sampling and reparameterization Idea: Use h(ε, λ) from the proposal Let π(ε ; λ) be the distribution of the accepted ε Rejection sampling: Propose x = h(ε, λ), ε s(ε) Accept if u a(ε, λ) = Then: 1 q(h(ε,λ) ; λ) dh (ε,λ) dε π(ε ; λ) = π(ε, u ; λ)du = M λ s(ε) 0 = q(h(ε, λ) ; λ) dh (ε, λ) dε M λ s(ε), u U[0, 1] 1 0 1[u a(ε, λ)]du
24 23 Sparse gamma deep exponential family zn;l;k z KL w k,k Gamma(0.1, 0.1) ( ) x l 0.1 n,k Gamma 0.1, k wl k,k x l+1 n,k ( ) w`+1,k w`,k zn;`c1;k zn;`;k K`C1 K` z y n,d Poisson k w 0 k,dx 1 n,k w1,k zn;1;k K1 w0,i xn;i V x Deep exponential families, Ranganath et al., 2015 N
25 24 Sparse gamma DEF Olivetti faces ELBO Time [s] ADVI BBVI G REP RSVI [Kucukelbir et al., 2015, Ranganath et al., 2014, Ruiz et al., 2016]
26 25 Variational Monte Carlo methods Monte Carlo methods can enable variational inference for more complex (non-conjugate) models p(x, y) But what if we can use Monte Carlo methods to also define more flexible q(x ; λ)? I will illustrate how we can do this using sequential Monte Carlo for the state space model: T T p(x 1:T, y 1:T ) µ(x 1 ) f(x t x t 1 ) g(y t x t ) t=2 t=1
27 1 Bayesian inference 2 Variational inference 3 Stochastic gradient variational inference 4 Rejection sampling variational inference 5 Variational sequential Monte Carlo
28 27 Sequential Monte Carlo SMC Approximation: p(x 1:t 1 y 1:t 1 ) = Update: N i=1 a i t 1 Discrete ( w j t 1/ l wl t 1 x i t r(x t x ai t 1 t 1 ; λ) w i t = f(xi t xa i t 1 t 1 )g(y t xi t ) r(x i t xai t 1 t 1 ; λ) wt 1 i δ x i l wl 1:t 1 t 1 ) particle particle time SIS time
29 28 Variational Sequential Monte Carlo Key idea: q(x 1:T ; λ) defined by running SMC and sample p(x 1:T y 1:T ) Density p(x y) r(x;λ ) q(x;λ ) x
30 29 Variational Sequential Monte Carlo Generative Process VSMC 1. Run SMC to get {( x i 1:T, )} N wi T i=1 2. b T Discrete ( wt/ i ) l wl T 3. return x 1:T x b T 1:T particle N x 1:T p(x 1:T y 1:T ) = wi T δ x i i=1 l wl 1:T T time
31 30 Distribution of VSMC The distribution q is the marginal of x b T 1:T over all random variables, x 1:N 1:T, a1:n 1:T 1, b T, generated by VSMC ] q(x 1:T y 1:T ; λ) = p(x 1:T, y 1:T ) E [ p(y 1:T ) 1 x1:t, where p(y 1:T ) T t=1 1 N N i=1 wi t.
32 31 Surrogate ELBO We can not directly use the standard ELBO, L(λ), because evaluating q pointwise is intractable. Consider instead: [ T ( L(λ) E log t=1 1 N N i=1 w i t )] = E [log p(y 1:T )] Theorem 1 (Surrogate ELBO). log p(y 1:T ) L(λ) L(λ). Proof. (Idea) Use the definition of L(λ) for q(x 1:T y 1:T ; λ) and then Jensen s inequality.
33 32 Stochastic Optimization We optimize the surrogate ELBO L(λ) using stochastic gradient ascent, L(λ) [ = E [log p(y 1:T )] = E log p(y 1:T ) + log p(y 1:T ) log φ ]. Variance reduction using the reparameterization trick, Rao-Blackwellization, and control variates.
34 33 Theoretical Results For some c(λ) <, With N = bt, ( ) KL q(x 1:T ; λ) p(x 1:T y 1:T ) ( ) KL q(x 1:T ; λ) p(x 1:T y 1:T ) T σ 2 (λ), 2b for σ 2 (λ) <. c(λ) N. [ E log p(y 1:T ) ] p(y 1:T )
35 34 Experiments - Stochastic volatility Data: 10 years of monthly returns for the exchange rate of 22 currencies wrt to USD N = 4 N = 8 N = 16 x t = µ + φ(x t 1 µ) + v t, v t N (0, Q) ( xt ) y t = β exp e t, e t N (0, I) 2 Method ELBO Structured VI IWAE VSMC IWAE VSMC IWAE VSMC ELBO VSMC IWAE N r(x t x t 1 ; λ, θ) f(x t x t 1 ; θ) N (x t ; µ t, Σ t )
36 35 Experiments - Stochastic recurrent neural networks Data: 105 motor cortex neurons simultaneously recorded in a macaque monkey 1080 x t = µ θ (x t 1 ) + exp ( σ θ (x t 1 )/2 ) v t, y t Poisson (exp (η θ (x t ))), ELBO Iterations (10 3 ) IWAE d x = 3 IWAE d x = 5 IWAE d x = 10 VSMC d x = 3 VSMC d x = 5 VSMC d x = 10 Proposal: r(x t x t 1, y t ) f(x t x t 1 ; θ) ( ) N x t ; µ λ (y t ), e 2σ λ(y t ), µ λ ( ), σ λ ( ) are inference networks.
37 36 Collaborators Scott Linderman Francisco Ruiz Rajesh Ranganath David Blei Work done while visiting Columbia University Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms, Naesseth et al., 2017 Variational Sequential Monte Carlo, Naesseth et al., 2017
38 37 Take-home messages Stochastic optimization opens up for interesting new approximate Bayesian inference methods Monte Carlo methods more flexible variational approximations with guarantees Variational methods adaptation of Monte Carlo methods based on a global objective Contact: christian.a.naesseth@liu.se Website: users.isy.liu.se/en/rt/chran60/index.html
39 38 Subsampling and Variational EM Data subsampling y 1:T = {y j 1:T }n j=1 L(λ) = n j=1 ] E [log p(y j1:t ) At each iteration sample a mini-batch of data. Variational EM log p θ (y 1:T ) L(λ, θ) = E [log p θ (y 1:T )] Maximize L(λ, θ) jointly with respect to λ, θ
Stochastic Backpropagation, Variational Inference, and Semi-Supervised Learning
Stochastic Backpropagation, Variational Inference, and Semi-Supervised Learning Diederik (Durk) Kingma Danilo J. Rezende (*) Max Welling Shakir Mohamed (**) Stochastic Gradient Variational Inference Bayesian
More informationVariance Reduction in Black-box Variational Inference by Adaptive Importance Sampling
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence IJCAI-18 Variance Reduction in Black-box Variational Inference by Adaptive Importance Sampling Ximing Li, Changchun
More informationReparameterization Gradients through Acceptance-Rejection Sampling Algorithms
Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms Christian A. Naesseth Francisco J. R. Ruiz Scott W. Linderman David M. Blei Linköping University Columbia University University
More informationProbabilistic Graphical Models for Image Analysis - Lecture 4
Probabilistic Graphical Models for Image Analysis - Lecture 4 Stefan Bauer 12 October 2018 Max Planck ETH Center for Learning Systems Overview 1. Repetition 2. α-divergence 3. Variational Inference 4.
More informationReparameterization Gradients through Acceptance-Rejection Sampling Algorithms
Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms Christian A. Naesseth Francisco J. R. Ruiz Scott W. Linderman David M. Blei Linköping University Columbia University University
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationDeep Variational Inference. FLARE Reading Group Presentation Wesley Tansey 9/28/2016
Deep Variational Inference FLARE Reading Group Presentation Wesley Tansey 9/28/2016 What is Variational Inference? What is Variational Inference? Want to estimate some distribution, p*(x) p*(x) What is
More informationSequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007
Sequential Monte Carlo and Particle Filtering Frank Wood Gatsby, November 2007 Importance Sampling Recall: Let s say that we want to compute some expectation (integral) E p [f] = p(x)f(x)dx and we remember
More informationVariational Sequential Monte Carlo
Variational Sequential Monte Carlo Christian A. Naesseth Scott W. Linderman Rajesh Ranganath David M. Blei Linköping University Columbia University New York University Columbia University Abstract Many
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationVariational Inference via Stochastic Backpropagation
Variational Inference via Stochastic Backpropagation Kai Fan February 27, 2016 Preliminaries Stochastic Backpropagation Variational Auto-Encoding Related Work Summary Outline Preliminaries Stochastic Backpropagation
More informationNatural Gradients via the Variational Predictive Distribution
Natural Gradients via the Variational Predictive Distribution Da Tang Columbia University datang@cs.columbia.edu Rajesh Ranganath New York University rajeshr@cims.nyu.edu Abstract Variational inference
More informationarxiv: v1 [stat.ml] 18 Oct 2016
Christian A. Naesseth Francisco J. R. Ruiz Scott W. Linderman David M. Blei Linköping University Columbia University University of Cambridge arxiv:1610.05683v1 [stat.ml] 18 Oct 2016 Abstract Variational
More informationLocal Expectation Gradients for Doubly Stochastic. Variational Inference
Local Expectation Gradients for Doubly Stochastic Variational Inference arxiv:1503.01494v1 [stat.ml] 4 Mar 2015 Michalis K. Titsias Athens University of Economics and Business, 76, Patission Str. GR10434,
More informationVariational Inference in TensorFlow. Danijar Hafner Stanford CS University College London, Google Brain
Variational Inference in TensorFlow Danijar Hafner Stanford CS 20 2018-02-16 University College London, Google Brain Outline Variational Inference Tensorflow Distributions VAE in TensorFlow Variational
More informationarxiv: v1 [stat.ml] 2 Mar 2016
Automatic Differentiation Variational Inference Alp Kucukelbir Data Science Institute, Department of Computer Science Columbia University arxiv:1603.00788v1 [stat.ml] 2 Mar 2016 Dustin Tran Department
More informationParticle Filtering a brief introductory tutorial. Frank Wood Gatsby, August 2007
Particle Filtering a brief introductory tutorial Frank Wood Gatsby, August 2007 Problem: Target Tracking A ballistic projectile has been launched in our direction and may or may not land near enough to
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationDeep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationAuto-Encoding Variational Bayes
Auto-Encoding Variational Bayes Diederik P Kingma, Max Welling June 18, 2018 Diederik P Kingma, Max Welling Auto-Encoding Variational Bayes June 18, 2018 1 / 39 Outline 1 Introduction 2 Variational Lower
More informationAdvances in Variational Inference
1 Advances in Variational Inference Cheng Zhang Judith Bütepage Hedvig Kjellström Stephan Mandt arxiv:1711.05597v1 [cs.lg] 15 Nov 2017 Abstract Many modern unsupervised or semi-supervised machine learning
More informationNotes on pseudo-marginal methods, variational Bayes and ABC
Notes on pseudo-marginal methods, variational Bayes and ABC Christian Andersson Naesseth October 3, 2016 The Pseudo-Marginal Framework Assume we are interested in sampling from the posterior distribution
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should
More informationCombine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning
Combine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning Michalis K. Titsias Department of Informatics Athens University of Economics and Business
More informationExpectation Propagation Algorithm
Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,
More informationCh 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationPart 1: Expectation Propagation
Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationEvaluating the Variance of
Evaluating the Variance of Likelihood-Ratio Gradient Estimators Seiya Tokui 2 Issei Sato 2 3 Preferred Networks 2 The University of Tokyo 3 RIKEN ICML 27 @ Sydney Task: Gradient estimation for stochastic
More informationVariational Autoencoders
Variational Autoencoders Recap: Story so far A classification MLP actually comprises two components A feature extraction network that converts the inputs into linearly separable features Or nearly linearly
More informationBayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems
Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems Scott W. Linderman Matthew J. Johnson Andrew C. Miller Columbia University Harvard and Google Brain Harvard University Ryan
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationAutomatic Differentiation Variational Inference
Journal of Machine Learning Research X (2016) 1-XX Submitted 01/16; Published X/16 Automatic Differentiation Variational Inference Alp Kucukelbir Data Science Institute and Department of Computer Science
More informationarxiv: v2 [stat.ml] 15 Aug 2017
Worshop trac - ICLR 207 REINTERPRETING IMPORTANCE-WEIGHTED AUTOENCODERS Chris Cremer, Quaid Morris & David Duvenaud Department of Computer Science University of Toronto {ccremer,duvenaud}@cs.toronto.edu
More informationStein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Qiang Liu and Dilin Wang NIPS 2016 Discussion by Yunchen Pu March 17, 2017 March 17, 2017 1 / 8 Introduction Let x R d
More informationThe Generalized Reparameterization Gradient
The Generalized Reparameterization Gradient Francisco J. R. Ruiz University of Cambridge Columbia University Michalis K. Titsias Athens University of Economics and Business David M. Blei Columbia University
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More informationOn Markov chain Monte Carlo methods for tall data
On Markov chain Monte Carlo methods for tall data Remi Bardenet, Arnaud Doucet, Chris Holmes Paper review by: David Carlson October 29, 2016 Introduction Many data sets in machine learning and computational
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationLinear Classification
Linear Classification Lili MOU moull12@sei.pku.edu.cn http://sei.pku.edu.cn/ moull12 23 April 2015 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative
More informationProbabilistic & Bayesian deep learning. Andreas Damianou
Probabilistic & Bayesian deep learning Andreas Damianou Amazon Research Cambridge, UK Talk at University of Sheffield, 19 March 2019 In this talk Not in this talk: CRFs, Boltzmann machines,... In this
More informationAutomatic Differentiation Variational Inference
Journal of Machine Learning Research X (2016) 1-XX Submitted X/16; Published X/16 Automatic Differentiation Variational Inference Alp Kucukelbir Data Science Institute, Department of Computer Science Columbia
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation
More informationLatent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data
Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data Yarin Gal Yutian Chen Zoubin Ghahramani yg279@cam.ac.uk Distribution Estimation Distribution estimation of categorical
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationAn introduction to Variational calculus in Machine Learning
n introduction to Variational calculus in Machine Learning nders Meng February 2004 1 Introduction The intention of this note is not to give a full understanding of calculus of variations since this area
More informationCOMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017
COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University SOFT CLUSTERING VS HARD CLUSTERING
More informationNote 1: Varitional Methods for Latent Dirichlet Allocation
Technical Note Series Spring 2013 Note 1: Varitional Methods for Latent Dirichlet Allocation Version 1.0 Wayne Xin Zhao batmanfly@gmail.com Disclaimer: The focus of this note was to reorganie the content
More informationIntegrated Non-Factorized Variational Inference
Integrated Non-Factorized Variational Inference Shaobo Han, Xuejun Liao and Lawrence Carin Duke University February 27, 2014 S. Han et al. Integrated Non-Factorized Variational Inference February 27, 2014
More informationVariational Scoring of Graphical Model Structures
Variational Scoring of Graphical Model Structures Matthew J. Beal Work with Zoubin Ghahramani & Carl Rasmussen, Toronto. 15th September 2003 Overview Bayesian model selection Approximations using Variational
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationVariational Autoencoder
Variational Autoencoder Göker Erdo gan August 8, 2017 The variational autoencoder (VA) [1] is a nonlinear latent variable model with an efficient gradient-based training procedure based on variational
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationAn Overview of Edward: A Probabilistic Programming System. Dustin Tran Columbia University
An Overview of Edward: A Probabilistic Programming System Dustin Tran Columbia University Alp Kucukelbir Eugene Brevdo Andrew Gelman Adji Dieng Maja Rudolph David Blei Dawen Liang Matt Hoffman Kevin Murphy
More informationThe connection of dropout and Bayesian statistics
The connection of dropout and Bayesian statistics Interpretation of dropout as approximate Bayesian modelling of NN http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf Dropout Geoffrey Hinton Google, University
More informationStochastic Variational Inference
Stochastic Variational Inference David M. Blei Princeton University (DRAFT: DO NOT CITE) December 8, 2011 We derive a stochastic optimization algorithm for mean field variational inference, which we call
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationREINTERPRETING IMPORTANCE-WEIGHTED AUTOENCODERS
Worshop trac - ICLR 207 REINTERPRETING IMPORTANCE-WEIGHTED AUTOENCODERS Chris Cremer, Quaid Morris & David Duvenaud Department of Computer Science University of Toronto {ccremer,duvenaud}@cs.toronto.edu
More informationStatistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling
1 / 27 Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 27 Monte Carlo Integration The big question : Evaluate E p(z) [f(z)]
More informationStatistical Machine Learning Lectures 4: Variational Bayes
1 / 29 Statistical Machine Learning Lectures 4: Variational Bayes Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 29 Synonyms Variational Bayes Variational Inference Variational Bayesian Inference
More informationan introduction to bayesian inference
with an application to network analysis http://jakehofman.com january 13, 2010 motivation would like models that: provide predictive and explanatory power are complex enough to describe observed phenomena
More informationCOMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017
COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS
More informationBayesian Deep Learning
Bayesian Deep Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp June 06, 2018 Mohammad Emtiyaz Khan 2018 1 What will you learn? Why is Bayesian inference
More informationWeek 3: The EM algorithm
Week 3: The EM algorithm Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Term 1, Autumn 2005 Mixtures of Gaussians Data: Y = {y 1... y N } Latent
More informationLatent state estimation using control theory
Latent state estimation using control theory Bert Kappen SNN Donders Institute, Radboud University, Nijmegen Gatsby Unit, UCL London August 3, 7 with Hans Christian Ruiz Bert Kappen Smoothing problem Given
More informationInterpretable Latent Variable Models
Interpretable Latent Variable Models Fernando Perez-Cruz Bell Labs (Nokia) Department of Signal Theory and Communications, University Carlos III in Madrid 1 / 24 Outline 1 Introduction to Machine Learning
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables
More informationSparse Stochastic Inference for Latent Dirichlet Allocation
Sparse Stochastic Inference for Latent Dirichlet Allocation David Mimno 1, Matthew D. Hoffman 2, David M. Blei 1 1 Dept. of Computer Science, Princeton U. 2 Dept. of Statistics, Columbia U. Presentation
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target
More informationRecurrent Latent Variable Networks for Session-Based Recommendation
Recurrent Latent Variable Networks for Session-Based Recommendation Panayiotis Christodoulou Cyprus University of Technology paa.christodoulou@edu.cut.ac.cy 27/8/2017 Panayiotis Christodoulou (C.U.T.)
More informationOverdispersed Black-Box Variational Inference
Overdispersed Black-Box Variational Inference Francisco J. R. Ruiz Data Science Institute Dept. of Computer Science Columbia University Michalis K. Titsias Dept. of Informatics Athens University of Economics
More informationCSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection
CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection (non-examinable material) Matthew J. Beal February 27, 2004 www.variational-bayes.org Bayesian Model Selection
More informationQuasi-Monte Carlo Flows
Quasi-Monte Carlo Flows Florian Wenzel TU Kaiserslautern Germany wenzelfl@hu-berlin.de Alexander Buchholz ENSAE-CREST, Paris France alexander.buchholz@ensae.fr Stephan Mandt Univ. of California, Irvine
More informationECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering
ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Siwei Guo: s9guo@eng.ucsd.edu Anwesan Pal:
More information17 : Optimization and Monte Carlo Methods
10-708: Probabilistic Graphical Models Spring 2017 17 : Optimization and Monte Carlo Methods Lecturer: Avinava Dubey Scribes: Neil Spencer, YJ Choe 1 Recap 1.1 Monte Carlo Monte Carlo methods such as rejection
More informationExercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters
Exercises Tutorial at ICASSP 216 Learning Nonlinear Dynamical Models Using Particle Filters Andreas Svensson, Johan Dahlin and Thomas B. Schön March 18, 216 Good luck! 1 [Bootstrap particle filter for
More informationBlack-box α-divergence Minimization
Black-box α-divergence Minimization José Miguel Hernández-Lobato, Yingzhen Li, Daniel Hernández-Lobato, Thang Bui, Richard Turner, Harvard University, University of Cambridge, Universidad Autónoma de Madrid.
More informationAuxiliary Particle Methods
Auxiliary Particle Methods Perspectives & Applications Adam M. Johansen 1 adam.johansen@bristol.ac.uk Oxford University Man Institute 29th May 2008 1 Collaborators include: Arnaud Doucet, Nick Whiteley
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory
More informationLecture 6: Graphical Models: Learning
Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)
More informationPattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM
Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures
More informationLinear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52
Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two
More informationFaster Stochastic Variational Inference using Proximal-Gradient Methods with General Divergence Functions
Faster Stochastic Variational Inference using Proximal-Gradient Methods with General Divergence Functions Mohammad Emtiyaz Khan, Reza Babanezhad, Wu Lin, Mark Schmidt, Masashi Sugiyama Conference on Uncertainty
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory
More informationFundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner
Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization
More informationAutomatic Variational Inference in Stan
Automatic Variational Inference in Stan Alp Kucukelbir Columbia University alp@cs.columbia.edu Andrew Gelman Columbia University gelman@stat.columbia.edu Rajesh Ranganath Princeton University rajeshr@cs.princeton.edu
More informationControlled sequential Monte Carlo
Controlled sequential Monte Carlo Jeremy Heng, Department of Statistics, Harvard University Joint work with Adrian Bishop (UTS, CSIRO), George Deligiannidis & Arnaud Doucet (Oxford) Bayesian Computation
More informationIntroduction to Bayesian inference
Introduction to Bayesian inference Thomas Alexander Brouwer University of Cambridge tab43@cam.ac.uk 17 November 2015 Probabilistic models Describe how data was generated using probability distributions
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Expectation Maximization Mark Schmidt University of British Columbia Winter 2018 Last Time: Learning with MAR Values We discussed learning with missing at random values in data:
More informationLecture 7 and 8: Markov Chain Monte Carlo
Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani
More informationG8325: Variational Bayes
G8325: Variational Bayes Vincent Dorie Columbia University Wednesday, November 2nd, 2011 bridge Variational University Bayes Press 2003. On-screen viewing permitted. Printing not permitted. http://www.c
More informationProbabilistic Graphical Models & Applications
Probabilistic Graphical Models & Applications Learning of Graphical Models Bjoern Andres and Bernt Schiele Max Planck Institute for Informatics The slides of today s lecture are authored by and shown with
More informationVariational Inference for Monte Carlo Objectives
Variational Inference for Monte Carlo Objectives Andriy Mnih, Danilo Rezende June 21, 2016 1 / 18 Introduction Variational training of directed generative models has been widely adopted. Results depend
More information