Integrated Non-Factorized Variational Inference
Shaobo Han, Xuejun Liao and Lawrence Carin
Duke University
February 27, 2014
Overview
[Figure: posterior inference for a graphical model; contour plots of p(θ1, θ2 | y) comparing MCMC, VB, and INF-VB (our method).]
For full posterior inference, our method is:
- A fast deterministic alternative to MCMC
- More accurate than mean-field variational Bayes (VB)
Outline
- Introduction
- Integrated Nested Laplace Approximation (INLA)
- Integrated Non-Factorized Variational Bayes (INF-VB)
- Applications
- Summary
Problem of Interest
Consider a general Bayesian hierarchical model:
- Observation model: y ~ p(y | x, θ)
- Latent variables: x ~ p(x | θ)
- Hyperparameters: θ ~ p(θ)
Posterior inference targets the joint p(x, θ | y), and from it the marginals p(x | y) and p(θ | y) as well as the conditional p(x | θ, y). The exact joint posterior
p(x, θ | y) = p(y, x, θ) / p(y) = p(y | x, θ) p(x | θ) p(θ) / ∫∫ p(y | x, θ) p(x | θ) p(θ) dx dθ
can be difficult to evaluate.
Approximate Posterior Inference
- Sampling-based methods: Markov chain Monte Carlo (MCMC)
- Deterministic alternatives:
  - Laplace approximation (LA)
  - Variational inference
  - Expectation propagation (EP)
  - Integrated nested Laplace approximation (INLA) [Rue et al., 2009]
INLA in a Nutshell (1/3)
Main idea: discretize the low-dimensional hyperparameter space of θ using a grid G.
1. Laplace approximation [Kass & Steffey, 1989]:
q_G(x | y, θ_k) = N(x; x*(θ_k), H(x*(θ_k))^{-1}), θ_k ∈ G
where x*(θ_k) = argmax_x p(x | y, θ_k) is the posterior mode, and H(x*(θ_k)) is the Hessian matrix of the negative log posterior evaluated at the mode.
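A minimal sketch of this first step on a hypothetical 1-D target (not the paper's model): find the posterior mode numerically, then use the inverse Hessian of the negative log posterior as the Gaussian variance.

```python
import numpy as np
from scipy.optimize import minimize

# Toy unnormalized negative log posterior for a fixed grid point theta_k;
# theta_k acts as a precision-like parameter (illustrative assumption).
def neg_log_post(x, theta_k):
    return 0.25 * x**4 + 0.5 * theta_k * (x - 1.0) ** 2

def laplace_approx(theta_k):
    # mode x*(theta_k) = argmax_x p(x | y, theta_k)
    res = minimize(lambda v: neg_log_post(v[0], theta_k), x0=[0.0])
    x_star = res.x[0]
    # finite-difference Hessian of the negative log posterior at the mode
    h = 1e-5
    hess = (neg_log_post(x_star + h, theta_k)
            - 2.0 * neg_log_post(x_star, theta_k)
            + neg_log_post(x_star - h, theta_k)) / h**2
    return x_star, 1.0 / hess  # mean and variance of q_G(x | y, theta_k)

mean_k, var_k = laplace_approx(theta_k=2.0)
```

In INLA this fit is repeated independently at every grid point θ_k ∈ G.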
INLA in a Nutshell (2/3)
By Bayes' rule, for any x,
p(θ | y) = p(x, y, θ) / (p(y) p(x | y, θ))   (1)
2. Laplace's method of integration [Tierney & Kadane, 1986]:
q_LA(θ | y) ∝ p(x, y, θ) / q_G(x | y, θ), evaluated at x = x*(θ)   (2)
INLA in a Nutshell (3/3)
3. Numerical integration:
q(x | y) = Σ_k q_G(x | y, θ_k) q_LA(θ_k | y) Δ_k
with area weights Δ_k. Together, these steps yield the joint approximation q(x, θ | y).
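Steps 2 and 3 can be sketched end to end on a toy conjugate model (an assumption chosen so every quantity is checkable: y_i ~ N(x, 1), x ~ N(0, 1/θ), flat prior on θ), where the Laplace approximation of p(x | y, θ) is exact.

```python
import numpy as np
from scipy.stats import norm

y = np.array([0.8, 1.2, 1.0, 0.6])
n = len(y)
grid = np.linspace(0.1, 5.0, 50)   # grid G over the hyperparameter theta
weights = np.empty_like(grid)
means = np.empty_like(grid)
variances = np.empty_like(grid)
for k, theta in enumerate(grid):
    v = 1.0 / (n + theta)          # exact p(x | y, theta) = N(m, v)
    m = v * y.sum()
    means[k], variances[k] = m, v
    # step 2: q_LA(theta | y) ∝ p(x, y, theta) / q_G(x | y, theta) at x = m
    log_joint = (norm.logpdf(y, loc=m, scale=1.0).sum()
                 + norm.logpdf(m, loc=0.0, scale=1.0 / np.sqrt(theta)))
    log_cond = norm.logpdf(m, loc=m, scale=np.sqrt(v))
    weights[k] = np.exp(log_joint - log_cond)

weights /= weights.sum()           # normalize (Delta_k is constant here)
# step 3: q(x | y) = sum_k weights[k] * N(x; means[k], variances[k])
post_mean = weights @ means
```

Because the grid is equally spaced, the area weights Δ_k cancel in the normalization.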
INLA: Benefits and Limitations
Benefits:
1. Preserves full posterior dependencies (i.e., the joint density q(x, θ | y))
2. Computationally efficient (MCMC: hours or days; INLA: seconds or minutes)
Limitations:
1. Applies only to latent Gaussian models (LGMs)
2. No quantification of the accuracy of the approximation q(x, θ | y)
3. The dimension of θ can be no more than 5 or 6
Our method addresses the first two limitations of INLA.
Variational Inference
Variational inference turns Bayesian inference into optimization:
min_{q(x,θ|y)} KL[q(x, θ | y) || p(x, θ | y)]  s.t. q(x, θ | y) ∈ Q   (3)
Evidence lower bound (ELBO): applying Jensen's inequality,
ln p(y) = ln ∫∫ q(x, θ | y) [p(y, x, θ) / q(x, θ | y)] dx dθ ≥ ∫∫ q(x, θ | y) ln [p(y, x, θ) / q(x, θ | y)] dx dθ := L   (4)
The Jensen's gap: ln p(y) − L = KL(q(x, θ | y) || p(x, θ | y)).
The variational distribution q(x, θ | y) is commonly restricted to tractable families Q.
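The bound in Eq. (4) can be verified in closed form on a toy conjugate model (an assumption for illustration: y ~ N(x, 1) with prior x ~ N(0, 1)), where ln p(y) is known exactly and the gap ln p(y) − L equals the KL divergence.

```python
import numpy as np

# ELBO for q = N(mu, s2) on the model y ~ N(x, 1), x ~ N(0, 1):
# L = E_q[ln p(y | x)] + E_q[ln p(x)] - E_q[ln q(x)], all in closed form.
def elbo(y, mu, s2):
    e_lik = -0.5 * np.log(2 * np.pi) - 0.5 * ((y - mu) ** 2 + s2)
    e_prior = -0.5 * np.log(2 * np.pi) - 0.5 * (mu ** 2 + s2)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s2)   # = -E_q[ln q(x)]
    return e_lik + e_prior + entropy

y = 1.3
# exact evidence: y ~ N(0, sqrt(2)) marginally
log_evidence = -0.5 * np.log(2 * np.pi * 2.0) - y ** 2 / 4.0
gap = log_evidence - elbo(y, mu=0.0, s2=1.0)  # Jensen's gap = KL(q || p) >= 0
```

The gap vanishes exactly when q is the true posterior N(y/2, 1/2), matching ln p(y) − L = KL(q || p).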
Mean-Field Variational Bayes (VB)
Assumes the factorized form q(x, θ | y) = q(x) q(θ); then
q*(x, θ | y) = argmin_{q(x,θ|y)} KL(q(x, θ | y) || p(x, θ | y)) = argmin_{q(x), q(θ)} ∫∫ q(x) q(θ) ln [q(x) q(θ) / p(x, θ | y)] dx dθ
Remarks:
- Easily derived and in closed form for conjugate models
- Challenging for non-conjugate models
- Ignores posterior dependencies, which impairs accuracy
- A poor approximation for a multi-modal distribution
Our non-factorized variational method addresses these issues with mean-field VB.
Hybrid Continuous-Discrete Family
Consider the non-factorized form
q(x, θ | y) = q(x | y, θ) q_d(θ | y)   (5)
so x and θ remain coupled.
1. The continuous approximation q(x | y, θ) is very flexible (Gaussian, mean-field, ...).
2. The discretized approximation q_d(θ | y) is a finite mixture of Dirac-delta distributions:
q_d(θ | y) = Σ_k ω_k δ_{θ_k}(θ), ω_k = q_d(θ_k | y), Σ_k ω_k = 1   (6)
Proposed Method
Within the proposed hybrid family, the optimal variational distribution is
q*(x, θ | y) = argmin_{q(x,θ|y)} KL(q(x, θ | y) || p(x, θ | y))
            = argmin_{q(x|y,θ), q_d(θ|y)} ∫∫ q(x | y, θ) q_d(θ | y) ln [q(x | y, θ) q_d(θ | y) / p(x, θ | y)] dx dθ
            = argmin_{q(x|y,θ_k), q_d(θ_k|y)} Σ_k ∫ q(x | y, θ_k) q_d(θ_k | y) ln [q(x | y, θ_k) q_d(θ_k | y) / p(x, θ_k | y)] dx
We give the name integrated non-factorized variational Bayes (INF-VB) to this method.
Variational Optimization Algorithm
Step 1 (Local): For each θ_k ∈ G, independently solve
q*(x | y, θ_k) = argmin_{q(x|y,θ_k)} KL(q(x | y, θ_k) || p(x | y, θ_k))   (7)
Step 2 (Global): Given {q*(x | y, θ_k) : θ_k ∈ G}, one has
q_d*(θ_k | y) ∝ exp( ∫ q*(x | y, θ_k) ln [p(x, θ_k | y) / q*(x | y, θ_k)] dx )   (8)
- INF-VB is parallelizable, with the dominant computational load distributed over the grid points
- INF-VB requires no iteration between Step 1 and Step 2
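The two steps can be sketched on the same toy model as before (an illustrative assumption, not the paper's Bayesian Lasso: y_i ~ N(x, 1), x ~ N(0, 1/θ), flat prior on θ). Step 1 is exact there, so the local objective in Eq. (8) reduces to ln p(y | θ_k) and the grid weights follow directly.

```python
import numpy as np

y = np.array([0.9, 1.1, 1.4, 0.6])
n = len(y)
grid = np.linspace(0.1, 5.0, 40)

log_w = np.empty_like(grid)
for k, theta in enumerate(grid):
    # Step 1 (local): q*(x | y, theta_k) = N(m, v) equals p(x | y, theta_k)
    v = 1.0 / (n + theta)
    m = v * y.sum()
    # Step 2 (global): for an exact local fit the exponent in Eq. (8) equals
    # ln p(y | theta_k); marginally y | theta ~ N(0, I + (1/theta) 1 1^T)
    cov = np.eye(n) + np.ones((n, n)) / theta
    _, logdet = np.linalg.slogdet(cov)
    log_w[k] = -0.5 * (n * np.log(2 * np.pi) + logdet
                       + y @ np.linalg.solve(cov, y))

q_d = np.exp(log_w - log_w.max())
q_d /= q_d.sum()   # discrete q_d(theta_k | y) over the grid
```

Each loop iteration depends only on its own θ_k, which is what makes the algorithm embarrassingly parallel across grid points.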
Our approach unifies INLA under the variational framework.
INLA vs. INF-VB
Both methods discretize the low-dimensional space of θ using a grid G.
INLA:
1. Gaussian approximation: q_G(x | y, θ_k) = N(x; x*(θ_k), H(x*(θ_k))^{-1}), θ_k ∈ G
2. Hyperparameter learning: q_LA(θ | y) ∝ p(x, y, θ) / q_G(x | y, θ), at x = x*(θ)
3. Marginal posterior of x: q(x | y) = Σ_k q_G(x | y, θ_k) q_LA(θ_k | y) Δ_k, with area weights Δ_k
INF-VB:
1. Variational Gaussian approximation: q_VG(x | y, θ_k) = argmin KL(q(x | y, θ_k) || p(x | y, θ_k)), θ_k ∈ G
2. Hyperparameter learning: q_d*(θ_k | y) ∝ exp( ∫ q_VG(x | y, θ_k) ln [p(x, θ_k | y) / q_VG(x | y, θ_k)] dx )
3. Marginal posterior of x: q(x | y) = ∫ q(x | y, θ) q_d(θ | y) dθ = Σ_k q_VG(x | y, θ_k) q_d(θ_k | y)
Remarks
Benefits:
- Applicable to more general scenarios (beyond latent Gaussian models)
- Optimal variational distributions q(x | y, θ_k) and q_d(θ | y)
- The negative ELBO provides a quantification of the approximation accuracy
Limitations:
- The dimension of θ can be no more than 5 or 6
Application to Bayesian Lasso
1. The l1 norm is non-differentiable
2. Hence the Laplace approximation of INLA cannot be applied
Bayesian Lasso Regression [Park & Casella, 2008] (1/3)
Model: y = Φx + e, e ~ N(e; 0, σ² I_n), where y ∈ R^n, Φ ∈ R^{n×p}, and e ∈ R^n. We assume
x_j | σ², λ² ~ (λ / (2√σ²)) exp(−λ |x_j| / √σ²)
σ² ~ InvGa(σ²; a, b)
λ² ~ Ga(λ²; r, s)
Problem: Given y and Φ, find the posterior distributions of x and θ = {λ², σ²}.
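A forward simulation from this hierarchy helps make the generative story concrete; the shape/rate settings for a, b, r, s below are illustrative assumptions, not the paper's choices, and Ga(r, s) is read with rate s.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 10
a, b, r, s = 3.0, 1.0, 2.0, 1.0
lam2 = rng.gamma(shape=r, scale=1.0 / s)            # lambda^2 ~ Ga(r, s)
sigma2 = 1.0 / rng.gamma(shape=a, scale=1.0 / b)    # sigma^2 ~ InvGa(a, b)
lam, sigma = np.sqrt(lam2), np.sqrt(sigma2)
# x_j | sigma^2, lambda^2 has density (lam / (2 sigma)) exp(-lam |x_j| / sigma),
# i.e. a Laplace (double-exponential) distribution with scale sigma / lam
x = rng.laplace(loc=0.0, scale=sigma / lam, size=p)
Phi = rng.standard_normal((n, p))
y = Phi @ x + rng.normal(0.0, sigma, size=n)        # y = Phi x + e
```

The heavy-tailed, sharply peaked Laplace prior on x_j is what induces the Lasso-style shrinkage in the posterior.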
Bayesian Lasso Regression (2/3)
Inference approaches:
1. Data-augmentation Gibbs sampler
2. Mean-field VB
3. INF-VB
INF-VB for the Bayesian Lasso:
(1) q*(x | y, θ_k): constrain q(x | y, θ) = N(x; μ, CC^T); then
KL(q(x | y, θ) || p(x | y, θ)) := g(μ, C)   (9)
is convex in (μ, C) [Challis & Barber, 2011], with D = CC^T.
(2) q*(θ | y): can be evaluated analytically
(3) q*(x | y): a finite mixture of Gaussians
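A sketch of the Step-1 objective for this model at a fixed θ = (λ², σ²). For brevity a diagonal covariance D = diag(d) stands in for the full Cholesky factor C used in the paper; all names and settings here are illustrative assumptions. The key tractability fact is that E_q |x_j| has a closed form (the folded-normal mean), so g is smooth, and it is convex in μ.

```python
import numpy as np
from scipy.stats import norm

def neg_elbo(mu, d, y, Phi, lam, sigma):
    # E_q ||y - Phi x||^2 = ||y - Phi mu||^2 + sum_j d_j ||Phi_{:,j}||^2
    resid = y - Phi @ mu
    e_quad = resid @ resid + (Phi ** 2).sum(axis=0) @ d
    # E_q |x_j| in closed form (folded-normal mean)
    sd = np.sqrt(d)
    e_abs = (sd * np.sqrt(2 / np.pi) * np.exp(-mu**2 / (2 * d))
             + mu * (1 - 2 * norm.cdf(-mu / sd))).sum()
    entropy = 0.5 * np.log(d).sum()   # Gaussian entropy, up to constants
    return e_quad / (2 * sigma**2) + (lam / sigma) * e_abs - entropy

rng = np.random.default_rng(1)
y = rng.standard_normal(20)
Phi = rng.standard_normal((20, 5))
d = np.full(5, 0.1)
g0 = neg_elbo(np.zeros(5), d, y, Phi, lam=1.0, sigma=1.0)
```

Convexity in μ for fixed d follows because the expected quadratic is quadratic in μ and the expectation of the convex function |x| under N(μ, d) is convex in μ.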
Bayesian Lasso Regression (3/3)
Denote (μ*, D*) = argmin_{μ,D} g(μ, D). The variational Bayesian Lasso is
μ* = argmin_μ g(μ), g(μ) := E_{N(x; μ, D*)} [ ||y − Φx||²₂ + λσ ||x||₁ ]   (10)
so μ* is a counterpart of the Lasso [Tibshirani, 1996]:
x̂ = argmin_x f(x), f(x) = ||y − Φx||²₂ + λσ ||x||₁   (11)
Remarks:
- The conditions of the Lasso hold on average
- The expectation smooths f around the origin, making g differentiable
- We optimize a non-differentiable function by operating on a sequence of differentiable functions
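The smoothing claim made concrete: E_{N(μ, s²)} |x| is the folded-normal mean, a smooth, even function of μ with value s√(2/π) and zero slope at the origin, and it approaches |μ| as s → 0.

```python
import numpy as np
from scipy.stats import norm

# Closed form for E_{N(mu, s^2)} |x| (folded-normal mean)
def smoothed_abs(mu, s):
    return (s * np.sqrt(2 / np.pi) * np.exp(-mu**2 / (2 * s**2))
            + mu * (1 - 2 * norm.cdf(-mu / s)))

# unlike |mu|, the smoothed version has no kink: zero slope at the origin
h = 1e-4
slope_at_0 = (smoothed_abs(h, 0.5) - smoothed_abs(-h, 0.5)) / (2 * h)
```

This is exactly why the variational objective g(μ) in Eq. (10) is differentiable even though f(x) in Eq. (11) is not.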
Results (1/4): Diabetes Dataset [Efron et al., 2004]
This benchmark dataset contains:
- Measurements on n = 442 diabetes patients
- p = 10 clinical covariates (age, sex, body mass index, average blood pressure, and six blood serum measurements)
- A response variable: a quantitative measure of disease progression
Goal: identify which covariates are important factors.
Methods compared:
- Intensive MCMC runs (ground truth)
- Mean-field VB
- INF-VB-1
- INF-VB-2 (INLA with LA replaced by VG)
- Ordinary least squares (OLS)
Results (2/4): Marginal Posteriors q(x_j | y)
[Figure: marginal posteriors q(x_2 | y) (sex), q(x_4 | y) (bp), q(x_9 | y) (ltg), and q(x_10 | y) (glu), comparing MCMC, INF-VB-1, INF-VB-2, VB, and OLS.]
Results (3/4): Marginal Posteriors q(σ² | y) and q(λ² | y)
[Figure: posterior marginals of the hyperparameters, (a) q(σ² | y) and (b) q(λ² | y), for MCMC, INF-VB-1, INF-VB-2, VB, and OLS.]
- Mean-field VB can severely underestimate the posterior variance
- INF-VB-2 offers a suboptimal solution
Results (4/4): Accuracy and Speed
[Figure: (a) negative ELBO and (b) elapsed time in seconds for MCMC, INF-VB-1, INF-VB-2, and VB, as a function of grid size m × m with m = 1, 5, 10, 30, 50.]
INF-VB with a 1 × 1 grid amounts to partial Bayesian learning of q(x | y, θ) with a fixed θ.
Summary
Our method:
1. Tractable family Q: non-factorized
2. Conditional conjugacy: not required
3. Multimodal posteriors: can be handled
4. Parallelizable: yes
More could be done...
Q&A: Accuracy and Speed
[Figure: (a) negative ELBO and (b) elapsed time in seconds for MCMC, INF-VB-1 through INF-VB-4, and VB; grid size m × m with m = 1, 5, 10, 30, 50.]
In INF-VB-3 and INF-VB-4 (INLA with LA replaced by VG), we obtain a fast VG solution by minimizing an upper bound on the KL divergence.
More informationCSci 8980: Advanced Topics in Graphical Models Gaussian Processes
CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian
More informationVariational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures
17th Europ. Conf. on Machine Learning, Berlin, Germany, 2006. Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures Shipeng Yu 1,2, Kai Yu 2, Volker Tresp 2, and Hans-Peter
More informationIntroduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak
Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,
More informationApproximate Inference using MCMC
Approximate Inference using MCMC 9.520 Class 22 Ruslan Salakhutdinov BCS and CSAIL, MIT 1 Plan 1. Introduction/Notation. 2. Examples of successful Bayesian models. 3. Basic Sampling Algorithms. 4. Markov
More informationGaussian Mixture Models
Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some
More informationModel Selection for Gaussian Processes
Institute for Adaptive and Neural Computation School of Informatics,, UK December 26 Outline GP basics Model selection: covariance functions and parameterizations Criteria for model selection Marginal
More informationMarkov Chain Monte Carlo Methods for Stochastic
Markov Chain Monte Carlo Methods for Stochastic Optimization i John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge U Florida, Nov 2013
More informationImproving power posterior estimation of statistical evidence
Improving power posterior estimation of statistical evidence Nial Friel, Merrilee Hurn and Jason Wyse Department of Mathematical Sciences, University of Bath, UK 10 June 2013 Bayesian Model Choice Possible
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 10 Alternatives to Monte Carlo Computation Since about 1990, Markov chain Monte Carlo has been the dominant
More informationAn introduction to Sequential Monte Carlo
An introduction to Sequential Monte Carlo Thang Bui Jes Frellsen Department of Engineering University of Cambridge Research and Communication Club 6 February 2014 1 Sequential Monte Carlo (SMC) methods
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More informationProbabilistic Machine Learning
Probabilistic Machine Learning Bayesian Nets, MCMC, and more Marek Petrik 4/18/2017 Based on: P. Murphy, K. (2012). Machine Learning: A Probabilistic Perspective. Chapter 10. Conditional Independence Independent
More informationQuantitative Biology II Lecture 4: Variational Methods
10 th March 2015 Quantitative Biology II Lecture 4: Variational Methods Gurinder Singh Mickey Atwal Center for Quantitative Biology Cold Spring Harbor Laboratory Image credit: Mike West Summary Approximate
More informationBayesian linear regression
Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding
More informationVariational Bayesian Inference for Parametric and Non-Parametric Regression with Missing Predictor Data
for Parametric and Non-Parametric Regression with Missing Predictor Data August 23, 2010 Introduction Bayesian inference For parametric regression: long history (e.g. Box and Tiao, 1973; Gelman, Carlin,
More information(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis
Summarizing a posterior Given the data and prior the posterior is determined Summarizing the posterior gives parameter estimates, intervals, and hypothesis tests Most of these computations are integrals
More informationNonparameteric Regression:
Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,
More informationLecture 13 Fundamentals of Bayesian Inference
Lecture 13 Fundamentals of Bayesian Inference Dennis Sun Stats 253 August 11, 2014 Outline of Lecture 1 Bayesian Models 2 Modeling Correlations Using Bayes 3 The Universal Algorithm 4 BUGS 5 Wrapping Up
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory
More informationRegularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics
Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics arxiv:1603.08163v1 [stat.ml] 7 Mar 016 Farouk S. Nathoo, Keelin Greenlaw,
More informationAn introduction to Variational calculus in Machine Learning
n introduction to Variational calculus in Machine Learning nders Meng February 2004 1 Introduction The intention of this note is not to give a full understanding of calculus of variations since this area
More informationThe Expectation Maximization or EM algorithm
The Expectation Maximization or EM algorithm Carl Edward Rasmussen November 15th, 2017 Carl Edward Rasmussen The EM algorithm November 15th, 2017 1 / 11 Contents notation, objective the lower bound functional,
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory
More informationSTAT 518 Intro Student Presentation
STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible
More informationNonlinear Statistical Learning with Truncated Gaussian Graphical Models
Nonlinear Statistical Learning with Truncated Gaussian Graphical Models Qinliang Su, Xuejun Liao, Changyou Chen, Lawrence Carin Department of Electrical & Computer Engineering, Duke University Presented
More informationParameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn
Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Temperature fluctuations Variance at multipole l (angle ~180o/l) C. Porciani Estimation
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 3 Stochastic Gradients, Bayesian Inference, and Occam s Razor https://people.orie.cornell.edu/andrew/orie6741 Cornell University August
More informationAuto-Encoding Variational Bayes
Auto-Encoding Variational Bayes Diederik P Kingma, Max Welling June 18, 2018 Diederik P Kingma, Max Welling Auto-Encoding Variational Bayes June 18, 2018 1 / 39 Outline 1 Introduction 2 Variational Lower
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning
More informationVariational Inference via Stochastic Backpropagation
Variational Inference via Stochastic Backpropagation Kai Fan February 27, 2016 Preliminaries Stochastic Backpropagation Variational Auto-Encoding Related Work Summary Outline Preliminaries Stochastic Backpropagation
More informationMCMC Sampling for Bayesian Inference using L1-type Priors
MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationBayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine
Bayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine Mike Tipping Gaussian prior Marginal prior: single α Independent α Cambridge, UK Lecture 3: Overview
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationSummary STK 4150/9150
STK4150 - Intro 1 Summary STK 4150/9150 Odd Kolbjørnsen May 22 2017 Scope You are expected to know and be able to use basic concepts introduced in the book. You knowledge is expected to be larger than
More information