Integrated Non-Factorized Variational Inference


1 Integrated Non-Factorized Variational Inference
Shaobo Han, Xuejun Liao and Lawrence Carin
Duke University
February 27, 2014

2 Overview
[Figure: World -> Graphical Models -> Posterior Inference; contour plots over (θ_1, θ_2) comparing MCMC, VB, and INF-VB (our method).]

3 For full posterior inference, our method is
- A fast deterministic alternative to MCMC
- More accurate than mean-field variational Bayes (VB)

4 Outline
- Introduction
- Integrated Nested Laplace Approximation (INLA)
- Integrated Non-Factorized Variational Bayes (INF-VB)
- Applications
- Summary

9 Problem of Interest
Consider a general Bayesian hierarchical model:
- Observation model: y ~ p(y | x, θ)
- Latent variables: x ~ p(x | θ)
- Hyperparameters: θ ~ p(θ)
Posterior inference targets the joint p(x, θ | y), the marginals p(x | y) and p(θ | y), and the conditional p(x | θ, y).
The exact joint posterior
    p(x, θ | y) = p(y, x, θ) / p(y) = p(y | x, θ) p(x | θ) p(θ) / ∫∫ p(y | x, θ) p(x | θ) p(θ) dx dθ
can be difficult to evaluate.

10 Approximate Posterior Inference
- Sampling-based methods: Markov chain Monte Carlo (MCMC)
- Deterministic alternatives:
  - Laplace approximation (LA)
  - Variational inference
  - Expectation propagation (EP)
  - Integrated nested Laplace approximation (INLA) [Rue et al., 2009]

11 Outline
- Introduction
- Integrated Nested Laplace Approximation (INLA)
- Integrated Non-Factorized Variational Bayes (INF-VB)
- Applications
- Summary

14 INLA in a Nutshell (1/3)
Main idea: discretize the low-dimensional hyperparameter space θ using a grid G.
[Demo figure: grid points θ_k ∈ G, each with a Gaussian approximation q_G(x | y, θ_k).]
1. Laplace approximation [Kass & Steffey, 1989]:
    q_G(x | y, θ_k) = N(x; x*(θ_k), H(x*(θ_k))^{-1}),   θ_k ∈ G,
where x*(θ_k) = argmax_x p(x | y, θ_k) is the posterior mode, and H(x*(θ_k)) is the Hessian of the negative log posterior evaluated at the mode.
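To make the Laplace step concrete, here is a minimal Python sketch (ours, not the authors' code) for a single grid point θ_k: numerically locate the mode of p(x | y, θ_k), then invert a finite-difference Hessian of the negative log posterior to obtain the Gaussian covariance. The callable neg_log_post is a hypothetical user-supplied function.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_approx(neg_log_post, x0):
    """Gaussian approximation N(x; x_star, H^{-1}) to p(x | y, theta_k).

    neg_log_post: callable returning -ln p(x | y, theta_k) up to a constant.
    x0: initial guess for the mode search.
    Returns the mode x_star and the covariance H^{-1}.
    """
    # Step 1: find the posterior mode x*(theta_k) = argmax_x p(x | y, theta_k).
    res = minimize(neg_log_post, x0, method="BFGS")
    x_star = res.x

    # Step 2: Hessian of -ln p at the mode via central finite differences.
    d, eps = x_star.size, 1e-5
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i, e_j = np.eye(d)[i] * eps, np.eye(d)[j] * eps
            H[i, j] = (neg_log_post(x_star + e_i + e_j)
                       - neg_log_post(x_star + e_i - e_j)
                       - neg_log_post(x_star - e_i + e_j)
                       + neg_log_post(x_star - e_i - e_j)) / (4 * eps**2)
    return x_star, np.linalg.inv(H)
```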

17 INLA in a Nutshell (2/3)
Main idea: discretize the low-dimensional hyperparameter space θ using a grid G.
By Bayes' rule, for any value of x,
    p(θ | y) = p(x, y, θ) / (p(y) p(x | y, θ)).                          (1)
2. Laplace's method of integration [Tierney & Kadane, 1986]:
    q_LA(θ | y) = p(x, y, θ) / (p(y) q_G(x | y, θ)) |_{x = x*(θ)}        (2)
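A small sketch (ours) of how (2) can be evaluated on the grid: since ln p(y) is an unknown constant, one computes the unnormalized log weights ln p(x*, y, θ_k) − ln q_G(x* | y, θ_k) and normalizes afterwards. log_joint is a hypothetical model-specific callable.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_qLA_unnorm(log_joint, grid, laplace_fits):
    """Unnormalized log q_LA(theta_k | y) on a grid, per eq. (2).

    log_joint(x, theta): ln p(x, y, theta) with y fixed (hypothetical callable).
    grid: list of hyperparameter points theta_k.
    laplace_fits: list of (x_star, cov) Gaussian fits from the Laplace step.
    """
    log_w = []
    for theta_k, (x_star, cov) in zip(grid, laplace_fits):
        # ln p(x*, y, theta_k) - ln q_G(x* | y, theta_k); the unknown ln p(y)
        # only shifts every value by a constant and drops out on normalization.
        log_qG = multivariate_normal.logpdf(x_star, mean=x_star, cov=cov)
        log_w.append(log_joint(x_star, theta_k) - log_qG)
    return np.array(log_w)
```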

20 INLA in a Nutshell (3/3)
Main idea: discretize the low-dimensional hyperparameter space θ using a grid G.
3. Numerical integration:
    q(x | y) = Σ_k q_G(x | y, θ_k) q_LA(θ_k | y) Δ_k,
with area weights Δ_k. Together with q_LA(θ | y), this yields the joint approximation q(x, θ | y).
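Putting the three steps together, a minimal sketch (ours, with our own naming) of the mixture marginal for one latent coordinate; the inputs are the per-grid-point Gaussian parameters and the unnormalized log weights from the previous step.

```python
import numpy as np
from scipy.stats import norm

def marginal_posterior(xs, modes, sds, log_w, areas):
    """INLA-style marginal q(x_j | y) as a finite Gaussian mixture (step 3).

    xs: evaluation points for one latent coordinate x_j.
    modes, sds: per-grid-point mean and std of q_G(x_j | y, theta_k).
    log_w: unnormalized log q_LA(theta_k | y); areas: area weights Delta_k.
    """
    w = np.exp(log_w - log_w.max()) * np.asarray(areas)   # stabilize, weight
    w /= w.sum()                                          # normalize over grid
    # Mixture: sum_k w_k N(x_j; mode_k, sd_k^2)
    return sum(wk * norm.pdf(xs, mk, sk) for wk, mk, sk in zip(w, modes, sds))
```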

22 INLA: Benefits and Limitations
Benefits:
1. Preserves full posterior dependencies (i.e., the joint density q(x, θ | y))
2. Computationally efficient (MCMC: hours or days; INLA: seconds or minutes)
Limitations:
1. Applies only to latent Gaussian models (LGMs)
2. No quantification of the accuracy of the approximation q(x, θ | y)
3. The dimension of θ can be no more than 5 or 6
Our method addresses the first two limitations of INLA.

23 Outline
- Introduction
- Integrated Nested Laplace Approximation (INLA)
- Integrated Non-Factorized Variational Bayes (INF-VB)
- Applications
- Summary

24 Variational Inference
Variational inference turns Bayesian inference into optimization:
    min_{q(x,θ|y)} KL[q(x, θ | y) || p(x, θ | y)]   s.t.   q(x, θ | y) ∈ Q        (3)
Evidence lower bound (ELBO): applying Jensen's inequality,
    ln p(y) = ln ∫∫ q(x, θ | y) [p(y, x, θ) / q(x, θ | y)] dx dθ
            ≥ ∫∫ q(x, θ | y) ln [p(y, x, θ) / q(x, θ | y)] dx dθ := L             (4)
The Jensen gap: ln p(y) − L = KL(q(x, θ | y) || p(x, θ | y)).
The variational distribution q(x, θ | y) is commonly restricted to tractable families Q.
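As a sanity check on (4), a minimal sketch (ours) that estimates L by Monte Carlo for any tractable q; on a toy model where ln p(y) is known in closed form, ln p(y) − L recovers the KL divergence. All three callables are hypothetical user-supplied functions.

```python
import numpy as np

def elbo_mc(log_joint, q_sample, q_logpdf, n=10_000, seed=0):
    """Monte Carlo estimate of L = E_q[ln p(y, z) - ln q(z)], eq. (4).

    log_joint(z): ln p(y, z) with data y fixed, vectorized over a batch;
    q_sample(n, rng): n draws from q; q_logpdf(z): ln q(z), also vectorized.
    """
    rng = np.random.default_rng(seed)
    z = q_sample(n, rng)
    return np.mean(log_joint(z) - q_logpdf(z))
```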

26 Mean-Field Variational Bayes (VB)
Assumes the factorized form q(x, θ | y) = q(x) q(θ); then
    q*(x, θ | y) = argmin_{q(x,θ|y)} KL(q(x, θ | y) || p(x, θ | y))
                 = argmin_{q(x),q(θ)} ∫∫ q(x) q(θ) ln [q(x) q(θ) / p(x, θ | y)] dx dθ
Remarks:
- Easily derived and in closed form for conjugate models
- Challenging for non-conjugate models
- Ignores posterior dependencies, which impairs accuracy
- A poor approximation for a multimodal distribution
Our non-factorized variational method addresses these issues of mean-field VB.

27 Hybrid Continuous-Discrete Family
Consider the non-factorized form
    q(x, θ | y) = q(x | y, θ) q_d(θ | y)                                   (5)
so that x and θ remain coupled.
1. The continuous approximation q(x | y, θ) is very flexible (e.g., Gaussian, mean-field).
2. The discretized approximation q_d(θ | y) is a finite mixture of Dirac-delta distributions:
    q_d(θ | y) = Σ_k ω_k δ_{θ_k}(θ),   ω_k = q_d(θ_k | y),   Σ_k ω_k = 1      (6)
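One way to realize this family in code, as a sketch under our own naming: store the grid {θ_k}, the weights ω_k, and the parameters of each Gaussian conditional; ancestral sampling then draws θ from the Dirac mixture and x from the matching conditional.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HybridPosterior:
    """q(x, theta | y) = q(x | y, theta_k) q_d(theta | y), eqs. (5)-(6).

    thetas: grid points theta_k; weights: omega_k (summing to 1);
    means, covs: parameters of the Gaussian conditionals q(x | y, theta_k).
    """
    thetas: np.ndarray
    weights: np.ndarray
    means: list
    covs: list

    def sample(self, n, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        # Draw theta_k with probability omega_k, then x | theta_k.
        ks = rng.choice(len(self.thetas), size=n, p=self.weights)
        xs = np.stack([rng.multivariate_normal(self.means[k], self.covs[k])
                       for k in ks])
        return xs, self.thetas[ks]
```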

28 Proposed Method
Within the proposed hybrid family, the optimal variational distribution is
    q*(x, θ | y) = argmin_{q(x,θ|y)} KL(q(x, θ | y) || p(x, θ | y))
                 = argmin_{q(x|y,θ), q_d(θ|y)} ∫∫ q(x | y, θ) q_d(θ | y) ln [q(x | y, θ) q_d(θ | y) / p(x, θ | y)] dx dθ
                 = argmin_{q(x|y,θ_k), q_d(θ_k|y)} Σ_k ∫ q(x | y, θ_k) q_d(θ_k | y) ln [q(x | y, θ_k) q_d(θ_k | y) / p(x, θ_k | y)] dx
We give the name integrated non-factorized variational Bayes (INF-VB) to this method.

29 Computation
Variational Optimization Algorithm
Step 1 (Local): For each θ_k ∈ G, independently solve
    q*(x | y, θ_k) = argmin_{q(x|y,θ_k)} KL(q(x | y, θ_k) || p(x | y, θ_k))              (7)
Step 2 (Global): Given {q*(x | y, θ_k) : θ_k ∈ G},
    q_d*(θ_k | y) ∝ exp( ∫ q*(x | y, θ_k) ln [p(x, θ_k | y) / q*(x | y, θ_k)] dx )       (8)
- INF-VB is parallelizable, with the dominant computational load distributed over the grid points
- INF-VB requires no iteration between Step 1 and Step 2
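The two-step structure maps directly onto a parallel driver. A hedged sketch (not the authors' implementation), where fit_local and local_elbo are hypothetical model-specific routines:

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def inf_vb(grid, fit_local, local_elbo):
    """Two-step INF-VB driver (a sketch, not the authors' code).

    fit_local(theta_k): solves eq. (7), returning the local variational fit;
    it must be a picklable top-level function for process-based parallelism.
    local_elbo(fit, theta_k): the integral in eq. (8), i.e. the local ELBO
    E_q[ln p(x, theta_k | y) - ln q(x | y, theta_k)] up to ln p(y).
    """
    # Step 1 (local): embarrassingly parallel over grid points, no iteration.
    with ProcessPoolExecutor() as pool:
        fits = list(pool.map(fit_local, grid))
    # Step 2 (global): grid weights from the local ELBOs, eq. (8).
    log_w = np.array([local_elbo(f, t) for f, t in zip(fits, grid)])
    w = np.exp(log_w - log_w.max())
    return fits, w / w.sum()
```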

30 Our approach unifies INLA under the variational framework.

31 INLA vs. INF-VB
Main idea (shared): discretize the low-dimensional hyperparameter space θ using a grid G.
INLA:
1. Gaussian approximation:
    q_G(x | y, θ_k) = N(x; x*(θ_k), H(x*(θ_k))^{-1}),   θ_k ∈ G
2. Hyperparameter learning:
    q_LA(θ | y) ∝ p(x, y, θ) / q_G(x | y, θ) |_{x = x*(θ)}
3. Marginal posterior of x:
    q(x | y) = Σ_k q_G(x | y, θ_k) q_LA(θ_k | y) Δ_k, with area weights Δ_k

35 INLA vs. INF-VB
INF-VB replaces each INLA step with a variational counterpart:
Step 1: Variational Gaussian approximation
    q*_VG(x | y, θ_k) = argmin_{q(x|y,θ_k)} KL(q(x | y, θ_k) || p(x | y, θ_k)),   θ_k ∈ G
Step 2: Hyperparameter learning
    q_d*(θ_k | y) ∝ exp( ∫ q*_VG(x | y, θ_k) ln [p(x, θ_k | y) / q*_VG(x | y, θ_k)] dx )
Step 3: Marginal posterior of x
    q(x | y) = ∫ q(x | y, θ) q_d(θ | y) dθ = Σ_k q*_VG(x | y, θ_k) q_d(θ_k | y)

36 Remarks
Benefits:
- Applicable to more general scenarios than latent Gaussian models
- Optimal variational distributions q(x | y, θ_k) and q_d(θ | y)
- The negative ELBO quantifies the accuracy of the approximation
Limitations:
- The dimension of θ can be no more than 5 or 6

37 Application to Bayesian Lasso
1. The l1 norm is non-differentiable at the origin
2. Hence the Laplace approximation of INLA cannot be applied

38 Bayesian Lasso Regression [Park & Casella, 2008] (1/3)
Model:
    y = Φx + e,   e ~ N(e; 0, σ² I_n)
where y ∈ R^n, Φ ∈ R^{n×p}, and e ∈ R^n. We assume
    x_j | σ², λ² ~ (λ / (2σ)) exp(−λ |x_j| / σ)
    σ² ~ InvGa(σ²; a, b)
    λ² ~ Ga(λ²; r, s)
Problem: Given y and Φ, find posterior distributions for x and θ = {λ², σ²}.
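For reference, a sketch (ours) of the log joint ln p(y, x, σ², λ²) implied by this hierarchy; the hyperprior constants a, b, r, s and the shape-rate parameterizations are assumptions, not values from the talk.

```python
import numpy as np

def log_joint(x, sigma2, lam2, y, Phi, a=1.0, b=1.0, r=1.0, s=1.0):
    """ln p(y, x, sigma^2, lambda^2) for the Bayesian lasso, up to constants.

    Hyperprior parameters (a, b, r, s) are placeholders, not talk values.
    """
    n, p = Phi.shape
    lam, sigma = np.sqrt(lam2), np.sqrt(sigma2)
    # Gaussian likelihood: y | x, sigma^2 ~ N(Phi x, sigma^2 I_n)
    ll = -0.5 * n * np.log(sigma2) - 0.5 * np.sum((y - Phi @ x) ** 2) / sigma2
    # Laplace prior per coefficient: (lam / (2 sigma)) exp(-lam |x_j| / sigma)
    lp_x = p * np.log(lam / (2 * sigma)) - lam * np.sum(np.abs(x)) / sigma
    # InvGa(a, b) on sigma^2 and Ga(r, s) on lambda^2 (rate forms assumed)
    lp_s2 = -(a + 1) * np.log(sigma2) - b / sigma2
    lp_l2 = (r - 1) * np.log(lam2) - s * lam2
    return ll + lp_x + lp_s2 + lp_l2
```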

39 Bayesian Lasso Regression (2/3)
Inference options:
1. Data-augmentation Gibbs sampler
2. Mean-field VB
3. INF-VB
INF-VB for the Bayesian lasso:
(1) q*(x | y, θ_k): constrain q(x | y, θ) = N(x; µ, CC^T); then
    KL(q(x | y, θ) || p(x | y, θ)) := g(µ, C)                                 (9)
is convex in (µ, C) [Challis & Barber, 2011], where D = CC^T.
(2) q*(θ | y): can be evaluated analytically
(3) q*(x | y): a finite mixture of Gaussians
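A minimal sketch (ours, not the authors' code) of step (1) for one grid point: parameterize q = N(µ, CCᵀ) by a Cholesky factor and minimize g(µ, C) numerically, using the closed form E|x_j| = µ_j(1 − 2Φ(−µ_j/s_j)) + 2 s_j φ(µ_j/s_j) for a N(µ_j, s_j²) coordinate.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_vg_lasso(y, Phi, sigma2, lam2):
    """Variational Gaussian q(x | y, theta_k) = N(mu, C C^T) for the Bayesian
    lasso conditional at a fixed grid point theta_k = (sigma^2, lambda^2).

    Minimizes g(mu, C) = KL(q || p(x | y, theta_k)) up to a constant; C is
    lower-triangular with a log-scaled positive diagonal.
    """
    n, p = Phi.shape
    sigma, lam = np.sqrt(sigma2), np.sqrt(lam2)
    tril = np.tril_indices(p)

    def unpack(z):
        mu, C = z[:p], np.zeros((p, p))
        C[tril] = z[p:]
        C[np.diag_indices(p)] = np.exp(C[np.diag_indices(p)])
        return mu, C

    def g(z):
        mu, C = unpack(z)
        s = np.sqrt(np.sum(C**2, axis=1))              # marginal stds of x_j
        # E_q ||y - Phi x||^2 = ||y - Phi mu||^2 + ||Phi C||_F^2
        quad = np.sum((y - Phi @ mu) ** 2) + np.sum((Phi @ C) ** 2)
        # E_q |x_j| in closed form for x_j ~ N(mu_j, s_j^2)
        e_abs = mu * (1 - 2 * norm.cdf(-mu / s)) + 2 * s * norm.pdf(mu / s)
        entropy = np.sum(np.log(np.diag(C)))           # 0.5 ln|D| up to const
        return quad / (2 * sigma2) + (lam / sigma) * e_abs.sum() - entropy

    z0 = np.zeros(p + len(tril[0]))                    # mu = 0, C = I
    mu, C = unpack(minimize(g, z0, method="L-BFGS-B").x)
    return mu, C @ C.T                                 # mean mu* and D = CC^T
```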

40 Bayesian Lasso Regression (3/3)
Let (µ*, D*) = argmin_{µ,D} g(µ, D). The variational Bayesian lasso is
    µ* = argmin_µ ḡ(µ),   ḡ(µ) := E_{N(x; µ, D*)}[ ‖y − Φx‖₂² + λσ ‖x‖₁ ]      (10)
µ* is a counterpart of the lasso [Tibshirani, 1996]:
    x̂ = argmin_x f(x),   f(x) = ‖y − Φx‖₂² + λσ ‖x‖₁                          (11)
Remarks:
- The lasso conditions hold on average
- The expectation smooths f around the origin, making ḡ differentiable
- One can optimize a non-differentiable function by operating on a sequence of differentiable functions
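The smoothing effect in (10) is easy to see numerically: under x ~ N(µ, s²), E|x| is a differentiable function of µ that approaches |µ| as s → 0. A tiny check:

```python
import numpy as np
from scipy.stats import norm

# The Gaussian expectation smooths the |.| kink: for x ~ N(mu, s^2),
# E|x| = mu (1 - 2 Phi(-mu/s)) + 2 s phi(mu/s), differentiable in mu.
mu = np.linspace(-2, 2, 401)
for s in (0.1, 0.5):
    e_abs = mu * (1 - 2 * norm.cdf(-mu / s)) + 2 * s * norm.pdf(mu / s)
    print(f"s={s}: E|x| at mu=0 is {e_abs[200]:.3f} (vs |0| = 0)")
```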

41 Results (1/4): Diabetes Dataset [Efron et al., 2004]
This benchmark dataset contains:
- Measurements on n = 442 diabetes patients
- p = 10 clinical covariates (age, sex, body mass index, average blood pressure, and six blood serum measurements)
- A response variable, a quantitative measure of disease progression
Goal: identify which covariates are important factors.
Methods compared:
- Intensive MCMC runs (ground truth)
- Mean-field VB
- INF-VB-1
- INF-VB-2 (INLA with LA replaced by VG)
- Ordinary least squares (OLS)

42 Results (2/4): Marginal Posteriors q(x_j | y)
[Figure: marginal posteriors (a) q(x_2 | y) (sex), (b) q(x_4 | y) (bp), (c) q(x_9 | y) (ltg), and (d) q(x_10 | y) (glu), each comparing MCMC, INF-VB-1, INF-VB-2, VB, and OLS.]

43 Results (3/4): Marginal Posteriors q(σ² | y) and q(λ² | y)
[Figure: posterior marginals of the hyperparameters, (a) q(σ² | y) and (b) q(λ² | y), comparing MCMC, INF-VB-1, INF-VB-2, VB, and OLS.]
- Mean-field VB can severely underestimate the posterior variance
- INF-VB-2 offers a suboptimal solution

44 Results (4/4): Accuracy and Speed
[Figure: (a) negative ELBO and (b) elapsed time in seconds for MCMC, INF-VB-1, INF-VB-2, and VB, over grid sizes m × m with m = 1, 5, 10, 30, 50.]
- INF-VB with a 1 × 1 grid corresponds to partial Bayesian learning of q(x | y, θ) with a fixed θ

45 Summary
Our method:
1. Tractable family Q: non-factorized
2. Conditional conjugacy: not required
3. Multimodal posteriors: can handle
4. Parallelizable: yes
More could be done...

46 Q&A: Accuracy and Speed
[Figure: (a) negative ELBO and (b) elapsed time in seconds for MCMC, INF-VB-1 through INF-VB-4, and VB, over grid sizes m × m with m = 1, 5, 10, 30, 50.]
In INF-VB-3 and INF-VB-4 (INLA with LA replaced by VG), we obtain a fast VG solution by minimizing an upper bound on the KL divergence.
