Web-Supplement for: Accurate Logistic Variational Message Passing: Algebraic and Numerical Details


BY TUI H. NOLAN AND MATT P. WAND

School of Mathematical and Physical Sciences, University of Technology Sydney, Broadway 2007, Australia

S.1 Proof of Theorem 1

First note that for any $a, b \in \mathbb{R}$,

$$\int_{-\infty}^{\infty} \Phi(a + b\,x)\,\phi(x)\,dx = \Phi\!\left(\frac{a}{\sqrt{b^2 + 1}}\right)
\quad\text{and}\quad
\int_{-\infty}^{\infty} x\,\Phi(a + b\,x)\,\phi(x)\,dx = \frac{b}{\sqrt{b^2 + 1}}\,\phi\!\left(\frac{a}{\sqrt{b^2 + 1}}\right). \qquad (S.1)$$

From the first result in (S.1), it follows immediately that

$$\int_{-\infty}^{\infty} \mathrm{expit}_k(\mu + \sigma x)\,\phi(x)\,dx
= \sum_{i=1}^{k} p_{k,i}\,\Phi\!\left(\frac{s_{k,i}\,\mu}{\sqrt{1 + \sigma^2 s_{k,i}^2}}\right).$$

Hence, for all $\mu \in \mathbb{R}$ and $\sigma > 0$,

$$\begin{aligned}
\left| B_0(\mu,\sigma^2) - \sum_{i=1}^{k} p_{k,i}\,\Phi\!\left(\frac{s_{k,i}\,\mu}{\sqrt{1 + \sigma^2 s_{k,i}^2}}\right) \right|
&= \left| \int_{-\infty}^{\infty} \mathrm{expit}(\mu + \sigma x)\,\phi(x)\,dx
- \int_{-\infty}^{\infty} \mathrm{expit}_k(\mu + \sigma x)\,\phi(x)\,dx \right| \\
&\le \int_{-\infty}^{\infty} \big| \mathrm{expit}(\mu + \sigma x) - \mathrm{expit}_k(\mu + \sigma x) \big|\,\phi(x)\,dx \\
&\le \sup_{u \in \mathbb{R}} \big| \mathrm{expit}(u) - \mathrm{expit}_k(u) \big| \int_{-\infty}^{\infty} \phi(x)\,dx
= \sup_{u \in \mathbb{R}} \big| \mathrm{expit}(u) - \mathrm{expit}_k(u) \big|,
\end{aligned}$$

where the last equality uses $\int_{-\infty}^{\infty}\phi(x)\,dx = 1$; the supremum on the right-hand side is the quantity appearing in (7). Part (a) of Theorem 1 follows immediately.

The second result in (S.1) implies that

$$\int_{-\infty}^{\infty} x\,\mathrm{expit}_k(\mu + \sigma x)\,\phi(x)\,dx
= \sum_{i=1}^{k} \frac{p_{k,i}\,s_{k,i}\,\sigma}{\sqrt{1 + \sigma^2 s_{k,i}^2}}\,
\phi\!\left(\frac{s_{k,i}\,\mu}{\sqrt{1 + \sigma^2 s_{k,i}^2}}\right).$$
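Both identities in (S.1) are classical and easy to confirm numerically. The following minimal sketch (not part of the original supplement) checks them by SciPy quadrature for arbitrarily chosen values of $a$ and $b$:

```python
# Numerical check of the two identities in (S.1); illustrative only.
import numpy as np
from scipy import integrate
from scipy.stats import norm

a, b = 0.7, -1.3  # arbitrary test values

lhs0, _ = integrate.quad(lambda x: norm.cdf(a + b * x) * norm.pdf(x), -np.inf, np.inf)
rhs0 = norm.cdf(a / np.sqrt(b**2 + 1.0))

lhs1, _ = integrate.quad(lambda x: x * norm.cdf(a + b * x) * norm.pdf(x), -np.inf, np.inf)
rhs1 = (b / np.sqrt(b**2 + 1.0)) * norm.pdf(a / np.sqrt(b**2 + 1.0))

print(lhs0, rhs0)  # agree to quadrature accuracy
print(lhs1, rhs1)  # agree to quadrature accuracy
```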

For all $\mu \in \mathbb{R}$ and $\sigma > 0$ we then have

$$\begin{aligned}
\left| B_1(\mu,\sigma^2) - \sum_{i=1}^{k} \frac{p_{k,i}\,s_{k,i}\,\sigma}{\sqrt{1 + \sigma^2 s_{k,i}^2}}\,
\phi\!\left(\frac{s_{k,i}\,\mu}{\sqrt{1 + \sigma^2 s_{k,i}^2}}\right) \right|
&= \left| \int_{-\infty}^{\infty} \mathrm{expit}(\mu + \sigma x)\,x\,\phi(x)\,dx
- \int_{-\infty}^{\infty} \mathrm{expit}_k(\mu + \sigma x)\,x\,\phi(x)\,dx \right| \\
&\le \int_{-\infty}^{\infty} \big| \mathrm{expit}(\mu + \sigma x) - \mathrm{expit}_k(\mu + \sigma x) \big|\,|x|\,\phi(x)\,dx \\
&\le \sup_{u \in \mathbb{R}} \big| \mathrm{expit}(u) - \mathrm{expit}_k(u) \big|\;
2\!\int_{0}^{\infty} x\,\phi(x)\,dx
= \sqrt{2/\pi}\,\sup_{u \in \mathbb{R}} \big| \mathrm{expit}(u) - \mathrm{expit}_k(u) \big|,
\end{aligned}$$

since $\int_{-\infty}^{\infty}|x|\,\phi(x)\,dx = 2\int_{0}^{\infty} x\,\phi(x)\,dx = \sqrt{2/\pi}$. Part (b) of Theorem 1 then follows from (7).

S.2 Derivation of Algorithm 1

The message passed from $p(y\,|\,\theta)$ to $\theta$,

$$m_{p(y|\theta)\to\theta}(\theta) = \exp\!\big\{ y^T A\theta - 1^T \log\{1 + \exp(A\theta)\} \big\},$$

is not conjugate with the Multivariate Normal messages passed to $\theta$ from other factors. A non-conjugate VMP remedy (Knowles & Minka, 2011) involves replacement of $m_{p(y|\theta)\to\theta}(\theta)$ by

$$m_{p(y|\theta)\to\theta}(\theta) \longleftarrow
\exp\!\left\{ \begin{bmatrix} \theta \\ \mathrm{vec}(\theta\theta^T) \end{bmatrix}^{T} \eta_{p(y|\theta)\to\theta} \right\}
\qquad (S.2)$$

to enforce conjugacy with Multivariate Normal messages. Under pre-specification (S.2) the current $q(\theta)$ density function satisfies

$$q(\theta) \propto m_{p(y|\theta)\to\theta}(\theta) \times
(\text{product of messages passed to } \theta \text{ from its other neighbours}).$$

From (10) of Wand (2017), we then get

$$q(\theta) \propto m_{p(y|\theta)\to\theta}(\theta)\,m_{\theta\to p(y|\theta)}(\theta)
= \exp\!\left\{ \begin{bmatrix} \theta \\ \mathrm{vec}(\theta\theta^T) \end{bmatrix}^{T}
\eta_{p(y|\theta)\leftrightarrow\theta} \right\},
\qquad
\eta_{p(y|\theta)\leftrightarrow\theta} \equiv \eta_{p(y|\theta)\to\theta} + \eta_{\theta\to p(y|\theta)}.$$

Let $\mu_{p(y|\theta)\leftrightarrow\theta}$ and $\Sigma_{p(y|\theta)\leftrightarrow\theta}$ be the corresponding common parameters. The natural parameters and common parameters are the following functions of each other:

$$\eta_{p(y|\theta)\leftrightarrow\theta} =
\begin{bmatrix}
\Sigma_{p(y|\theta)\leftrightarrow\theta}^{-1}\,\mu_{p(y|\theta)\leftrightarrow\theta} \\[4pt]
-\tfrac{1}{2}\,\mathrm{vec}\big(\Sigma_{p(y|\theta)\leftrightarrow\theta}^{-1}\big)
\end{bmatrix}$$

and

$$\mu_{p(y|\theta)\leftrightarrow\theta}
= -\tfrac{1}{2}\big\{\mathrm{vec}^{-1}\big((\eta_{p(y|\theta)\leftrightarrow\theta})_2\big)\big\}^{-1}
(\eta_{p(y|\theta)\leftrightarrow\theta})_1,
\qquad
\Sigma_{p(y|\theta)\leftrightarrow\theta}
= -\tfrac{1}{2}\big\{\mathrm{vec}^{-1}\big((\eta_{p(y|\theta)\leftrightarrow\theta})_2\big)\big\}^{-1}.
\qquad (S.3)$$

In the upcoming arguments we will use the shorthand: $\eta_{q(\theta)} \equiv \eta_{p(y|\theta)\leftrightarrow\theta}$, $\mu_{q(\theta)} \equiv \mu_{p(y|\theta)\leftrightarrow\theta}$ and $\Sigma_{q(\theta)} \equiv \Sigma_{p(y|\theta)\leftrightarrow\theta}$.
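The conversion (S.3) between the natural parameter vector and the common parameters of a Multivariate Normal density is used repeatedly below. A minimal numerical sketch of both directions of the map (illustrative only; the dimension and values are arbitrary):

```python
# Natural <-> common parameter maps of (S.3) for a Multivariate Normal; illustrative only.
import numpy as np

def common_to_natural(mu, Sigma):
    """eta1 = Sigma^{-1} mu,  eta2 = -0.5 vec(Sigma^{-1})."""
    Sigma_inv = np.linalg.inv(Sigma)
    return Sigma_inv @ mu, -0.5 * Sigma_inv.flatten(order="F")

def natural_to_common(eta1, eta2):
    """Sigma = -0.5 {vec^{-1}(eta2)}^{-1},  mu = Sigma eta1."""
    d = eta1.shape[0]
    Sigma = -0.5 * np.linalg.inv(eta2.reshape((d, d), order="F"))
    return Sigma @ eta1, Sigma

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
eta1, eta2 = common_to_natural(mu, Sigma)
mu_back, Sigma_back = natural_to_common(eta1, eta2)
print(np.allclose(mu, mu_back), np.allclose(Sigma, Sigma_back))  # True True
```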

Using the set-up of Section 2.2 of Rohde & Wand (2016), the $\theta$-localized approximate marginal log-likelihood is

$$\log p(y; q, \eta_{q(\theta)})_{\theta}
= \mathrm{Entropy}\{q(\theta); \eta_{q(\theta)}\} + \mathrm{NonEntropy}\{q(\theta); \eta_{q(\theta)}\}$$

where

$$\mathrm{Entropy}\{q(\theta); \eta_{q(\theta)}\}
= -\tfrac{1}{2}\log\big| -2\,\mathrm{vec}^{-1}\big((\eta_{q(\theta)})_2\big) \big| + d\{1 + \log(2\pi)\}/2,$$

with $d$ denoting the dimension of $\theta$. Also,

$$\begin{aligned}
\mathrm{NonEntropy}\{q(\theta); \eta_{q(\theta)}\}
&\equiv E_{q(\theta;\eta_{q(\theta)})}\{\log p(y\,|\,\theta)\}
+ E_{q(\theta;\eta_{q(\theta)})}(\text{sum of other log-factors neighboring } \theta) \\
&= y^T A\mu_{q(\theta)}
- E_{q(\theta;\eta_{q(\theta)})}\big[ 1^T \log\{1 + \exp(A\theta)\} \big]
+ E_{q(\theta;\eta_{q(\theta)})}\!\begin{bmatrix} \theta \\ \mathrm{vec}(\theta\theta^T) \end{bmatrix}^{T}\!\eta,
\end{aligned}$$

where $\eta$ is the sum of the natural parameters of the messages passed to $\theta$ other than the message from $p(y\,|\,\theta)$. Ideally we would maximize $\log p(y; q, \eta_{q(\theta)})_{\theta}$ over $\eta_{q(\theta)}$ but are thwarted by the intractability of the $q$-density expectation of $1^T\log\{1 + \exp(A\theta)\}$. To get around this we apply the Saul-Jordan-type bound from the main article to obtain

$$\log p(y; q, \eta_{q(\theta)})_{\theta}
\ge \mathrm{Entropy}\{q(\theta); \eta_{q(\theta)}\} + \mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}$$

where

$$\begin{aligned}
\mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}
&\equiv y^T A\mu_{q(\theta)}
- \tfrac{1}{2}(\omega^2)^T\,\mathrm{diagonal}(A\Sigma_{q(\theta)}A^T) \\
&\quad - 1^T \log\!\big[ 1 + \exp\!\big\{ A\mu_{q(\theta)}
+ \tfrac{1}{2}(1 - 2\omega)\odot\mathrm{diagonal}(A\Sigma_{q(\theta)}A^T) \big\} \big] \\
&\quad + \begin{bmatrix} \mu_{q(\theta)} \\ \mathrm{vec}\big(\Sigma_{q(\theta)} + \mu_{q(\theta)}\mu_{q(\theta)}^T\big) \end{bmatrix}^{T}\!\eta
\end{aligned}$$

and $\omega$ is an $n \times 1$ vector of variational parameters. We now seek to maximize

$$\log p(y; q, \eta_{q(\theta)}, \omega)_{\theta}
\equiv \mathrm{Entropy}\{q(\theta); \eta_{q(\theta)}\} + \mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}.$$

Using arguments analogous to those given in Section 4 of Rohde & Wand (2016), the function $\log p(y; q, \eta_{q(\theta)}, \omega)_{\theta}$ has a stationary point in the $[\eta_{q(\theta)}^T\ \omega^T]^T$ space if and only if

$$\eta_{q(\theta)} = \big[H_{\eta_{q(\theta)}} A(\eta_{q(\theta)})\big]^{-1}
D_{\eta_{q(\theta)}}\mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}^T \qquad (S.4)$$

and

$$0 = D_{\omega}\mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}^T \qquad (S.5)$$

with $A$ denoting the Multivariate Normal log-partition function and $D_{\eta_{q(\theta)}}$ and $H_{\eta_{q(\theta)}}$ respectively denoting the derivative vector and Hessian matrix with respect to $\eta_{q(\theta)}$, as defined in Rohde & Wand (2016). Standard vector calculus steps (e.g. Wand, 2002) lead to

$$D_{\omega}\mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}^T
= \big[ \mathrm{expit}\big\{ A\mu_{q(\theta)}
+ \tfrac{1}{2}(1 - 2\omega)\odot\mathrm{diagonal}(A\Sigma_{q(\theta)}A^T) \big\} - \omega \big]
\odot \mathrm{diagonal}(A\Sigma_{q(\theta)}A^T).$$

Substitution of this result into (S.5), and a reworking of the arguments that lead to Result 2 of Rohde & Wand (2016) but applied to $\mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}$ rather than $\mathrm{NonEntropy}\{q(\theta); \eta_{q(\theta)}\}$, lead to the fixed-point updates:

$$\begin{aligned}
\omega &\longleftarrow \mathrm{expit}\big\{ A\mu_{q(\theta)}
+ \tfrac{1}{2}(1 - 2\omega)\odot\mathrm{diagonal}(A\Sigma_{q(\theta)}A^T) \big\} \\
v_{q(\theta)} &\longleftarrow D_{\mu_{q(\theta)}}\mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}^T \\
\Sigma_{q(\theta)} &\longleftarrow -\big[H_{\mu_{q(\theta)}}\mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}\big]^{-1} \\
\mu_{q(\theta)} &\longleftarrow \mu_{q(\theta)} + \Sigma_{q(\theta)}\,v_{q(\theta)}.
\end{aligned} \qquad (S.6)$$
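The derivation above leans on the $\omega$-parameterised upper bound for the expectation of $\log(1 + e^Z)$ when $Z$ is Gaussian, which is what $\mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}$ encodes row by row. A quick Monte Carlo check of that bound (illustrative only; the moments and $\omega$ values below are arbitrary):

```python
# Monte Carlo check that, for Z ~ N(m, s2) and any omega,
#   E[log(1 + e^Z)] <= 0.5*omega^2*s2 + log(1 + exp(m + 0.5*(1 - 2*omega)*s2)).
# Illustrative only; not part of the original supplement.
import numpy as np

rng = np.random.default_rng(2)
m, s2 = 1.5, 4.0                                   # arbitrary Gaussian moments
z = rng.normal(m, np.sqrt(s2), size=1_000_000)
exact = np.mean(np.logaddexp(0.0, z))              # Monte Carlo E[log(1 + e^Z)]

for omega in (0.1, 0.5, 0.9):
    bound = 0.5 * omega**2 * s2 + np.logaddexp(0.0, m + 0.5 * (1.0 - 2.0 * omega) * s2)
    print(omega, round(exact, 4), round(bound, 4))  # bound exceeds the Monte Carlo value
```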

Further vector calculus leads to the explicit forms:

$$D_{\mu_{q(\theta)}}\mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}^T
= A^T\big[ y - \mathrm{expit}\big\{ A\mu_{q(\theta)}
+ \tfrac{1}{2}(1 - 2\omega)\odot\mathrm{diagonal}(A\Sigma_{q(\theta)}A^T) \big\} \big]
+ (\eta)_1 + 2\,\mathrm{vec}^{-1}\big((\eta)_2\big)\,\mu_{q(\theta)}$$

and

$$H_{\mu_{q(\theta)}}\mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}
= -A^T \mathrm{diag}\!\left[ \frac{1}{2\big( 1 + \cosh\big[ A\mu_{q(\theta)}
+ \tfrac{1}{2}(1 - 2\omega)\odot\mathrm{diagonal}(A\Sigma_{q(\theta)}A^T) \big] \big)} \right] A
+ 2\,\mathrm{vec}^{-1}\big((\eta)_2\big).$$

Introduction of $\omega_0 \equiv \mathrm{logit}(\omega)$ and substitution into (S.6) then gives

$$\begin{aligned}
\omega_0 &\longleftarrow A\mu_{q(\theta)} + \tfrac{1}{2}(1 - 2\omega)\odot\mathrm{diagonal}(A\Sigma_{q(\theta)}A^T) \\
\omega &\longleftarrow \mathrm{expit}(\omega_0); \qquad
\omega_2 \longleftarrow \frac{1}{2\{1 + \cosh(\omega_0)\}} \\
\Sigma_{q(\theta)} &\longleftarrow \big\{ A^T\mathrm{diag}(\omega_2)A - 2\,\mathrm{vec}^{-1}\big((\eta)_2\big) \big\}^{-1} \\
\mu_{q(\theta)} &\longleftarrow \mu_{q(\theta)} + \Sigma_{q(\theta)}\big\{ A^T(y - \omega) + (\eta)_1 + 2\,\mathrm{vec}^{-1}\big((\eta)_2\big)\,\mu_{q(\theta)} \big\}.
\end{aligned} \qquad (S.7)$$

The remainder of the derivation of Algorithm 1 involves expressing (S.7) in terms of the input and output natural parameter vectors $\eta_{\theta\to p(y|\theta)}$ and $\eta_{p(y|\theta)\to\theta}$. Using (S.3), the $\omega_0$ update is equivalent to

$$\begin{aligned}
\mu &\longleftarrow -\tfrac{1}{2}\,A\big\{\mathrm{vec}^{-1}\big((\eta_{p(y|\theta)\leftrightarrow\theta})_2\big)\big\}^{-1}(\eta_{p(y|\theta)\leftrightarrow\theta})_1 \\
\sigma^2 &\longleftarrow -\tfrac{1}{2}\,\mathrm{diagonal}\big[ A\big\{\mathrm{vec}^{-1}\big((\eta_{p(y|\theta)\leftrightarrow\theta})_2\big)\big\}^{-1} A^T \big] \\
\omega_0 &\longleftarrow \mu + \tfrac{1}{2}(1 - 2\omega)\odot\sigma^2.
\end{aligned}$$

The $\Sigma_{q(\theta)}$ update can be written as

$$-\tfrac{1}{2}\big\{\mathrm{vec}^{-1}\big((\eta_{p(y|\theta)\leftrightarrow\theta})_2\big)\big\}^{-1}
\longleftarrow \big\{ A^T\mathrm{diag}(\omega_2)A - 2\,\mathrm{vec}^{-1}\big((\eta)_2\big) \big\}^{-1}$$

which is equivalent to

$$(\eta_{p(y|\theta)\leftrightarrow\theta})_2 \longleftarrow -\tfrac{1}{2}\,\mathrm{vec}\big( A^T\mathrm{diag}(\omega_2)A \big) + (\eta)_2$$

which, in turn, is equivalent to the second component of $\eta_{p(y|\theta)\to\theta}$ being updated according to

$$(\eta_{p(y|\theta)\to\theta})_2 \longleftarrow -\tfrac{1}{2}\,\mathrm{vec}\big( A^T\mathrm{diag}(\omega_2)A \big). \qquad (S.8)$$
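In practice the cycle (S.7) can be iterated directly in the common parameters. The sketch below is an illustrative stand-in (not the authors' implementation): the design matrix $A$ and responses $y$ are simulated, and $\eta$ is taken to be the natural parameter vector of a vague $N(0, 100\,I)$ message, purely for demonstration.

```python
# Illustrative iteration of the common-parameter updates (S.7) on simulated
# logistic-regression data; not the authors' code. The "other messages" term
# eta is a vague N(0, 100 I) natural parameter vector, chosen for illustration.
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
n, d = 100, 2
A = np.column_stack([np.ones(n), rng.uniform(size=n)])
theta_true = np.array([-1.0, 2.0])
y = rng.binomial(1, expit(A @ theta_true)).astype(float)

eta_1 = np.zeros(d)                                    # (eta)_1
eta_2_mat = -0.5 * np.linalg.inv(100.0 * np.eye(d))    # vec^{-1}((eta)_2)

mu_q, Sigma_q, omega = np.zeros(d), np.eye(d), np.full(n, 0.5)
for _ in range(100):
    omega0 = A @ mu_q + 0.5 * (1.0 - 2.0 * omega) * np.diag(A @ Sigma_q @ A.T)
    omega = expit(omega0)
    omega2 = 1.0 / (2.0 * (1.0 + np.cosh(omega0)))
    Sigma_q = np.linalg.inv(A.T @ (omega2[:, None] * A) - 2.0 * eta_2_mat)
    mu_q = mu_q + Sigma_q @ (A.T @ (y - omega) + eta_1 + 2.0 * eta_2_mat @ mu_q)

print(mu_q)       # approximate posterior mean of theta
print(Sigma_q)    # approximate posterior covariance of theta
```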

For the update of the first component of $\eta_{p(y|\theta)\to\theta}$ we note that the last update of (S.7) is equivalent to

$$\Sigma_{q(\theta)}^{-1}\mu_{q(\theta)} \longleftarrow
\Sigma_{q(\theta)}^{-1}\mu_{q(\theta)} + A^T(y - \omega) + (\eta)_1 + 2\,\mathrm{vec}^{-1}\big((\eta)_2\big)\,\mu_{q(\theta)}, \qquad (S.9)$$

where, on the right-hand side,

$$\Sigma_{q(\theta)}^{-1} = A^T\mathrm{diag}(\omega_2)A - 2\,\mathrm{vec}^{-1}\big((\eta)_2\big) \qquad (S.10)$$

according to its updated value and

$$\mu_{q(\theta)} = -\tfrac{1}{2}\big\{\mathrm{vec}^{-1}\big((\eta_{p(y|\theta)\leftrightarrow\theta})_2\big)\big\}^{-1}(\eta_{p(y|\theta)\leftrightarrow\theta})_1 \qquad (S.11)$$

is in terms of the natural parameter vectors from the previous iteration, before (S.8) has taken place. Substituting (S.10) and (S.11) into (S.9) we get

$$\begin{aligned}
(\eta_{p(y|\theta)\to\theta})_1 + (\eta)_1 \longleftarrow\ &
\big\{ A^T\mathrm{diag}(\omega_2)A - 2\,\mathrm{vec}^{-1}\big((\eta)_2\big) \big\}
\Big[ -\tfrac{1}{2}\big\{\mathrm{vec}^{-1}\big((\eta_{p(y|\theta)\leftrightarrow\theta})_2\big)\big\}^{-1}(\eta_{p(y|\theta)\leftrightarrow\theta})_1 \Big] \\
& + A^T(y - \omega) + (\eta)_1
+ 2\,\mathrm{vec}^{-1}\big((\eta)_2\big)\Big[ -\tfrac{1}{2}\big\{\mathrm{vec}^{-1}\big((\eta_{p(y|\theta)\leftrightarrow\theta})_2\big)\big\}^{-1}(\eta_{p(y|\theta)\leftrightarrow\theta})_1 \Big]
\end{aligned}$$

which is equivalent to

$$(\eta_{p(y|\theta)\to\theta})_1 \longleftarrow
-\tfrac{1}{2}\,A^T\mathrm{diag}(\omega_2)A\,\big\{\mathrm{vec}^{-1}\big((\eta_{p(y|\theta)\leftrightarrow\theta})_2\big)\big\}^{-1}(\eta_{p(y|\theta)\leftrightarrow\theta})_1 + A^T(y - \omega)
= A^T(\omega_2\odot\mu) + A^T(y - \omega)
= A^T(y - \omega + \omega_2\odot\mu).$$

Combining this update with that given in (S.8) we get the following update for the full natural parameter vector:

$$\eta_{p(y|\theta)\to\theta} \longleftarrow
\begin{bmatrix}
A^T(y - \omega + \omega_2\odot\mu) \\[4pt]
-\tfrac{1}{2}\,\mathrm{vec}\big( A^T\mathrm{diag}(\omega_2)A \big)
\end{bmatrix}$$

which matches Algorithm 1.

S.3 Approximation of $\mathrm{corr}(\beta_0, \beta_1\,|\,y)$

Consider the Bayesian logistic regression model (8). Then, given the approximate noninformativity of the prior distribution of $\beta \equiv [\beta_0\ \beta_1]^T$, the posterior covariance matrix of $\beta$, $\mathrm{Cov}(\beta\,|\,y)$, is such that

$$\mathrm{Cov}(\beta\,|\,y) \approx \text{the inverse Fisher information matrix of } \beta
= \big[ X^T\mathrm{diag}\{b''(X\beta)\}X \big]^{-1}
= \left( X^T\mathrm{diag}\!\left[ \frac{1}{2\{1 + \cosh(X\beta)\}} \right] X \right)^{-1},
\quad\text{where } X \equiv \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}.$$

Straightforward algebra then leads to the approximate posterior correlation between $\beta_0$ and $\beta_1$ being

$$\mathrm{corr}(\beta_0, \beta_1\,|\,y) \approx
\frac{ -\,n^{-1}\sum_{i=1}^{n} x_i/\{1 + \cosh(\beta_0 + \beta_1 x_i)\} }
{ \sqrt{ \big[ n^{-1}\sum_{i=1}^{n} 1/\{1 + \cosh(\beta_0 + \beta_1 x_i)\} \big]
\big[ n^{-1}\sum_{i=1}^{n} x_i^2/\{1 + \cosh(\beta_0 + \beta_1 x_i)\} \big] } }.$$

However, the $x_i$'s are uniformly distributed on $(0,1)$, so replacement of sample means by population means leads to the final approximation

$$\mathrm{corr}(\beta_0, \beta_1\,|\,y) \approx
\frac{ -\int_0^1 x/\{1 + \cosh(\beta_0 + \beta_1 x)\}\,dx }
{ \sqrt{ \big[ \int_0^1 1/\{1 + \cosh(\beta_0 + \beta_1 x)\}\,dx \big]
\big[ \int_0^1 x^2/\{1 + \cosh(\beta_0 + \beta_1 x)\}\,dx \big] } }.$$
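The final expression is straightforward to evaluate by one-dimensional quadrature. An illustrative sketch (not part of the supplement), with arbitrarily chosen $(\beta_0, \beta_1)$ values:

```python
# Numerical evaluation of the final corr(beta_0, beta_1 | y) approximation in
# Section S.3; illustrative only, with arbitrarily chosen (beta_0, beta_1).
import numpy as np
from scipy import integrate

def approx_corr(beta0, beta1):
    w = lambda x: 1.0 / (1.0 + np.cosh(beta0 + beta1 * x))
    m0, _ = integrate.quad(w, 0.0, 1.0)
    m1, _ = integrate.quad(lambda x: x * w(x), 0.0, 1.0)
    m2, _ = integrate.quad(lambda x: x**2 * w(x), 0.0, 1.0)
    return -m1 / np.sqrt(m0 * m2)

print(approx_corr(0.0, 1.0))    # a value in (-1, 0)
print(approx_corr(-3.0, 6.0))   # a value in (-1, 0); weight concentrates near x = 0.5
```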

S.4 Approximate Marginal Log-Likelihood Expressions

The simulation study described in Section 4, concerning variational inference for the Bayesian logistic regression model (2), used approximate marginal log-likelihood expressions, appropriate for the particular approach, as a means to assess convergence. Each of the expressions is given in this section. The last one uses the definition, as given in Section 3.2,

$$B(\mu, \sigma^2) \equiv \int_{-\infty}^{\infty} b(\mu + \sigma x)\,\phi(x)\,dx,
\qquad b(x) \equiv \log(1 + e^x).$$

As with the $B_r$ notation given there, evaluations of $B(\mu, \sigma^2)$ when $\mu$ and $\sigma^2$ are equal-sized column vectors are defined in an element-wise fashion, as illustrated by

$$B\!\left( \begin{bmatrix} 9 \\ 1 \end{bmatrix}, \begin{bmatrix} 36 \\ 28 \end{bmatrix} \right)
= \begin{bmatrix} B(9, 36) \\ B(1, 28) \end{bmatrix}.$$

S.4.1 Jaakkola-Jordan Updates

$$\log p(y; q, \xi) = \tfrac{1}{2}\log|\Sigma_{q(\beta;\xi)}|
+ \tfrac{1}{2}\mu_{q(\beta;\xi)}^T\Sigma_{q(\beta;\xi)}^{-1}\mu_{q(\beta;\xi)}
- \tfrac{1}{2}\mu_\beta^T\Sigma_\beta^{-1}\mu_\beta
+ \sum_{i=1}^{n}\big\{ \xi_i/2 - \log(1 + e^{\xi_i}) + (\xi_i/4)\tanh(\xi_i/2) \big\}
- \tfrac{1}{2}\log|\Sigma_\beta|$$

where

$$\Sigma_{q(\beta;\xi)} \equiv \left[ X^T\mathrm{diag}\!\left\{ \frac{\tanh(\xi/2)}{2\xi} \right\} X + \Sigma_\beta^{-1} \right]^{-1}
\quad\text{and}\quad
\mu_{q(\beta;\xi)} \equiv \Sigma_{q(\beta;\xi)}\big\{ X^T(y - \tfrac{1}{2}1) + \Sigma_\beta^{-1}\mu_\beta \big\},$$

and $\xi$ is the current value of the variational parameter vector that arises in the Jaakkola-Jordan device. See, for example, Section 5.1 of Wand (2017).

S.4.2 Saul-Jordan Updates

$$\begin{aligned}
\log p(y; q, \omega) = \ & \tfrac{1}{2}\log|\Sigma_{q(\beta)}|
- \tfrac{1}{2}\mathrm{tr}\big[ \Sigma_\beta^{-1}\big\{ \Sigma_{q(\beta)} + (\mu_{q(\beta)} - \mu_\beta)(\mu_{q(\beta)} - \mu_\beta)^T \big\} \big] \\
& + y^T X\mu_{q(\beta)} - \tfrac{1}{2}(\omega^2)^T\mathrm{diagonal}(X\Sigma_{q(\beta)}X^T) \\
& - 1^T\log\!\big[ 1 + \exp\!\big\{ X\mu_{q(\beta)} + \tfrac{1}{2}(1 - 2\omega)\odot\mathrm{diagonal}(X\Sigma_{q(\beta)}X^T) \big\} \big]
+ \tfrac{d}{2} - \tfrac{1}{2}\log|\Sigma_\beta|
\end{aligned}$$

where $\omega$ is the current value of the variational parameter vector that arises in the Saul-Jordan device.

S.4.3 Knowles-Minka-Wand Updates

$$\begin{aligned}
\log p(y; q) = \ & \tfrac{1}{2}\log|\Sigma_{q(\beta)}|
- \tfrac{1}{2}\mathrm{tr}\big[ \Sigma_\beta^{-1}\big\{ \Sigma_{q(\beta)} + (\mu_{q(\beta)} - \mu_\beta)(\mu_{q(\beta)} - \mu_\beta)^T \big\} \big] \\
& + y^T X\mu_{q(\beta)} - 1^T B\big( X\mu_{q(\beta)},\, \mathrm{diagonal}(X\Sigma_{q(\beta)}X^T) \big)
+ \tfrac{d}{2} - \tfrac{1}{2}\log|\Sigma_\beta|
\end{aligned}$$
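The function $B(\mu, \sigma^2)$, together with the element-wise vector convention above, can be evaluated in several ways; the main article discusses its accurate computation. Purely as an illustration (and not necessarily the scheme used by the authors), here is a Gauss-Hermite quadrature sketch:

```python
# Gauss-Hermite evaluation of B(mu, sigma^2) = E[log(1 + exp(mu + sigma*Z))],
# Z ~ N(0, 1), with the element-wise vector convention. Illustrative only.
import numpy as np

def B(mu, sigma2, num_nodes=50):
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    sigma = np.sqrt(np.atleast_1d(np.asarray(sigma2, dtype=float)))
    t, w = np.polynomial.hermite.hermgauss(num_nodes)   # physicists' nodes/weights
    # change of variables x = sqrt(2) t gives
    # int b(mu + sigma x) phi(x) dx ~= (1/sqrt(pi)) sum_j w_j b(mu + sigma sqrt(2) t_j)
    z = mu[:, None] + sigma[:, None] * np.sqrt(2.0) * t[None, :]
    return (np.logaddexp(0.0, z) @ w) / np.sqrt(np.pi)   # logaddexp(0, z) = log(1 + e^z)

print(B([9.0, 1.0], [36.0, 28.0]))   # element-wise: [B(9, 36), B(1, 28)]
```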

Additional References

Rohde, D. and Wand, M.P. (2016). Semiparametric mean field variational Bayes: General principles and numerical issues. Journal of Machine Learning Research, 17(172), 1-47.

Wand, M.P. (2002). Vector differential calculus in statistics. The American Statistician, 56.
