Web-Supplement for: Accurate Logistic Variational Message Passing: Algebraic and Numerical Details
BY TUI H. NOLAN AND MATT P. WAND

School of Mathematical and Physical Sciences, University of Technology Sydney, Broadway 2007, Australia

S.1 Proof of Theorem 1

First note that, for any $a, b \in \mathbb{R}$,
$$\int_{-\infty}^{\infty} \Phi(a + b\,x)\,\phi(x)\,dx = \Phi\left(\frac{a}{\sqrt{1 + b^2}}\right)
\quad\text{and}\quad
\int_{-\infty}^{\infty} x\,\Phi(a + b\,x)\,\phi(x)\,dx = \frac{b}{\sqrt{1 + b^2}}\,\phi\left(\frac{a}{\sqrt{1 + b^2}}\right). \tag{S.1}$$
From the first result in (S.1) it follows immediately that
$$\int_{-\infty}^{\infty} \mathrm{expit}_k(\mu + \sigma\,x)\,\phi(x)\,dx = \sum_{i=1}^{k} p_{k,i}\,\Phi\left(\frac{s_{k,i}\,\mu}{\sqrt{1 + \sigma^2 s_{k,i}^2}}\right).$$
Hence, for all $\mu \in \mathbb{R}$ and $\sigma > 0$,
$$\left|\, B_0(\mu, \sigma^2) - \sum_{i=1}^{k} p_{k,i}\,\Phi\left(\frac{s_{k,i}\,\mu}{\sqrt{1 + \sigma^2 s_{k,i}^2}}\right) \right|
= \left|\, \int_{-\infty}^{\infty} \mathrm{expit}(\mu + \sigma\,x)\,\phi(x)\,dx - \int_{-\infty}^{\infty} \mathrm{expit}_k(\mu + \sigma\,x)\,\phi(x)\,dx \,\right|$$
$$\le \int_{-\infty}^{\infty} \left| \mathrm{expit}(\mu + \sigma\,x) - \mathrm{expit}_k(\mu + \sigma\,x) \right| \phi(x)\,dx
\le \sup_{u \in \mathbb{R}} \left| \mathrm{expit}(u) - \mathrm{expit}_k(u) \right| \int_{-\infty}^{\infty} \phi(x)\,dx = \varepsilon_k,$$
where the last equality follows from (7). Part (a) of Theorem 1 follows immediately.
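The closed form just established is straightforward to check numerically. The following Python sketch (NumPy and SciPy assumed) compares the probit-mixture expression with adaptive quadrature of $\int \mathrm{expit}(\mu + \sigma x)\,\phi(x)\,dx$; the weights `p` and scales `s` below are illustrative placeholders, not the $(p_{k,i}, s_{k,i})$ constants of the main article, which should be substituted for a genuine accuracy check.

```python
# Numerical check of the probit-mixture form of B_0(mu, sigma^2).
import numpy as np
from scipy.stats import norm
from scipy.special import expit
from scipy.integrate import quad

def B0_mixture(mu, sigma, p, s):
    """Sum_i p_i * Phi(s_i * mu / sqrt(1 + sigma^2 s_i^2))."""
    denom = np.sqrt(1.0 + (sigma * s) ** 2)
    return np.sum(p * norm.cdf(s * mu / denom))

def B0_quad(mu, sigma):
    """int expit(mu + sigma*x) phi(x) dx, by adaptive quadrature."""
    return quad(lambda x: expit(mu + sigma * x) * norm.pdf(x),
                -np.inf, np.inf)[0]

p = np.array([0.5, 0.5])   # placeholder mixture weights (must sum to 1)
s = np.array([0.5, 0.8])   # placeholder probit scales
print(B0_mixture(1.3, 2.0, p, s), B0_quad(1.3, 2.0))
```

With the article's $k$-term constants substituted, the two printed values agree to within the $\varepsilon_k$ bound of Theorem 1; with the placeholders they merely illustrate the calling pattern.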
The second result in (S.1) implies that
$$\int_{-\infty}^{\infty} x\,\mathrm{expit}_k(\mu + \sigma\,x)\,\phi(x)\,dx
= \sigma \sum_{i=1}^{k} \frac{p_{k,i}\,s_{k,i}}{\sqrt{1 + \sigma^2 s_{k,i}^2}}\;\phi\left(\frac{s_{k,i}\,\mu}{\sqrt{1 + \sigma^2 s_{k,i}^2}}\right).$$
For all $\mu \in \mathbb{R}$ and $\sigma > 0$ we then have
$$\left|\, B_1(\mu, \sigma^2) - \sigma \sum_{i=1}^{k} \frac{p_{k,i}\,s_{k,i}}{\sqrt{1 + \sigma^2 s_{k,i}^2}}\;\phi\left(\frac{s_{k,i}\,\mu}{\sqrt{1 + \sigma^2 s_{k,i}^2}}\right) \right|
= \left|\, \int_{-\infty}^{\infty} \mathrm{expit}(\mu + \sigma\,x)\,x\,\phi(x)\,dx - \int_{-\infty}^{\infty} \mathrm{expit}_k(\mu + \sigma\,x)\,x\,\phi(x)\,dx \,\right|$$
$$\le \int_{-\infty}^{\infty} \left| \mathrm{expit}(\mu + \sigma\,x) - \mathrm{expit}_k(\mu + \sigma\,x) \right| |x|\,\phi(x)\,dx
\le \sup_{u \in \mathbb{R}} \left| \mathrm{expit}(u) - \mathrm{expit}_k(u) \right| \int_{-\infty}^{\infty} |x|\,\phi(x)\,dx = \sqrt{2/\pi}\;\varepsilon_k,$$
since $\int_{-\infty}^{\infty} |x|\,\phi(x)\,dx = \sqrt{2/\pi}$.

S.2 Derivation of Algorithm 1

The message passed from $p(y|\theta)$ to $\theta$,
$$m_{p(y|\theta)\to\theta}(\theta) = \exp\left\{ y^T A\,\theta - \mathbf{1}^T \log(\mathbf{1} + \exp(A\,\theta)) \right\},$$
is not conjugate with the Multivariate Normal messages passed to $\theta$ from other factors. A non-conjugate VMP remedy (Knowles & Minka, 2011) involves replacement of $m_{p(y|\theta)\to\theta}(\theta)$ by
$$m_{p(y|\theta)\to\theta}(\theta) \longleftarrow \exp\left\{ \begin{bmatrix} \theta \\ \mathrm{vec}(\theta\theta^T) \end{bmatrix}^{T} \eta_{p(y|\theta)\to\theta} \right\} \tag{S.2}$$
to enforce conjugacy with Multivariate Normal messages. Under pre-specification (S.2), the current $q(\theta)$ density function satisfies
$$q(\theta) \propto m_{p(y|\theta)\to\theta}(\theta) \times (\text{product of messages passed to } \theta \text{ from its other neighbours}).$$
From (10) of Wand (2017), we then get
$$q(\theta) \propto m_{p(y|\theta)\to\theta}(\theta)\, m_{\theta\to p(y|\theta)}(\theta)
= \exp\left\{ \begin{bmatrix} \theta \\ \mathrm{vec}(\theta\theta^T) \end{bmatrix}^{T} \eta_{p(y|\theta)\leftrightarrow\theta} \right\},
\qquad \eta_{p(y|\theta)\leftrightarrow\theta} \equiv \eta_{p(y|\theta)\to\theta} + \eta_{\theta\to p(y|\theta)}.$$
Let $\mu_{p(y|\theta)\leftrightarrow\theta}$ and $\Sigma_{p(y|\theta)\leftrightarrow\theta}$ be the corresponding common parameters. The natural parameters and common parameters are the following functions of each other:
$$\eta_{p(y|\theta)\leftrightarrow\theta} = \begin{bmatrix} \Sigma_{p(y|\theta)\leftrightarrow\theta}^{-1}\,\mu_{p(y|\theta)\leftrightarrow\theta} \\[4pt] -\tfrac12\,\mathrm{vec}\left(\Sigma_{p(y|\theta)\leftrightarrow\theta}^{-1}\right) \end{bmatrix}$$
and
$$\mu_{p(y|\theta)\leftrightarrow\theta} = -\tfrac12 \left[ \mathrm{vec}^{-1}\!\left\{ (\eta_{p(y|\theta)\leftrightarrow\theta})_2 \right\} \right]^{-1} (\eta_{p(y|\theta)\leftrightarrow\theta})_1,
\qquad
\Sigma_{p(y|\theta)\leftrightarrow\theta} = -\tfrac12 \left[ \mathrm{vec}^{-1}\!\left\{ (\eta_{p(y|\theta)\leftrightarrow\theta})_2 \right\} \right]^{-1}, \tag{S.3}$$
where, for a vector $v$ of length $d + d^2$, $(v)_1$ denotes its first $d$ entries, $(v)_2$ denotes its last $d^2$ entries and $\mathrm{vec}^{-1}$ denotes the inverse of the $\mathrm{vec}$ operator. In the upcoming arguments we use the shorthand $\eta_{q(\theta)} \equiv \eta_{p(y|\theta)\leftrightarrow\theta}$, $\mu_{q(\theta)} \equiv \mu_{p(y|\theta)\leftrightarrow\theta}$ and $\Sigma_{q(\theta)} \equiv \Sigma_{p(y|\theta)\leftrightarrow\theta}$.
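In implementation terms, the maps in (S.3) amount to a reshape followed by a matrix inversion. The following is a minimal sketch, assuming the natural parameter vector is stored as separate length-$d$ and length-$d^2$ NumPy arrays and that $\mathrm{vec}$ stacks columns:

```python
# Conversion between natural and common Multivariate Normal parameters, (S.3).
import numpy as np

def common_from_natural(eta1, eta2, d):
    """mu = -(1/2) {vec^{-1}(eta2)}^{-1} eta1;  Sigma = -(1/2) {vec^{-1}(eta2)}^{-1}."""
    E2 = eta2.reshape(d, d, order="F")   # vec^{-1}: undo column stacking
    Sigma = -0.5 * np.linalg.inv(E2)
    return Sigma @ eta1, Sigma           # (mu, Sigma)

def natural_from_common(mu, Sigma):
    """eta1 = Sigma^{-1} mu;  eta2 = -(1/2) vec(Sigma^{-1})."""
    Sinv = np.linalg.inv(Sigma)
    return Sinv @ mu, -0.5 * Sinv.flatten(order="F")

# Round-trip check on an arbitrary proper covariance matrix:
eta1, eta2 = natural_from_common(np.array([1.0, -0.5]),
                                 np.array([[2.0, 0.3], [0.3, 1.0]]))
print(common_from_natural(eta1, eta2, d=2))   # recovers mu and Sigma
```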
Using the set-up of Section 2.2 of Rohde & Wand (2016), the $\theta$-localized approximate marginal log-likelihood is
$$\log \underline{p}(y; q, \eta_{q(\theta)})_{\theta} = \mathrm{Entropy}\{q(\theta); \eta_{q(\theta)}\} + \mathrm{NonEntropy}\{q(\theta); \eta_{q(\theta)}\},$$
where
$$\mathrm{Entropy}\{q(\theta); \eta_{q(\theta)}\} = -\tfrac12 \log\left| -2\,\mathrm{vec}^{-1}\{(\eta_{q(\theta)})_2\} \right| + d\,\{1 + \log(2\pi)\}/2,$$
with $d$ denoting the dimension of $\theta$. Also,
$$\mathrm{NonEntropy}\{q(\theta); \eta_{q(\theta)}\} \equiv E_{q(\theta;\eta_{q(\theta)})}\{\log p(y|\theta)\}
+ E_{q(\theta;\eta_{q(\theta)})}(\text{sum of other log-factors neighbouring } \theta)$$
$$= y^T A\,\mu_{q(\theta)} - E_{q(\theta;\eta_{q(\theta)})}\left[ \mathbf{1}^T \log\{\mathbf{1} + \exp(A\,\theta)\} \right]
+ \begin{bmatrix} \mu_{q(\theta)} \\ \mathrm{vec}\left(\Sigma_{q(\theta)} + \mu_{q(\theta)}\mu_{q(\theta)}^T\right) \end{bmatrix}^{T} \eta,$$
where $\eta \equiv \eta_{\theta\to p(y|\theta)}$ is the sum of the natural parameters of the messages passed to $\theta$ other than the message from $p(y|\theta)$. Ideally we would maximize $\log \underline{p}(y; q, \eta_{q(\theta)})_{\theta}$ over $\eta_{q(\theta)}$, but we are thwarted by the intractability of the $q$-density expectation of $\log\{\mathbf{1} + \exp(A\,\theta)\}$. To get around this we apply the Saul-Jordan lower bound to obtain
$$\log \underline{p}(y; q, \eta_{q(\theta)})_{\theta} \ge \mathrm{Entropy}\{q(\theta); \eta_{q(\theta)}\} + \mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\},$$
where
$$\mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\} \equiv y^T A\,\mu_{q(\theta)} - \tfrac12\,(\omega^2)^T\,\mathrm{diagonal}(A\,\Sigma_{q(\theta)} A^T)$$
$$- \mathbf{1}^T \log\left[ \mathbf{1} + \exp\left\{ A\,\mu_{q(\theta)} + \tfrac12\,(\mathbf{1} - 2\,\omega) \odot \mathrm{diagonal}(A\,\Sigma_{q(\theta)} A^T) \right\} \right]
+ \begin{bmatrix} \mu_{q(\theta)} \\ \mathrm{vec}\left(\Sigma_{q(\theta)} + \mu_{q(\theta)}\mu_{q(\theta)}^T\right) \end{bmatrix}^{T} \eta,$$
and $\omega$ is an $n \times 1$ vector of variational parameters. We now seek to maximize
$$\log \underline{p}(y; q, \eta_{q(\theta)}, \omega)_{\theta} \equiv \mathrm{Entropy}\{q(\theta); \eta_{q(\theta)}\} + \mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}.$$
Using arguments analogous to those given in Section 4 of Rohde & Wand (2016), the function $\log \underline{p}(y; q, \eta_{q(\theta)}, \omega)_{\theta}$ has a stationary point in the $[\eta_{q(\theta)}^T\ \omega^T]^T$ space if and only if
$$\eta_{q(\theta)} = \left[ H_{\eta_{q(\theta)}} A(\eta_{q(\theta)}) \right]^{-1} D_{\eta_{q(\theta)}} \mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}^T \tag{S.4}$$
and
$$\mathbf{0} = D_{\omega}\,\mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}^T, \tag{S.5}$$
with $A$ denoting the Multivariate Normal log-partition function and $D_{\eta_{q(\theta)}}$ and $H_{\eta_{q(\theta)}}$ respectively denoting the derivative vector and Hessian matrix with respect to $\eta_{q(\theta)}$, as defined in Rohde & Wand (2016). Standard vector calculus steps (e.g., Wand, 2002) lead to
$$D_{\omega}\,\mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}^T
= \left( \mathrm{expit}\left[ A\,\mu_{q(\theta)} + \tfrac12\,(\mathbf{1} - 2\,\omega) \odot \mathrm{diagonal}(A\,\Sigma_{q(\theta)} A^T) \right] - \omega \right) \odot \mathrm{diagonal}(A\,\Sigma_{q(\theta)} A^T).$$
Substitution of this result into (S.5), and a reworking of the arguments that lead to Result 2 of Rohde & Wand (2016), but applied to $\mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}$ rather than $\mathrm{NonEntropy}\{q(\theta); \eta_{q(\theta)}\}$, lead to the fixed-point updates
$$\begin{array}{rcl}
\omega &\longleftarrow& \mathrm{expit}\left[ A\,\mu_{q(\theta)} + \tfrac12\,(\mathbf{1} - 2\,\omega) \odot \mathrm{diagonal}(A\,\Sigma_{q(\theta)} A^T) \right] \\[4pt]
v_{q(\theta)} &\longleftarrow& D_{\mu_{q(\theta)}} \mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}^T \\[4pt]
\Sigma_{q(\theta)} &\longleftarrow& -\left[ H_{\mu_{q(\theta)}} \mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\} \right]^{-1} \\[4pt]
\mu_{q(\theta)} &\longleftarrow& \mu_{q(\theta)} + \Sigma_{q(\theta)}\, v_{q(\theta)}.
\end{array} \tag{S.6}$$
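The stationarity condition (S.5) says that, at a fixed point, $\omega$ equals the expit expression above, which suggests the simple iteration sketched below. Variable names are illustrative and convergence diagnostics are omitted.

```python
# Fixed-point iteration for omega, following the first update in (S.6).
import numpy as np
from scipy.special import expit

def update_omega(omega, A, mu, Sigma, n_iter=50):
    """Iterate omega <- expit{A mu + 0.5 (1 - 2 omega) * diagonal(A Sigma A^T)}."""
    # diagonal(A Sigma A^T) without forming the full n x n matrix:
    d_vec = np.sum((A @ Sigma) * A, axis=1)
    for _ in range(n_iter):
        omega = expit(A @ mu + 0.5 * (1.0 - 2.0 * omega) * d_vec)
    return omega
```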
Further vector calculus leads to the explicit forms
$$D_{\mu_{q(\theta)}} \mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}^T
= A^T\left( y - \mathrm{expit}\left[ A\,\mu_{q(\theta)} + \tfrac12\,(\mathbf{1} - 2\,\omega) \odot \mathrm{diagonal}(A\,\Sigma_{q(\theta)} A^T) \right] \right)
+ (\eta)_1 + 2\,\mathrm{vec}^{-1}\{(\eta)_2\}\,\mu_{q(\theta)}$$
and
$$H_{\mu_{q(\theta)}} \mathrm{NonEntropy}\{\eta_{q(\theta)}, \omega\}
= -A^T\,\tfrac12\,\mathrm{diag}\left[ \frac{\mathbf{1}}{\mathbf{1} + \cosh\left\{ A\,\mu_{q(\theta)} + \tfrac12\,(\mathbf{1} - 2\,\omega) \odot \mathrm{diagonal}(A\,\Sigma_{q(\theta)} A^T) \right\}} \right] A
+ 2\,\mathrm{vec}^{-1}\{(\eta)_2\}.$$
Introduction of $\omega_0 \equiv \mathrm{logit}(\omega)$ and substitution into (S.6) then gives
$$\begin{array}{rcl}
\omega_0 &\longleftarrow& A\,\mu_{q(\theta)} + \tfrac12\,(\mathbf{1} - 2\,\omega) \odot \mathrm{diagonal}(A\,\Sigma_{q(\theta)} A^T) \\[4pt]
\omega &\longleftarrow& \mathrm{expit}(\omega_0); \qquad \omega_2 \longleftarrow \tfrac12\,\mathbf{1}/\{\mathbf{1} + \cosh(\omega_0)\} \\[4pt]
\Sigma_{q(\theta)} &\longleftarrow& \left\{ A^T \mathrm{diag}(\omega_2)\,A - 2\,\mathrm{vec}^{-1}\{(\eta)_2\} \right\}^{-1} \\[4pt]
\mu_{q(\theta)} &\longleftarrow& \mu_{q(\theta)} + \Sigma_{q(\theta)}\left[ A^T(y - \omega) + (\eta)_1 + 2\,\mathrm{vec}^{-1}\{(\eta)_2\}\,\mu_{q(\theta)} \right].
\end{array} \tag{S.7}$$
The remainder of the derivation of Algorithm 1 involves expressing (S.7) in terms of the input and output natural parameter vectors $\eta_{p(y|\theta)\to\theta}$ and $\eta_{\theta\to p(y|\theta)}$.
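Before carrying out this re-expression, one sweep of (S.7) can be sketched directly in the $(\mu_{q(\theta)}, \Sigma_{q(\theta)})$ parametrisation, assuming $(\eta)_1$ and $\mathrm{vec}^{-1}\{(\eta)_2\}$ are supplied as a length-$d$ array and a negative-definite $d \times d$ matrix (all names below are illustrative):

```python
# One sweep of the fixed-point updates (S.7) in (mu, Sigma) form.
import numpy as np
from scipy.special import expit

def s7_sweep(y, A, mu, Sigma, omega, eta1, eta2mat):
    """eta1 and eta2mat hold (eta)_1 and vec^{-1}{(eta)_2}; eta2mat must be
    negative definite for the updated Sigma to be a proper covariance matrix."""
    diag_ASA = np.sum((A @ Sigma) * A, axis=1)      # diagonal(A Sigma A^T)
    omega0 = A @ mu + 0.5 * (1.0 - 2.0 * omega) * diag_ASA
    omega = expit(omega0)
    omega2 = 0.5 / (1.0 + np.cosh(omega0))
    Sigma = np.linalg.inv(A.T @ (omega2[:, None] * A) - 2.0 * eta2mat)
    mu = mu + Sigma @ (A.T @ (y - omega) + eta1 + 2.0 * eta2mat @ mu)
    return mu, Sigma, omega
```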
Using (S.3), the $\omega_0$ update is equivalent to
$$\begin{array}{rcl}
\mu &\longleftarrow& -\tfrac12\, A \left[ \mathrm{vec}^{-1}\!\left\{ (\eta_{p(y|\theta)\leftrightarrow\theta})_2 \right\} \right]^{-1} (\eta_{p(y|\theta)\leftrightarrow\theta})_1 \\[4pt]
\sigma^2 &\longleftarrow& -\tfrac12\, \mathrm{diagonal}\left( A \left[ \mathrm{vec}^{-1}\!\left\{ (\eta_{p(y|\theta)\leftrightarrow\theta})_2 \right\} \right]^{-1} A^T \right) \\[4pt]
\omega_0 &\longleftarrow& \mu + \tfrac12\,(\mathbf{1} - 2\,\omega) \odot \sigma^2.
\end{array}$$
The $\Sigma_{q(\theta)}$ update can be written as
$$-2\,\mathrm{vec}^{-1}\!\left\{ (\eta_{p(y|\theta)\leftrightarrow\theta})_2 \right\} \longleftarrow A^T \mathrm{diag}(\omega_2)\,A - 2\,\mathrm{vec}^{-1}\{(\eta)_2\},$$
which is equivalent to
$$(\eta_{p(y|\theta)\to\theta})_2 + (\eta)_2 \longleftarrow -\tfrac12\,\mathrm{vec}\left( A^T \mathrm{diag}(\omega_2)\,A \right) + (\eta)_2,$$
which, in turn, is equivalent to the second component of $\eta_{p(y|\theta)\to\theta}$ being updated according to
$$(\eta_{p(y|\theta)\to\theta})_2 \longleftarrow -\tfrac12\,\mathrm{vec}\left( A^T \mathrm{diag}(\omega_2)\,A \right). \tag{S.8}$$
For the update of the first component of $\eta_{p(y|\theta)\to\theta}$, we note that the last update of (S.7) is equivalent to
$$\Sigma_{q(\theta)}^{-1}\mu_{q(\theta)} \longleftarrow \Sigma_{q(\theta)}^{-1}\mu_{q(\theta)} + A^T(y - \omega) + (\eta)_1 + 2\,\mathrm{vec}^{-1}\{(\eta)_2\}\,\mu_{q(\theta)}, \tag{S.9}$$
where, on the right-hand side,
$$\Sigma_{q(\theta)}^{-1} = A^T \mathrm{diag}(\omega_2)\,A - 2\,\mathrm{vec}^{-1}\{(\eta)_2\} \tag{S.10}$$
according to its updated value, and
$$\mu_{q(\theta)} = -\tfrac12 \left[ \mathrm{vec}^{-1}\!\left\{ (\eta_{p(y|\theta)\leftrightarrow\theta})_2 \right\} \right]^{-1} (\eta_{p(y|\theta)\leftrightarrow\theta})_1 \tag{S.11}$$
is in terms of the natural parameters from the previous iteration, before (S.8) has taken place. Substituting (S.10) and (S.11) into (S.9), we get
$$(\eta_{p(y|\theta)\to\theta})_1 + (\eta)_1 \longleftarrow
\left\{ -\tfrac12\, A^T \mathrm{diag}(\omega_2)\,A + \mathrm{vec}^{-1}\{(\eta)_2\} \right\}
\left[ \mathrm{vec}^{-1}\!\left\{ (\eta_{p(y|\theta)\leftrightarrow\theta})_2 \right\} \right]^{-1} (\eta_{p(y|\theta)\leftrightarrow\theta})_1$$
$$+\, A^T(y - \omega) + (\eta)_1
- \mathrm{vec}^{-1}\{(\eta)_2\} \left[ \mathrm{vec}^{-1}\!\left\{ (\eta_{p(y|\theta)\leftrightarrow\theta})_2 \right\} \right]^{-1} (\eta_{p(y|\theta)\leftrightarrow\theta})_1,$$
which is equivalent to
$$(\eta_{p(y|\theta)\to\theta})_1 \longleftarrow -\tfrac12\, A^T \mathrm{diag}(\omega_2)\,A \left[ \mathrm{vec}^{-1}\!\left\{ (\eta_{p(y|\theta)\leftrightarrow\theta})_2 \right\} \right]^{-1} (\eta_{p(y|\theta)\leftrightarrow\theta})_1 + A^T(y - \omega)
= A^T(\omega_2 \odot \mu) + A^T(y - \omega) = A^T(y - \omega + \omega_2 \odot \mu).$$
Combining this update with that given in (S.8), we get the following update for the full natural parameter vector:
$$\eta_{p(y|\theta)\to\theta} \longleftarrow \begin{bmatrix} A^T(y - \omega + \omega_2 \odot \mu) \\[4pt] -\tfrac12\,\mathrm{vec}\left( A^T \mathrm{diag}(\omega_2)\,A \right) \end{bmatrix},$$
which matches Algorithm 1.

S.3 Approximation of corr(β₀, β₁ | y)

Consider the Bayesian logistic regression model (8). Then, given the approximate noninformativity of the prior distribution of $\beta \equiv [\beta_0\ \beta_1]^T$, the posterior covariance matrix of $\beta$, $\mathrm{Cov}(\beta|y)$, is such that
$$\mathrm{Cov}(\beta|y) \approx \text{the inverse Fisher information matrix of } \beta
= \left[ X^T \mathrm{diag}\{ b''(X\beta) \}\, X \right]^{-1}
= \left( X^T \mathrm{diag}\left[ \frac{\mathbf{1}}{2\{\mathbf{1} + \cosh(X\beta)\}} \right] X \right)^{-1},
\quad\text{where}\quad X \equiv \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}.$$
Straightforward algebra then leads to the approximate posterior correlation between $\beta_0$ and $\beta_1$ being
$$\mathrm{corr}(\beta_0, \beta_1 | y) \approx
\frac{ -\,n^{-1}\sum_{i=1}^{n} x_i / \{1 + \cosh(\beta_0 + \beta_1 x_i)\} }
{ \sqrt{ \left[ n^{-1}\sum_{i=1}^{n} 1 / \{1 + \cosh(\beta_0 + \beta_1 x_i)\} \right] \left[ n^{-1}\sum_{i=1}^{n} x_i^2 / \{1 + \cosh(\beta_0 + \beta_1 x_i)\} \right] } }.$$
However, the $x_i$s are uniformly distributed on $(0, 1)$, so replacement of sample means by population means leads to the final approximation
$$\mathrm{corr}(\beta_0, \beta_1 | y) \approx
\frac{ -\int_0^1 x / \{1 + \cosh(\beta_0 + \beta_1 x)\}\,dx }
{ \sqrt{ \left[ \int_0^1 1 / \{1 + \cosh(\beta_0 + \beta_1 x)\}\,dx \right] \left[ \int_0^1 x^2 / \{1 + \cosh(\beta_0 + \beta_1 x)\}\,dx \right] } }.$$
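The three integrals in the final approximation have no convenient closed form but are inexpensive to evaluate by one-dimensional quadrature, as in the following sketch (SciPy assumed; the function name is illustrative):

```python
# Quadrature evaluation of the final approximation to corr(beta_0, beta_1 | y).
import numpy as np
from scipy.integrate import quad

def corr_approx(beta0, beta1):
    w = lambda x: 1.0 / (1.0 + np.cosh(beta0 + beta1 * x))
    I0 = quad(w, 0.0, 1.0)[0]                        # int 1/{1 + cosh(.)}
    I1 = quad(lambda x: x * w(x), 0.0, 1.0)[0]       # int x/{1 + cosh(.)}
    I2 = quad(lambda x: x ** 2 * w(x), 0.0, 1.0)[0]  # int x^2/{1 + cosh(.)}
    return -I1 / np.sqrt(I0 * I2)

print(corr_approx(0.0, 3.0))
```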
S.4 Approximate Marginal Log-Likelihood Expressions

The simulation study described in Section 4, concerning variational inference for the Bayesian logistic regression model (2), used approximate marginal log-likelihood expressions, appropriate for the particular approach, as a means of assessing convergence. Each of the expressions is given in this section. The last one uses the function
$$B(\mu, \sigma^2) \equiv \int_{-\infty}^{\infty} b(\mu + \sigma\,x)\,\phi(x)\,dx, \quad\text{where}\quad b(x) \equiv \log(1 + e^{x}),$$
as defined in Section 3.2. As with the $B_r$ notation given there, evaluations of $B(\mu, \sigma^2)$ when $\mu$ and $\sigma^2$ are equal-sized column vectors are defined in an element-wise fashion, as illustrated by
$$B\!\left( \begin{bmatrix} 9 \\ 1 \end{bmatrix}, \begin{bmatrix} 36 \\ 28 \end{bmatrix} \right) = \begin{bmatrix} B(9, 36) \\ B(1, 28) \end{bmatrix}.$$

S.4.1 Jaakkola-Jordan Updates

$$\log \underline{p}(y; q, \xi) = \tfrac12 \log\left| \Sigma_{q(\beta;\xi)} \right| + \tfrac12\, \mu_{q(\beta;\xi)}^T \Sigma_{q(\beta;\xi)}^{-1} \mu_{q(\beta;\xi)} - \tfrac12\, \mu_{\beta}^T \Sigma_{\beta}^{-1} \mu_{\beta}$$
$$+ \sum_{i=1}^{n} \left\{ \xi_i/2 - \log(1 + e^{\xi_i}) + (\xi_i/4)\tanh(\xi_i/2) \right\} - \tfrac12 \log\left| \Sigma_{\beta} \right|,$$
where
$$\Sigma_{q(\beta;\xi)} \equiv \left[ X^T \mathrm{diag}\left\{ \frac{\tanh(\xi/2)}{2\,\xi} \right\} X + \Sigma_{\beta}^{-1} \right]^{-1}
\quad\text{and}\quad
\mu_{q(\beta;\xi)} \equiv \Sigma_{q(\beta;\xi)} \left\{ X^T(y - \tfrac12\,\mathbf{1}) + \Sigma_{\beta}^{-1}\mu_{\beta} \right\},$$
and $\xi$ is the current value of the variational parameter vector that arises in the Jaakkola-Jordan device. See, for example, Section 5.1 of Wand (2017).

S.4.2 Saul-Jordan Updates

$$\log \underline{p}(y; q, \omega) = \tfrac12 \log\left| \Sigma_{q(\beta)} \right| - \tfrac12\, \mathrm{tr}\left[ \Sigma_{\beta}^{-1}\left\{ \Sigma_{q(\beta)} + (\mu_{q(\beta)} - \mu_{\beta})(\mu_{q(\beta)} - \mu_{\beta})^T \right\} \right]$$
$$+ y^T X \mu_{q(\beta)} - \tfrac12\,(\omega^2)^T\,\mathrm{diagonal}(X\,\Sigma_{q(\beta)} X^T)$$
$$- \mathbf{1}^T \log\left[ \mathbf{1} + \exp\left\{ X\mu_{q(\beta)} + \tfrac12\,(\mathbf{1} - 2\,\omega) \odot \mathrm{diagonal}(X\,\Sigma_{q(\beta)} X^T) \right\} \right] + \frac{d}{2} - \tfrac12 \log\left| \Sigma_{\beta} \right|,$$
where $\omega$ is the current value of the variational parameter vector that arises in the Saul-Jordan device.

S.4.3 Knowles-Minka-Wand Updates

$$\log \underline{p}(y; q) = \tfrac12 \log\left| \Sigma_{q(\beta)} \right| - \tfrac12\, \mathrm{tr}\left[ \Sigma_{\beta}^{-1}\left\{ \Sigma_{q(\beta)} + (\mu_{q(\beta)} - \mu_{\beta})(\mu_{q(\beta)} - \mu_{\beta})^T \right\} \right]$$
$$+ y^T X \mu_{q(\beta)} - \mathbf{1}^T B\!\left( X\mu_{q(\beta)},\, \mathrm{diagonal}(X\,\Sigma_{q(\beta)} X^T) \right) + \frac{d}{2} - \tfrac12 \log\left| \Sigma_{\beta} \right|.$$
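As an illustration of how such expressions are evaluated in practice, the following sketch computes the Jaakkola-Jordan expression of Section S.4.1 for given $y$, $X$, prior parameters $(\mu_\beta, \Sigma_\beta)$ and a strictly positive $\xi$ vector; the other two expressions are coded analogously. Variable names are illustrative.

```python
# Evaluation of the Jaakkola-Jordan expression of Section S.4.1.
import numpy as np

def jj_log_ml(y, X, mu_beta, Sigma_beta, xi):
    """Assumes xi > 0 elementwise; note tanh(xi/2)/(2 xi) -> 1/4 as xi -> 0."""
    lam = np.tanh(xi / 2.0) / (2.0 * xi)          # Jaakkola-Jordan weights
    Sb_inv = np.linalg.inv(Sigma_beta)
    Sigma_q = np.linalg.inv(X.T @ (lam[:, None] * X) + Sb_inv)
    rhs = X.T @ (y - 0.5) + Sb_inv @ mu_beta      # equals Sigma_q^{-1} mu_q
    mu_q = Sigma_q @ rhs
    ld_q = np.linalg.slogdet(Sigma_q)[1]
    ld_b = np.linalg.slogdet(Sigma_beta)[1]
    return (0.5 * ld_q + 0.5 * mu_q @ rhs
            - 0.5 * mu_beta @ Sb_inv @ mu_beta
            + np.sum(xi / 2.0 - np.log1p(np.exp(xi)) + (xi / 4.0) * np.tanh(xi / 2.0))
            - 0.5 * ld_b)
```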
Additional References

Rohde, D. and Wand, M.P. (2016). Semiparametric mean field variational Bayes: General principles and numerical issues. Journal of Machine Learning Research, 17(172), 1-47.

Wand, M.P. (2002). Vector differential calculus in statistics. The American Statistician, 56, 55-62.