Bayesian Inference by Density Ratio Estimation


1 Bayesian Inference by Density Ratio Estimation
Michael Gutmann
Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh
20th June 2017

2 Task
Perform Bayesian inference for models where
1. the likelihood function is too costly to compute
2. sampling (simulating data) from the model is possible

3 Program
Background
Previous work
Proposed approach

5 Simulator-based models
Goal: Inference for models that are specified by a mechanism for generating data
- e.g. stochastic dynamical systems
- e.g. computer models / simulators of some complex physical or biological process
Such models occur in multiple and diverse scientific fields. Different communities use different names:
- Simulator-based models
- Stochastic simulation models
- Implicit models
- Generative (latent-variable) models
- Probabilistic programs

6 Examples
Simulator-based models are widely used:
- Evolutionary biology: Simulating evolution
- Neuroscience: Simulating neural circuits
- Ecology: Simulating species migration
- Health science: Simulating the spread of an infectious disease
[Figure: simulated infectious-disease transmission, showing strain and individual over time for the sampled individuals]

7 Definition of simulator-based models
Let (Ω, F, P) be a probability space. A simulator-based model is a collection of (measurable) functions g(·, θ) parametrised by θ:
ω ∈ Ω ↦ x_θ = g(ω, θ) ∈ X   (1)
For any fixed θ, x_θ = g(·, θ) is a random variable; simulation / sampling corresponds to drawing ω and evaluating g(ω, θ).
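
As a concrete illustration (a minimal Python sketch, not from the slides): the elementary event ω can be played by the state of a pseudo-random number generator, so that g is a deterministic function of (ω, θ) while x_θ = g(·, θ) is random.

import numpy as np

def g(omega, theta):
    # omega plays the role of the elementary event: it seeds the
    # pseudo-random number generator, making g deterministic in (omega, theta).
    # The toy "simulator" here is just theta plus standard-normal noise.
    rng = np.random.default_rng(omega)
    return theta + rng.standard_normal()

# For fixed theta, x_theta = g(., theta) is a random variable:
# drawing omega at random induces a distribution over data points.
samples = [g(omega, theta=2.0) for omega in range(1000)]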

8 Implicit definition of the model pdfs
[Figure: the probability that x_θ falls into a set A, illustrating that the pdf of x_θ is only implicitly defined by the simulator]

9 Advantages of simulator-based models
- Direct implementation of hypotheses of how the observed data were generated.
- Neat interface with scientific models (e.g. from physics or biology).
- Modelling by replicating the mechanisms of nature that produced the observed/measured data ("analysis by synthesis").
- Possibility to perform experiments in silico.

11 Disadvantages of simulator-based models
- Generally elude analytical treatment.
- Can be easily made more complicated than necessary.
- Statistical inference is difficult. Main reason: the likelihood function is intractable.

12 The likelihood function L(θ)
- Probability that the model generates data like x_o when using parameter value θ.
- Generally well defined but intractable for simulator-based / implicit models.
[Figure: a data source with unknown properties produces the observation x_o in the data space; the model M(θ) generates simulated data in the same space]

13 Program
Background
Previous work
Proposed approach

14 Three foundational issues
1. How should we assess whether x_θ ≈ x_o?
2. How should we compute the probability of the event x_θ ≈ x_o?
3. For which values of θ should we compute it?
Likelihood: probability that the model generates data like x_o for parameter value θ.
[Figure: same data-generation schematic as on slide 12]

16 Approximate Bayesian computation
1. How should we assess whether x_θ ≈ x_o? Check whether ||T(x_θ) − T(x_o)|| ≤ ε
2. How should we compute the probability of the event x_θ ≈ x_o? By counting
3. For which values of θ should we compute it? Sample from the prior (or other proposal distributions)
Difficulties:
- Choice of T(·) and ε
- Typically high computational cost
For a recent review, see: Lintusaari et al. (2017), Fundamentals and recent developments in approximate Bayesian computation, Systematic Biology
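
A minimal sketch of the rejection sampler these three answers imply; simulate, the summary function T, prior_sample, and the tolerance eps are placeholders to be supplied by the user.

import numpy as np

def rejection_abc(x_o, simulate, T, prior_sample, eps, n_draws=100_000):
    # Draw theta from the prior, simulate data at theta, and keep theta
    # whenever the summaries lie within eps of the observed summaries.
    t_o = T(x_o)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()
        if np.linalg.norm(T(simulate(theta)) - t_o) <= eps:
            accepted.append(theta)
    return np.asarray(accepted)  # samples from the approximate posterior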

18 Synthetic likelihood (Simon Wood, Nature, 2010)
1./2. How should we assess whether x_θ ≈ x_o, and how should we compute the probability of this event?
- Compute summary statistics t_θ = T(x_θ)
- Model their distribution as a Gaussian
- Compute the likelihood function with T(x_o) as the observed data
3. For which values of θ should we compute it? Use the obtained synthetic likelihood function as part of a Monte Carlo method
Difficulties:
- Choice of T(·)
- Gaussianity assumption may not hold
- Typically high computational cost
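
A sketch of the synthetic log-likelihood at a single θ, assuming T returns a vector of summary statistics:

import numpy as np
from scipy.stats import multivariate_normal

def synthetic_loglik(theta, x_o, simulate, T, n_sim=100):
    # Simulate n_sim data sets at theta, fit a Gaussian to their summary
    # statistics, and evaluate that Gaussian at the observed summaries.
    t_sim = np.array([T(simulate(theta)) for _ in range(n_sim)])
    mu = t_sim.mean(axis=0)
    cov = np.cov(t_sim, rowvar=False)
    return multivariate_normal.logpdf(T(x_o), mean=mu, cov=cov)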

19 Program
Background
Previous work
Proposed approach

20 Overview of some of my work
1. How should we assess whether x_θ ≈ x_o? Use classification (Gutmann et al., 2014, 2017)
2. How should we compute the probability of the event x_θ ≈ x_o? Use density ratio estimation (Dutta et al., 2016, arXiv:1611.10242)
3. For which values of θ should we compute it? Use Bayesian optimisation (Gutmann and Corander, 2016)
Compared to standard approaches: speed-up by a factor of 1000 or more

22 Basic idea (Dutta et al., 2016, arXiv:1611.10242)
Frame posterior estimation as a ratio estimation problem:
p(θ | x) = p(θ) p(x | θ) / p(x) = p(θ) r(x, θ)   (2)
r(x, θ) = p(x | θ) / p(x)   (3)
Estimating r(x, θ) is the difficult part since p(x | θ) is unknown. An estimate r̂(x, θ) yields estimates of the likelihood function and the posterior:
L̂(θ) ∝ r̂(x_o, θ),  p̂(θ | x_o) = p(θ) r̂(x_o, θ).   (4)

23 Estimating density ratios in general
- Relatively well studied problem (textbook by Sugiyama et al., 2012)
- The Bregman divergence provides a general framework (Gutmann and Hirayama, 2011; Sugiyama et al., 2011)
- Here: density ratio estimation by logistic regression

24 Density ratio estimation by logistic regression
Samples from two data sets:
x_i^(1) ~ p^(1), i = 1, ..., n^(1)   (5)
x_i^(2) ~ p^(2), i = 1, ..., n^(2)   (6)
Probability that a test data point x was sampled from p^(1):
P(x ~ p^(1) | x, h) = 1 / (1 + ν exp(−h(x))),  ν = n^(2) / n^(1)   (7)

25 Density ratio estimation by logistic regression
Estimate h by minimising
J(h) = (1/n) [ Σ_{i=1}^{n^(1)} log(1 + ν exp(−h_i^(1))) + Σ_{i=1}^{n^(2)} log(1 + (1/ν) exp(h_i^(2))) ]
where h_i^(1) = h(x_i^(1)), h_i^(2) = h(x_i^(2)), and n = n^(1) + n^(2).
The objective is the re-scaled negated log-likelihood. For large n^(1) and n^(2),
ĥ = argmin_h J(h) = log(p^(1) / p^(2))
without any constraints on h.
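
A direct transcription of this objective into Python, restricting h to a linear-in-features model h(x) = βᵀψ(x) (the feature map psi is user-supplied; this is a sketch, not the authors' implementation):

import numpy as np
from scipy.optimize import minimize

def fit_log_ratio(X1, X2, psi):
    # Minimise J(h) over h(x) = beta^T psi(x); the returned beta is such
    # that psi(x) @ beta approximates log p1(x)/p2(x).
    P1, P2 = psi(X1), psi(X2)          # feature matrices, shapes (n1, b), (n2, b)
    n1, n2 = len(P1), len(P2)
    nu = n2 / n1

    def J(beta):
        h1, h2 = P1 @ beta, P2 @ beta
        return (np.sum(np.log1p(nu * np.exp(-h1))) +
                np.sum(np.log1p(np.exp(h2) / nu))) / (n1 + n2)

    res = minimize(J, np.zeros(P1.shape[1]), method="BFGS")
    return res.x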

26 Estimating the posterior
- This property was used to estimate unnormalised models (Gutmann & Hyvärinen, 2010, 2012).
- It was used to estimate likelihood ratios (Pham et al., 2014; Cranmer et al., 2015).
- For posterior estimation, we use the data-generating pdf p(x | θ) for p^(1) and the marginal p(x) for p^(2). (Other choices for p(x) are possible too.)
- Sample sizes are entirely under our control.

27 Estimating the posterior
Logistic regression gives (point-wise in θ):
ĥ(x, θ) ≈ log [p(x | θ) / p(x)] = log r(x, θ)   (8)
Estimated posterior and likelihood function:
p̂(θ | x_o) = p(θ) exp(ĥ(x_o, θ)),  L̂(θ) ∝ exp(ĥ(x_o, θ))   (9)

28 Estimating the posterior
[Diagram: parameters θ_1, θ_2, ..., θ_{n_m} are drawn from the prior p(θ) and fed to the model p(x | θ) to produce a marginal sample X_m = (x_1^m, ..., x_{n_m}^m); for a given θ, the model produces X_θ = (x_1^θ, ..., x_{n_θ}^θ). Logistic regression between X_θ and X_m gives ĥ = argmin_h J(h, θ); evaluating the log-ratio ĥ(x, θ) at the observed data x_o yields p̂(θ | x_o) = p(θ) exp(ĥ(x_o, θ)).]
(Dutta et al., 2016, arXiv:1611.10242)
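
The whole pipeline, sketched under the same assumptions (a one-dimensional grid of θ values, a user-supplied simulator, prior, and feature map; fit_log_ratio is the function from the previous sketch):

import numpy as np

def estimate_posterior(x_o, theta_grid, simulate, prior_pdf, prior_sample,
                       psi, n_theta=100, n_marg=100):
    # Marginal sample X_m: parameters from the prior, data from the model.
    X_m = np.array([simulate(prior_sample()) for _ in range(n_marg)])
    post = np.empty(len(theta_grid))
    for k, theta in enumerate(theta_grid):
        X_t = np.array([simulate(theta) for _ in range(n_theta)])  # from p(x|theta)
        beta = fit_log_ratio(X_t, X_m, psi)   # logistic regression, point-wise in theta
        h_o = (psi(np.atleast_2d(x_o)) @ beta).item()  # log-ratio at observed data
        post[k] = prior_pdf(theta) * np.exp(h_o)
    return post / post.sum()  # normalise over the grid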

29 Auxiliary model
We need to specify a model for h. For simplicity, a linear model:
h(x) = Σ_{i=1}^{b} β_i ψ_i(x) = βᵀψ(x)   (10)
where the ψ_i(x) are summary statistics. More complex models are possible.

30 Exponential family approximation
Logistic regression yields
ĥ(x, θ) = β̂(θ)ᵀψ(x),  r̂(x, θ) = exp(β̂(θ)ᵀψ(x))   (11)
Resulting posterior:
p̂(θ | x_o) = p(θ) exp(β̂(θ)ᵀψ(x_o))   (12)
Implicit exponential family approximation of p(x | θ):
r̂(x, θ) = p̂(x | θ) / p̂(x)   (13)
p̂(x | θ) = p̂(x) exp(β̂(θ)ᵀψ(x))   (14)
Implicit because p̂(x) is never explicitly constructed.

31 Remarks
- The vector of summary statistics ψ(x) should include a constant for normalisation of the pdf (log partition function); the normalising constant is estimated via the logistic regression.
- The simple linear model leads to a generalisation of synthetic likelihood.
- An L1 penalty on β allows weighting and selecting summary statistics.
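
The L1 penalty is easy to add in practice. A sketch using scikit-learn's penalised logistic regression as an off-the-shelf stand-in for the objective above (again an illustration, not the authors' implementation):

import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_log_ratio_l1(X_theta, X_marg, psi, C=1.0):
    # Labels: 1 for data simulated at theta, 0 for the marginal sample.
    P = np.vstack([psi(X_theta), psi(X_marg)])
    y = np.concatenate([np.ones(len(X_theta)), np.zeros(len(X_marg))])
    # Smaller C means a stronger L1 penalty, hence a sparser beta,
    # i.e. fewer summary statistics retained. The fitted intercept
    # plays the role of the constant summary statistic.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(P, y)
    return clf.coef_.ravel(), clf.intercept_[0]  # beta and the constant term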

32 Application to ARCH model
Model:
x^(t) = θ_1 x^(t−1) + e^(t)   (15)
e^(t) = ξ^(t) √(0.2 + θ_2 (e^(t−1))^2)   (16)
ξ^(t) and e^(0) independent standard normal r.v., x^(0) = 0; 100 time points.
Parameters: θ_1 ∈ (−1, 1), θ_2 ∈ (0, 1); uniform prior on (θ_1, θ_2).
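
A sketch of a simulator for Eqs. (15)–(16):

import numpy as np

def simulate_arch(theta1, theta2, T=100, rng=None):
    # ARCH(1) process: AR(1) observations x with conditionally
    # heteroskedastic innovations e, as in Eqs. (15)-(16).
    if rng is None:
        rng = np.random.default_rng()
    x = np.zeros(T + 1)              # x^(0) = 0
    e = rng.standard_normal()        # e^(0) ~ N(0, 1)
    for t in range(1, T + 1):
        e = rng.standard_normal() * np.sqrt(0.2 + theta2 * e**2)
        x[t] = theta1 * x[t - 1] + e
    return x[1:]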

33 Application to ARCH model
Summary statistics:
- auto-correlations with lag one to five
- all (unique) pairwise combinations of them
- a constant
To check robustness: 50% irrelevant summary statistics (drawn from a standard normal).
Comparison with synthetic likelihood using an equivalent set of summary statistics (relevant summary statistics only).
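
A sketch of these summary statistics, reading "pairwise combinations" as pairwise products of the five auto-correlations (an assumption; the slide does not spell this out):

import numpy as np

def arch_summaries(x, max_lag=5):
    xc = x - x.mean()
    var = np.sum(xc ** 2)
    # Auto-correlations at lags 1..max_lag.
    ac = np.array([np.sum(xc[l:] * xc[:-l]) / var for l in range(1, max_lag + 1)])
    # All unique pairwise products, plus a constant for normalisation.
    pairs = [ac[i] * ac[j] for i in range(max_lag) for j in range(i + 1, max_lag)]
    return np.concatenate([[1.0], ac, pairs])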

34 Example posterior
[Figure: true and estimated posteriors over (θ_1, θ_2); (a) synthetic likelihood, (b) proposed method]

35 Example posterior, subject to noise
[Figure: true and estimated posteriors over (θ_1, θ_2) with irrelevant summary statistics present; (c) synthetic likelihood, (d) proposed method]

36 Systematic analysis
- Jensen-Shannon divergence (JSD) between estimated and true posterior
- Point-wise comparison with synthetic likelihood (100 data sets)
- ∆JSD = JSD for proposed method − JSD for synthetic likelihood
[Figure: histograms of ∆JSD without and with noise; the proposed method had the lower JSD in 82% of cases without noise (synthetic likelihood in 18%) and in 83% of cases with noise (17%)]

37 Application to Ricker model
Model:
x^(t) | N^(t), φ ~ Poisson(φ N^(t))   (17)
log N^(t) = log r + log N^(t−1) − N^(t−1) + σ e^(t)   (18)
t = 1, ..., 50, N^(0) = 1   (19)
Parameters and priors:
- log growth rate: log r ~ U(3, 5)
- scaling parameter: φ ~ U(5, 15)
- standard deviation: σ ~ U(0, 0.6)
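
A sketch of the Ricker simulator for Eqs. (17)–(18), taking the e^(t) to be standard normal and N^(0) = 1 as the initial population size (assumptions spelled out here rather than on the slide):

import numpy as np

def simulate_ricker(log_r, phi, sigma, T=50, rng=None):
    # Latent population N follows the stochastic Ricker map; observations
    # x^(t) are Poisson counts with mean phi * N^(t).
    if rng is None:
        rng = np.random.default_rng()
    N = 1.0                          # N^(0) = 1 (assumed initial condition)
    x = np.empty(T, dtype=int)
    for t in range(T):
        N = np.exp(log_r + np.log(N) - N + sigma * rng.standard_normal())
        x[t] = rng.poisson(phi * N)
    return x

# Parameters drawn from the priors on the slide:
rng = np.random.default_rng(0)
theta = (rng.uniform(3, 5), rng.uniform(5, 15), rng.uniform(0, 0.6))
data = simulate_ricker(*theta, rng=rng)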

38 Application to Ricker model
- Summary statistics: same as Simon Wood (Nature, 2010)
- 100 inference problems
- For each problem, relative errors in the posterior means were computed
- Point-wise comparison with synthetic likelihood

39 Results for log r
∆rel. error = rel. error for proposed method − rel. error for synthetic likelihood
[Figure: histogram of ∆rel. error; the proposed method had the lower error in 67% of cases, synthetic likelihood in 33%]

40 Results for σ
∆rel. error = rel. error for proposed method − rel. error for synthetic likelihood
[Figure: histogram of ∆rel. error; the proposed method had the lower error in 59% of cases, synthetic likelihood in 41%]

41 Results for φ
∆rel. error = rel. error for proposed method − rel. error for synthetic likelihood
[Figure: histogram of ∆rel. error; the proposed method had the lower error in 55% of cases, synthetic likelihood in 45%]

42 Observations
- Compared two auxiliary models: exponential family vs Gaussian family
- For the same summary statistics, typically more accurate inferences with the richer exponential family model
- Robustness to irrelevant summary statistics thanks to L1 regularisation

44 Conclusions
Background and previous work on inference with simulator-based / implicit statistical models.
Our work on:
- framing the posterior estimation problem as a density ratio estimation problem
- estimating the ratio with logistic regression
- using regularisation to automatically select summary statistics
Multitude of research possibilities:
- choice of the auxiliary model
- choice of the loss function used to estimate the density ratio
- combining with the Bayesian optimisation framework to reduce computational cost
More results and details in arXiv:1611.10242v1.
