Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis


1 Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis
Stéphanie Allassonnière, CIS, JHU, July 15th, 2008

2 Context : Computational Anatomy

Context and motivations :
* Describing shapes
* Shape matching : many elaborate registration theories
* Defining and inferring a population average (atlas) :
   - statistical estimation in the presence of unobserved variables
   - a proper statistical model = a generative model
   - consistency issues addressed
* Discrimination / Classification

3 Outline

Three generative statistical models and stochastic algorithms :
1 Bayesian Mixed Effect (BME) gray level Template
   1.1 Mathematical framework for deformable models
   1.2 Past approaches to compute a population average
   1.3 Generative statistical models
   1.4 Statistical estimation of the model parameters
   1.5 Experiments on the USPS database and 2D medical images
2 Bayesian Mixed Effect DTI Template
3 Noisy ICA


5 Warping and past approaches

Linearized deformations (non-rigid) : let ψ be the deformation from I_0 to I_1 and v a vector field :
ψ = Id + v
The resulting deformed image is given by :
I_1(x) ≈ I_0(x - v(x))
Advantages :
* Easy to use in computations
* Relevant for certain classes of shapes
Drawbacks :
* Invertibility is not guaranteed : overlaps may occur
* The shape topology may change!
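As a short illustration of the linearized warp I_1(x) ≈ I_0(x - v(x)), here is a minimal Python sketch (not from the talk; the bilinear interpolation and the shear field are illustrative choices):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(I0, v):
    """Apply a linearized deformation: I1(x) ~= I0(x - v(x)).

    I0 : (H, W) gray-level image.
    v  : (2, H, W) displacement field (row and column components).
    Interpolation is required because x - v(x) falls between pixel centers.
    """
    H, W = I0.shape
    rows, cols = np.mgrid[0:H, 0:W].astype(float)
    coords = np.stack([rows - v[0], cols - v[1]])   # evaluate I0 at x - v(x)
    return map_coordinates(I0, coords, order=1, mode='nearest')

# Example: a smooth horizontal shear; a large shear illustrates the drawback
# that Id - v may fail to be invertible and fold the image onto itself.
I0 = np.random.rand(64, 64)
v = np.zeros((2, 64, 64))
v[1] = 3.0 * np.sin(np.linspace(0.0, np.pi, 64))[:, None]
I1 = warp_image(I0, v)
```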

6 Warping and past approaches

Some solutions previously given (only for the template!) :
* Use one of the data images y_1, ..., y_n.


10 Warping and past approaches

Some solutions previously given (only for the template!) :
* Procrustes mean :
(Î, v̂_1, ..., v̂_n) = argmin_{I, v_1, ..., v_n} Σ_{i=1}^n ( (1/2) ||v_i||_V^2 + (1/(2σ^2)) ||y_i - φ_{v_i} · I||^2 )
A statistical interpretation (Glasbey & Mardia) :
y_i(x + v_i(x)) = I(x) + ε_x,  ε_x ~ N(0, σ^2),  ∀x ∈ Λ
Problems :
* Needs interpolation
* Unobserved warping variables
* Not a generative statistical model

11 Warping and past approaches

Issues :
* Describe the database as an i.i.d. sample from a parametric generative statistical model.
* Model parameters = template + deformation law (which defines the space V) + noise.
* Learn the parameters so as to avoid the previous problems :
   - an intrinsic template I_0,
   - a weighting term σ^2,
   - a global geometric behavior in the class.

12 Statistical Generative Model

Generative statistical model, deformable template framework :
y_i(r_{j,k}) = I_0(r_{j,k} - v_i(r_{j,k})) + σ ε_{j,k}
Conditions chosen for the template and the deformation fields : a parametric model of splines. Let (p_k)_{1≤k≤k_p} and (g_k)_{1≤k≤k_g} be two sets of control points (fixed, uniformly distributed) :
I_0(x) = (K_p α)(x) = Σ_{k=1}^{k_p} K_p(x, p_k) α(k),
v_β(x) = (K_g β)(x) = Σ_{k=1}^{k_g} K_g(x, g_k) β(k).
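A minimal numerical sketch of this spline parametrization (the Gaussian kernels, grid sizes and length scales are illustrative assumptions; the slide only fixes uniformly distributed control points):

```python
import numpy as np

def gaussian_kernel(x, centers, s):
    """K(x, c) = exp(-||x - c||^2 / (2 s^2)) for points x and centers c."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * s ** 2))

def grid(n):
    """n x n uniform grid of control points in [0, 1]^2 (fixed, as on the slide)."""
    ax = np.linspace(0.0, 1.0, n)
    return np.stack(np.meshgrid(ax, ax), axis=-1).reshape(-1, 2)

p = grid(8)   # k_p = 64 photometric control points
g = grid(5)   # k_g = 25 geometric control points

def template(x, alpha, sp=0.08):
    """I_alpha(x) = sum_k K_p(x, p_k) alpha(k) : gray levels at points x."""
    return gaussian_kernel(x, p, sp) @ alpha

def deformation(x, beta, sg=0.2):
    """v_beta(x) = sum_k K_g(x, g_k) beta(k) : 2D displacements at points x."""
    return gaussian_kernel(x, g, sg) @ beta   # beta has shape (k_g, 2)
```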

13 Statistical Generative Model

What to learn?
* Photometry : α, to code the template ; σ^2, the noise variance.
* Geometry : β_1^n, to code each deformation.
BUT : this does not give the geometrical behavior of the training set.
⇒ Introduce a prior on β : β_i ~ ν(dβ).
The parameters of ν become parameters to learn, and the deformations (β_i)_{1≤i≤n} become random hidden variables.

14 Statistical Generative Model

Generative model, one component per class :
(Γ_g, θ_p) ~ ν_g ⊗ ν_p  with θ_p = (α, σ^2)
β_1^n ~ ⊗_{i=1}^n N(0, Γ_g) | Γ_g
y_1^n ~ ⊗_{i=1}^n N(v_{β_i} · I_α, σ^2 Id_Λ) | β_1^n, θ_p
where ν_g(dΓ_g) and ν_p(dσ^2, dα) are prior laws on the parameters.
Remark : big structures have to be learned, even with a small training set.
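To make the hierarchy concrete, a sketch of one draw from this single-component model, reusing the template, deformation and grid helpers from the spline sketch above (all parameter values are arbitrary illustrations):

```python
import numpy as np

def sample_image(alpha, Gamma_g, sigma2, pixels, rng):
    """One draw: beta ~ N(0, Gamma_g), then y(r) = I_alpha(r - v_beta(r)) + noise."""
    k_g = Gamma_g.shape[0] // 2
    beta = rng.multivariate_normal(np.zeros(2 * k_g), Gamma_g).reshape(k_g, 2)
    warped = pixels - deformation(pixels, beta)       # r - v_beta(r)
    noise = np.sqrt(sigma2) * rng.standard_normal(len(pixels))
    return template(warped, alpha) + noise

rng = np.random.default_rng(0)
pixels = grid(32)                                     # 32 x 32 pixel lattice
y = sample_image(rng.random(64), 0.01 * np.eye(50), 1e-3, pixels, rng)
```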

15 Statistical Generative Model

Generative model (2), general case : mixtures of deformable templates (τ_m components per class).
Hidden random variables : the deformations (β_i)_{1≤i≤n} and the image labels (τ_i)_{1≤i≤n}.
ρ ~ ν_ρ
θ = (θ_g^τ, θ_p^τ)_{1≤τ≤τ_m} ~ ⊗_{τ=1}^{τ_m} (ν_g ⊗ ν_p)
τ_1^n ~ ⊗_{i=1}^n Σ_{τ=1}^{τ_m} ρ_τ δ_τ | ρ
β_1^n ~ ⊗_{i=1}^n N(0, Γ_g^{τ_i}) | θ, τ_1^n
y_1^n ~ ⊗_{i=1}^n N(v_{β_i} · I_{α_{τ_i}}, σ_{τ_i}^2 Id_Λ) | β_1^n, θ, τ_1^n

16 Statistical Generative Model

[Diagram : the hyperparameters (ρ, α, Γ, σ^2)_1, ..., (ρ, α, Γ, σ^2)_m are the population factors (fixed effects) ; they generate the individual factors (τ_i, β_i, ε_i)_{1≤i≤n} (random effects), which are unobserved and in turn generate the observations y_1, ..., y_n.]
Fig.: Mixed effect structure for our BME-template.

17 Estimation

How to learn the parameters? The MAP estimator : the parameters θ are estimated by maximizing the posterior likelihood :
θ̂ = argmax_θ p(θ | y)
where θ ∈ Θ = { (α, σ^2, Γ_g) : α ∈ R^{k_p}, σ^2 > 0, Γ_g ∈ Sym_{2k_g}^+(R) },
and Sym_{2k_g}^+(R) is the set of 2k_g × 2k_g positive definite symmetric matrices.
Let Θ* = { θ* ∈ Θ : E_P(log q(y | θ*)) = sup_{θ∈Θ} E_P(log q(y | θ)) },
where P denotes the distribution governing the observations.

18 Estimation

How to proceed in practice? Since β_1^n are unobserved variables, a natural approach to reach the MAP estimator is the EM algorithm. Iteration l of the algorithm :
* E step : compute the posterior law of β_i, i = 1, ..., n.
* M step : parameter update :
θ_{l+1} = argmax_θ E[ log q(θ, β_1^n, y_1^n) | y_1^n, θ_l ].
BUT : the E step is not tractable!

19 Estimation

Details of the maximization step, geometry :
Γ_{g,l+1} = (1 / (n + a_g)) ( n [ββ^t]_l + a_g Σ_g ),
where
[ββ^t]_l = (1/n) Σ_{i=1}^n ∫ ββ^t ν_{l,i}(β) dβ
is the empirical covariance matrix with respect to the posterior density. Importance of the prior!
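In code, this update is a one-line convex combination of the empirical posterior second moment and the prior matrix; a sketch (the Monte Carlo stand-in for the posterior integral is an assumption of mine):

```python
import numpy as np

def update_Gamma(post_samples, a_g, Sigma_g):
    """Gamma_{g,l+1} = (n [beta beta^t]_l + a_g Sigma_g) / (n + a_g).

    post_samples : list of (n_mc, 2 k_g) arrays, one per image, drawn from
                   the posterior of beta_i (approximating the integral).
    The prior term a_g Sigma_g keeps the update positive definite even when
    n is small compared with the dimension 2 k_g: hence the prior's importance.
    """
    n = len(post_samples)
    emp = sum(b.T @ b / len(b) for b in post_samples) / n   # [beta beta^t]_l
    return (n * emp + a_g * Sigma_g) / (n + a_g)
```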

20 Estimation

Solution proposed : a stochastic version of the EM algorithm. Idea : couple SAEM with an MCMC procedure (Delyon, Lavielle, Moulines ; Kuhn, Lavielle). One-component case, iteration l → l + 1 of the algorithm :
* Simulation step : β_{l+1} ~ Π_{θ_l}(β_l, ·), where Π_{θ_l}(β_l, ·) is the transition kernel of a convergent Markov chain having the posterior distribution as stationary distribution [here, given by a hybrid Gibbs sampler].
* Stochastic approximation : Q_{l+1}(θ) = Q_l(θ) + Δ_l [ log q(y, β_{l+1}, θ) - Q_l(θ) ], where (Δ_l) is a decreasing sequence of positive step sizes.
* Maximization step : θ_{l+1} = argmax_θ Q_{l+1}(θ).

21 Estimation

Stochastic version of the EM algorithm (2) : since our model is an exponential model,
q(y, β, θ) = exp{ -ψ(θ) + ⟨ S(y, β), φ(θ) ⟩ },
the stochastic approximation can be carried on the sufficient statistics S, so that the algorithm runs via :
s_{l+1} = s_l + Δ_l ( S(y, β_{l+1}) - s_l ).
Let L(s, θ) = -ψ(θ) + ⟨ s, φ(θ) ⟩, ℓ(θ) = log q(y, θ) and θ̂(s) = argmax_θ L(s, θ) ; then
θ_{l+1} = θ̂(s_{l+1}).
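Putting slides 20 and 21 together, a schematic MCMC-SAEM loop in its sufficient-statistics form (all callables are placeholders; step sizes 1/l are just one admissible choice of decreasing sequence):

```python
def mcmc_saem(y, theta0, beta0, kernel, suff_stat, theta_hat, n_iter=200):
    """Schematic MCMC-SAEM, one-component case.

    kernel(beta, theta) : one sweep of a Markov kernel Pi_theta whose
                          stationary law is the posterior of beta given y
                          (on slide 20, a hybrid Gibbs sampler).
    suff_stat(y, beta)  : the exponential-family statistics S(y, beta).
    theta_hat(s)        : the closed-form maximizer of L(s, theta).
    """
    theta, beta = theta0, beta0
    s = suff_stat(y, beta)
    for l in range(1, n_iter + 1):
        beta = kernel(beta, theta)                    # simulation step (MCMC)
        s = s + (1.0 / l) * (suff_stat(y, beta) - s)  # stochastic approximation
        theta = theta_hat(s)                          # maximization step
    return theta
```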

22 Estimation

Stochastic approximation with truncation on random boundaries : let (K_κ)_{κ≥0} be an increasing sequence of compact sets. Set κ_0 = 0, s_0 ∈ K_0 and β_0 ∈ K. For k ≥ 1 :
* compute s̄ = s_{k-1} + Δ_{k-1} ( S(y, β̄) - s_{k-1} ), where β̄ is sampled from the transition kernel Π_{θ_{k-1}}(β_{k-1}, ·) ;
* if s̄ ∈ K_{κ_{k-1}} and ||s̄ - s_{k-1}|| ≤ ε_{k-1}, set (s_k, β_k) = (s̄, β̄) and κ_k = κ_{k-1} ;
* else set (s_k, β_k) = (s̃, β̃) ∈ K_0 × K and κ_k = κ_{k-1} + 1, where (s̃, β̃) can be chosen in different ways.
θ_k = argmax_θ L(s_k, θ).
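A sketch of the truncation test itself, with the compact sets K_κ taken as nested norm balls (that specific choice, and the reset rule, are illustrative assumptions):

```python
import numpy as np

def truncated_step(s_prev, beta_prev, theta, step, kappa, eps, radii,
                   kernel, suff_stat, reset):
    """One SAEM iteration with truncation on random boundaries.

    radii[kappa] : radius of the current compact set K_kappa (norm balls here);
    eps          : the threshold epsilon_{k-1};
    reset()      : returns a fresh (s, beta) inside K_0 x K after a rejection.
    """
    beta = kernel(beta_prev, theta)
    s = s_prev + step * (suff_stat(beta) - s_prev)
    if np.linalg.norm(s) <= radii[kappa] and np.linalg.norm(s - s_prev) <= eps:
        return s, beta, kappa                # accept the move
    s, beta = reset()                        # project back, count a restart
    return s, beta, kappa + 1
```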

23 Multicomponent Model

Stochastic version of the EM algorithm for the multicomponent model : the intuitive generalization runs into the problem of trapping states. Image analysis interpretation : each iteration deforms the data to bring it closer to its current component, so the high-dimensional hidden variable β does not tend to move toward another component.
Solution proposed : use another simulation method, based on the Gibbs sampler for the deformation and on another law for the class of a given image.

24 Multicomponent Model

The new algorithm : transition step l → l + 1 using a hybrid Gibbs sampler on (β, τ).
* For each τ : run the hybrid Gibbs sampler on β given τ for N_l iterations, yielding β̂_τ^{(l+1)} ~ Π^{N_l}(· | τ).
* Draw τ^{(l+1)} from the discrete law with weights
p_{N_l}(τ) ∝ ( (1/N_l) Σ_{i_mc=1}^{N_l} [ f(β̂_{τ,(i_mc)}) / q(y, β̂_{τ,(i_mc)}, τ | θ, ρ) ] )^{-1}.
* Set β^{(l+1)} = β̂_{τ^{(l+1)},(N_l)}.
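A sketch of this transition, following the reconstructed weight formula above (gibbs_sweep, f and q_joint are placeholder callables, and the inverse-mean weighting mirrors the slide's expression as best it can be read from the transcription):

```python
import numpy as np

def transition(y, beta, theta, rho, gibbs_sweep, f, q_joint, n_classes, N_l,
               rng=np.random.default_rng()):
    """Joint simulation of (beta, tau): one Gibbs chain of length N_l per class,
    then a class draw from the reweighted discrete law."""
    chains, weights = [], np.empty(n_classes)
    for tau in range(n_classes):
        b, samples = beta, []
        for _ in range(N_l):
            b = gibbs_sweep(b, tau, theta)     # hybrid Gibbs sweep given tau
            samples.append(b)
        chains.append(samples)
        ratios = [f(s) / q_joint(y, s, tau, theta, rho) for s in samples]
        weights[tau] = 1.0 / np.mean(ratios)   # inverse of the empirical mean
    tau_new = rng.choice(n_classes, p=weights / weights.sum())
    return chains[tau_new][-1], tau_new        # beta^(l+1): last state of the chosen chain
```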

25 Multicomponent Model

Theoretical results : with these models and algorithms, we have proved important asymptotic results :
* Consistency of the MAP estimator
* Convergence of both stochastic algorithms

26 Experiments

MCMC-SAEM, template estimation.
Fig.: Left : one-component prototype. Right : two-component prototypes.

27 Experiments

MCMC-SAEM, the geometric distribution.
Fig.: Left : two examples from the training set. Right : two examples drawn from the prior geometric distribution.

28 Experiments

MCMC-SAEM, geometry. [Figure]

29 Experiments

In the presence of noise. [Figure]

30 Experiments

Medical images : splenium of the corpus callosum. 47 images of the corpus callosum (and part of the cerebellum).
Fig.: Left : gray-level mean of the 47 images. Right : template estimated with the stochastic algorithm.

31 Experiments

Medical images : splenium of the corpus callosum. The 47 images of the corpus callosum (and part of the cerebellum) are clustered into 2 components by the multicomponent model.
Fig.: Results from the estimation with the stochastic algorithm. Left : component 1. Right : component 2.

32 Experiments

Robustness of the algorithm : same hyper-parameters as for the previous gray-level images. [Figure]

33 Outline

Three generative statistical models and stochastic algorithms :
1 Bayesian Mixed Effect (BME) gray level Template
   1.1 Mathematical framework for deformable models
   1.2 Past approaches to compute a population average
   1.3 Generative statistical models
   1.4 Statistical estimation of the model parameters
   1.5 Experiments on the USPS database and 2D medical images
2 Bayesian Mixed Effect DTI Template
3 Noisy ICA

34 BME-DTI Template

Bayesian Mixed Effect DTI Template, goals of our approach :
* Estimate a template of Diffusion Tensor Images (DTI) on a given region of the anatomy.
* Use only the Diffusion Weighted Images (DWIs), i.e. the real vector of responses to a set of different gradient directions.

35 BME-DTI Template

[Diagram, previous approach : for each subject i, the DWIs (gradients G1, ..., Gm) are converted into a tensor image DTI_i by least-squares approximation ; the DTI template is then obtained as a mean of the DTI_i through energy minimization.]

36 BME-DTI Template

[Diagram, our approach : the DWIs (gradients G1, ..., Gm) are the observed data and the subject DTIs remain hidden ; the DTI template is estimated directly by maximum likelihood.]

37 BME-DTI Template

Results on synthetic data : two different template tensors, FA_1 = 0.6791, FA_2 = 0.6918, ADC_1 = 0.538 (the ADC_2 value did not survive transcription). Random samples with 15 subjects each.
[Table : bias, variance and MSE of the FA and ADC estimates for the LS, FAM-EM and SAEM estimators ; numeric entries lost in transcription.]

38 BME-DTI Template

Fig.: Oriented ellipsoids : true ellipsoid, least-squares ellipsoid, mode ellipsoid, stochastic ellipsoid.

39 Outline

Three generative statistical models and stochastic algorithms :
1 Bayesian Mixed Effect (BME) gray level Template
   1.1 Mathematical framework for deformable models
   1.2 Past approaches to compute a population average
   1.3 Generative statistical models
   1.4 Statistical estimation of the model parameters
   1.5 Experiments on the USPS database and 2D medical images
2 Bayesian Mixed Effect DTI Template
3 Noisy ICA

40 Noisy ICA

Model : observations X_1^n such that X_i = A Y_i + σ ε_i, with A the source (mixing) matrix, σ^2 the variance of the Gaussian noise, and Y_1^n the hidden variables.
Y_{1,1}^{n,p} ~ ⊗_{i=1}^n ⊗_{j=1}^p ν_η | η
X_1^n ~ ⊗_{i=1}^n N(A Y_i, σ^2 Id) | A, σ^2, Y_1^n
Various choices of the distribution ν_η are possible. The same MCMC-SAEM algorithm treats this estimation problem.
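A sketch of sampling from this model (the Laplace law for ν_η is only one of the "various choices" mentioned, picked here for illustration):

```python
import numpy as np

def sample_noisy_ica(A, sigma2, n, rng=np.random.default_rng(0)):
    """Draw n observations X_i = A Y_i + sigma eps_i with hidden sources Y."""
    d, p = A.shape
    Y = rng.laplace(size=(n, p))                  # i.i.d. sources, Y_ij ~ nu_eta
    X = Y @ A.T + np.sqrt(sigma2) * rng.standard_normal((n, d))
    return X, Y

# Example: three observed channels mixing p = 2 independent components.
A = np.array([[1.0, 0.5], [0.2, 1.0], [0.7, 0.3]])
X, Y = sample_noisy_ica(A, sigma2=0.1, n=500)
```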

41 Noisy ICA

Experiments : 11 subjects, 2 independent components. [Figure]

42 Conclusion

* A generative statistical model provides a proper statistical framework for designing and inferring a population average.
* The stochastic algorithm has multiple uses, even in difficult conditions.
Thank you!


44 MCMC-SAEM : The Geometric Distribution

Between 2 classes : [Figure]
Between 2 components in the same class : [Figure]
