Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis
1 Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis Stéphanie Allassonnière CIS, JHU July 15th, 2008
2 Context : Computational Anatomy Context and motivations : * Describing shapes * Shape matching : many elaborate registration theories * Defining and inferring a population average (atlas) : statistical estimation in the presence of unobserved variables; proper statistical model = generative; consistency issues addressed * Discrimination/Classification
3 Context : Computational Anatomy Outline Three generative statistical models and stochastic algorithms 1 Bayesian Mixed Effect (BME) gray level Template 1.1 Mathematical framework for deformable models 1.2 Past approaches to compute a population average 1.3 Generative statistical models 1.4 Statistical estimation of the model parameters 1.5 Experiments on USPS database and 2D medical images 2 Bayesian Mixed Effect DTI Template 3 Noisy ICA
5 Warping and past approaches Linearized deformations : non-rigid deformations. Let ψ be the deformation from I_0 to I_1 and v a vector field : ψ = Id + v. Resulting deformed image given by : I_1(x) ≈ I_0(x − v(x)). Advantages : easy to use in computations; relevant for certain classes of shapes. Drawbacks : invertibility not guaranteed (overlap); shape topology may change!
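The linearized warp above can be sketched in a few lines of numpy. The nearest-neighbour lookup and the helper name `warp_image` are illustrative simplifications of this sketch; real registration code interpolates (e.g. bilinearly):

```python
import numpy as np

def warp_image(image, v):
    """Apply the linearized deformation I1(x) = I0(x - v(x)) on a regular
    pixel grid by nearest-neighbour lookup (a toy scheme)."""
    h, w = image.shape
    out = np.zeros_like(image)
    for y in range(h):
        for x in range(w):
            # pull the point x back through the displacement field v
            sy = int(round(y - v[y, x, 0]))
            sx = int(round(x - v[y, x, 1]))
            if 0 <= sy < h and 0 <= sx < w:
                out[y, x] = image[sy, sx]
    return out

# A constant shift by one pixel to the right: v(x) = (0, 1)
img = np.zeros((4, 4)); img[1, 1] = 1.0
v = np.zeros((4, 4, 2)); v[..., 1] = 1.0
warped = warp_image(img, v)   # the bright pixel moves from (1, 1) to (1, 2)
```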
6 Warping and past approaches Some solutions previously given : Only for the template! Using one of the data images y_1, …, y_n.
9 Warping and past approaches Some solutions previously given : Only for the template! Procrustes mean :
(Î, v̂_1, …, v̂_n) = argmin_{I, v_1, …, v_n} Σ_{i=1}^n ( (1/2) ‖v_i‖²_V + (1/2σ²) ‖y_i − φ_{v_i}·I‖² )
10 Warping and past approaches A statistical interpretation (Glasbey & Mardia) :
y_i(x + v_i(x)) = I(x) + ǫ_x, ǫ_x ~ N(0, σ²), ∀ x ∈ Λ.
Problems : needs interpolation; unobserved warping variables; not a generative statistical model.
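As a toy illustration of the Procrustes energy being minimized — with an integer circular shift standing in for the warp φ_v and a user-supplied squared norm for ‖·‖_V, both assumptions of this sketch:

```python
import numpy as np

def procrustes_energy(I, ys, vs, sigma2, norm_V):
    """Energy whose minimizers define the Procrustes mean:
    sum_i ( 0.5 * ||v_i||_V^2 + ||y_i - phi_{v_i}.I||^2 / (2 sigma^2) ).
    The warp is a toy integer circular shift so the sketch stays
    self-contained; norm_V is a caller-supplied squared deformation norm."""
    total = 0.0
    for y, v in zip(ys, vs):
        warped = np.roll(I, v)  # stand-in for phi_v . I
        total += 0.5 * norm_V(v) + np.sum((y - warped) ** 2) / (2.0 * sigma2)
    return total

I = np.array([0.0, 1.0, 0.0, 0.0])
y = np.roll(I, 1)                 # a perfectly matched observation
energy = procrustes_energy(I, [y], [1], 1.0, lambda v: float(v * v))
# residual term vanishes, leaving only the deformation cost 0.5 * 1^2
```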
11 Warping and past approaches Issues : describe the database as an i.i.d. sample from a parametric generative statistical model. Model parameters = template + deformation law (defining the space V) + noise. Learning the parameters avoids the previous problems : an intrinsic template I_0, a weighting term σ², a global geometric behavior of the class.
12 Statistical Generative Model Generative Statistical Model : Deformable template framework :
y_i(r_{j,k}) = I_0(r_{j,k} − v_i(r_{j,k})) + σ ε_{j,k}
Conditions chosen for the template and the deformation fields : parametric model of splines. Let (p_k)_{1≤k≤k_p} and (g_k)_{1≤k≤k_g} be two sets of control points (fixed, uniformly distributed) :
I_α(x) = (K_p α)(x) = Σ_{k=1}^{k_p} K_p(x, p_k) α(k),  v_β(x) = (K_g β)(x) = Σ_{k=1}^{k_g} K_g(x, g_k) β(k).
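A minimal sketch of this kernel parametrization, assuming a Gaussian kernel — the kernel choice, the names, and the scale are illustrative, not the talk's:

```python
import numpy as np

def gaussian_kernel(x, p, scale=1.0):
    # K(x, p) = exp(-|x - p|^2 / (2 scale^2)); any spline kernel would do,
    # the Gaussian is an assumption of this sketch.
    return np.exp(-np.sum((x - p) ** 2) / (2.0 * scale ** 2))

def template(x, ctrl_pts, alpha, scale=1.0):
    """I_alpha(x) = sum_k K_p(x, p_k) alpha(k): the template is a finite
    linear combination of kernels centred at fixed control points."""
    return sum(alpha[k] * gaussian_kernel(x, p, scale)
               for k, p in enumerate(ctrl_pts))

ctrl = [np.array([0.0]), np.array([1.0])]
alpha = np.array([2.0, 0.0])
val = template(np.array([0.0]), ctrl, alpha)  # 2*K(0,0) + 0*K(0,1) = 2.0
```

The deformation field v_β(x) is built the same way from the geometric control points (g_k) and coefficients β.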
13 Statistical Generative Model What to learn? Photometry : α (codes the template), σ² (the noise variance). Geometry : β_1^n (codes each deformation). BUT : β_1^n does not give the geometrical behavior of the training set. ⇒ Introduce a prior on β : β_i ~ ν(dβ). Parameters of ν = parameters to learn. Deformations (β_i)_{1≤i≤n} = random hidden variables.
14 Statistical Generative Model Generative Model : One component per class :
(Γ_g, θ_p) ~ ν_g ⊗ ν_p with θ_p = (α, σ²)
β_1^n ~ ⊗_{i=1}^n N(0, Γ_g) | Γ_g
y_1^n ~ ⊗_{i=1}^n N(v_{β_i}·I_α, σ² Id_Λ) | β_1^n, θ_p
where ν_g(dΓ_g), ν_p(dσ², dα) are prior laws on the parameters. Remark : big structures to learn even in the case of a small training set.
15 Statistical Generative Model Generative Model (2) : General case : mixtures of deformable templates (τ_m components per class). Hidden random variables : (β_i)_{1≤i≤n} and the image labels (τ_i)_{1≤i≤n}.
ρ ~ ν_ρ,  θ = (θ_g^τ, θ_p^τ)_{1≤τ≤τ_m} ~ ⊗_{τ=1}^{τ_m} (ν_g ⊗ ν_p)
τ_1^n ~ ⊗_{i=1}^n Σ_{τ=1}^{τ_m} ρ_τ δ_τ | ρ
β_1^n ~ ⊗_{i=1}^n N(0, Γ_g^{τ_i}) | θ, τ_1^n
y_1^n ~ ⊗_{i=1}^n N(v_{β_i}·I_{α^{τ_i}}, σ²_{τ_i} Id_Λ) | β_1^n, θ, τ_1^n
16 Statistical Generative Model Generative Statistical Model : Fig.: Mixed effect structure for our BME-template. Hyperparameters (ρ, α, Γ, σ²)_1, …, (ρ, α, Γ, σ²)_{τ_m} are the population factors (fixed effects); the unobserved (τ_i, β_i, ε_i), i = 1, …, n, are the individual factors (random effects) generating the observations y_1, …, y_n.
17 Estimation How to learn the parameters? The MAP estimator : parameters θ are estimated by maximum posterior likelihood :
θ̂ = argmax_θ q(θ | y), where θ ∈ Θ = { (α, σ², Γ_g) | α ∈ ℝ^{k_p}, σ² > 0, Γ_g ∈ Sym⁺_{2k_g}(ℝ) }
and Sym⁺_{2k_g}(ℝ) is the set of positive definite symmetric matrices. Let
Θ* = { θ* ∈ Θ | E_P(log q(y | θ*)) = sup_{θ∈Θ} E_P(log q(y | θ)) },
where P denotes the distribution governing the observations.
18 Estimation How to proceed in practice? Since β_1^n are unobserved variables, a natural approach to reach the MAP estimator is the EM algorithm. Iteration l of the algorithm :
E step : compute the posterior law of β_i, i = 1, …, n.
M step : parameter update : θ_{l+1} = argmax_θ E[ log q(θ, β_1^n, y_1^n) | y_1^n, θ_l ].
BUT : the E step is not tractable!
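For contrast, here is one EM iteration on a toy model where the E step *is* tractable — a 1-D two-component Gaussian mixture with known variance and weights, an illustration rather than the talk's deformable-template model:

```python
import numpy as np

def em_step(y, mu, sigma2=1.0, weights=(0.5, 0.5)):
    """One EM iteration for a toy 1-D two-component Gaussian mixture
    (known variance and mixture weights): the E step has a closed form."""
    mu = np.asarray(mu, dtype=float)
    # E step: posterior responsibilities of each component for each y_i
    logp = -0.5 * (y[:, None] - mu[None, :]) ** 2 / sigma2
    resp = np.exp(logp) * np.asarray(weights)
    resp /= resp.sum(axis=1, keepdims=True)
    # M step: responsibility-weighted means maximize the expected
    # complete-data log-likelihood
    return resp.T @ y / resp.sum(axis=0)

y = np.array([-2.0, -1.9, 2.0, 2.1])
mu = em_step(y, [-1.0, 1.0])   # each mean moves toward its cluster
```

In the deformable-template model the posterior over β_i has no such closed form, which is what motivates the stochastic variants below.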
19 Estimation Details of the maximization step. Geometry :
Γ_{g,l+1} = (1/(n + a_g)) ( n [ββᵗ]_l + a_g Σ_g ), where [ββᵗ]_l = (1/n) Σ_{i=1}^n ∫ ββᵗ ν_{l,i}(β) dβ
is the empirical covariance matrix with respect to the posterior density function. Importance of the prior!
20 Estimation Solution proposed : stochastic version of the EM algorithm. Idea : couple SAEM with an MCMC procedure (Delyon, Lavielle, Moulines and Kuhn, Lavielle). One-component case, iteration l → l+1 of the algorithm :
Simulation step : β_{l+1} ~ Π_{θ_l}(β_l, ·), where Π_{θ_l}(β_l, ·) is the transition kernel of a convergent Markov chain having the posterior distribution as stationary distribution.
Stochastic approximation : Q_{l+1}(θ) = Q_l(θ) + Δ_l [ log q(y, β_{l+1}, θ) − Q_l(θ) ], where (Δ_l) is a decreasing sequence of positive step sizes.
Maximization step : θ_{l+1} = argmax_θ Q_{l+1}(θ).
[ Π_{θ_l}(β_l, ·) given by a hybrid Gibbs sampler ]
21 Estimation Stochastic version of the EM algorithm (2) : since our model is an exponential model,
q(y, β, θ) = exp{ −ψ(θ) + ⟨S(y, β), φ(θ)⟩ },
the stochastic approximation can be carried on the sufficient statistics S, so that the algorithm runs via
s_{l+1} = s_l + Δ_l ( S(y, β_{l+1}) − s_l ).
Let L(s, θ) = −ψ(θ) + ⟨s, φ(θ)⟩, ℓ(θ) = log q(y, θ) and θ̂(s) = argmax_θ L(s, θ); then θ_{l+1} = θ̂(s_{l+1}).
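A self-contained sketch of the SAEM recursion on a conjugate toy model (β ~ N(θ, 1), y | β ~ N(β, 1)), where exact posterior sampling can stand in for the MCMC kernel; the model, the 1/l step sizes, and all names are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def saem(y, n_iter=2000):
    """SAEM for the toy model  beta_i ~ N(theta, 1),  y_i | beta_i ~ N(beta_i, 1).
    Sufficient statistic S(y, beta) = mean(beta); its stochastic
    approximation s_l drives the closed-form M step theta_hat(s) = s.
    The conjugate structure lets us sample the posterior exactly, which
    plays the role of the MCMC transition kernel."""
    theta, s = 0.0, 0.0
    for l in range(1, n_iter + 1):
        # simulation step: beta_i | y_i, theta ~ N((y_i + theta)/2, 1/2)
        beta = (y + theta) / 2.0 + rng.normal(scale=np.sqrt(0.5), size=y.size)
        # stochastic approximation with step sizes Delta_l = 1/l
        s += (beta.mean() - s) / l
        # maximization step (closed form here)
        theta = s
    return theta

y = rng.normal(loc=3.0, scale=np.sqrt(2.0), size=500)  # marginal of y: N(theta*, 2)
theta_hat = saem(y)   # converges near the sample mean of y, i.e. near 3
```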
22 Estimation Stochastic approximation with truncation on random boundaries : set κ_0 = 0, s_0 ∈ K_0 and β_0. For k ≥ 1 compute
s̄ = s_{k−1} + Δ_{k−1} ( S(y, β̄) − s_{k−1} ),
where β̄ is sampled from the transition kernel Π_{θ_{k−1}}(β_{k−1}, ·). If s̄ ∈ K_{κ_{k−1}} and |s̄ − s_{k−1}| ≤ ε_{k−1}, set (s_k, β_k) = (s̄, β̄) and κ_k = κ_{k−1}; else set (s_k, β_k) = (s̃, β̃) ∈ K_0 × K and κ_k = κ_{k−1} + 1, where (s̃, β̃) can be chosen in different ways. Then
θ_k = argmax_θ L(s_k, θ).
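The acceptance test of the truncated recursion can be sketched as follows. The state is reduced to the statistic s alone, and the compact sets, thresholds, and restart rule are illustrative stand-ins supplied by the caller:

```python
def truncated_sa_step(s_prev, kappa, s_candidate, in_compact, close_enough, restart):
    """One acceptance test of stochastic approximation truncated on random
    boundaries: keep the candidate if it stays inside the current compact
    set K_kappa and moved by at most eps; otherwise restart from a fixed
    state and enlarge the compact-set index kappa."""
    if in_compact(s_candidate, kappa) and close_enough(s_candidate, s_prev):
        return s_candidate, kappa           # accepted: same compact set
    return restart(), kappa + 1             # rejected: reproject and grow K

in_K = lambda s, k: abs(s) <= 10.0 * (k + 1)   # growing compact sets K_kappa
close = lambda a, b: abs(a - b) <= 1.0          # threshold eps_k, constant here
accepted = truncated_sa_step(0.0, 0, 0.5, in_K, close, lambda: 0.0)
rejected = truncated_sa_step(0.0, 0, 99.0, in_K, close, lambda: 0.0)
# accepted keeps the candidate; rejected restarts at 0.0 with kappa = 1
```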
23 Multicomponent Model Stochastic version of the EM algorithm for the multicomponent model : the intuitive generalization runs into trapping states. Image analysis interpretation : each iteration tries to deform the data so that it gets closer to its current component, and therefore will not tend to move toward another one (high-dimensional hidden variable β). Solution proposed : consider another simulation method, based on the Gibbs sampler for the deformation and on another law for the class of a given image.
24 Multicomponent Model The new algorithm : transition step l → l+1 using a hybrid Gibbs sampler on (β, τ). For each τ : run the hybrid Gibbs sampler on β given τ, N_l times :
β̂_τ^{(l+1)} ~ Π^{N_l}(β | τ).
Draw τ^{(l+1)} from the discrete law with weights
p_{N_l}(τ) ∝ ( (1/N_l) Σ_{i_mc=1}^{N_l} [ f(β̂_{τ,(i_mc)}) / q(y, β̂_{τ,(i_mc)}, τ | θ, ρ) ] )^{−1}.
Set β^{(l+1)} = β̂_{τ^{(l+1)},(N_l)}.
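The final label draw reduces to sampling from a discrete law given unnormalized log-weights. The log-sum-exp stabilization below is a standard implementation detail assumed by this sketch, not something stated on the slide:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_label(log_weights):
    """Draw the component label tau from the discrete law defined by the
    given unnormalized log-weights (the last stage of the hybrid sampler)."""
    w = np.exp(log_weights - np.max(log_weights))  # stabilize before normalizing
    w /= w.sum()
    return rng.choice(len(w), p=w)

# Component 1 has log-weight larger by 5, so it wins ~99% of draws.
labels = [sample_label(np.array([0.0, 5.0])) for _ in range(200)]
```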
25 Multicomponent Model Theoretical Results : With these models and algorithms we have proved some important asymptotic results : Consistency of the MAP estimator Convergence of both stochastic algorithms
26 Experiments MCMC-SAEM : Template estimation : Fig.: Left : one component prototype. Right : 2 component prototypes
27 Experiments MCMC-SAEM : The Geometric Distribution : Fig.: Left : 2 examples of the training set. Right : 2 examples drawn from the prior geometric distribution.
28 Experiments MCMC-SAEM : Geometry :
29 Experiments In presence of noise :
30 Experiments Medical Images : Splenium of the Corpus Callosum. 47 images of the corpus callosum (and part of the cerebellum). Fig.: Left : gray level mean of the 47 images. Right : template estimated with the stochastic algorithm.
31 Experiments Medical Images : Splenium of the Corpus Callosum. 47 images of the corpus callosum (and part of the cerebellum) clustered into 2 components by the multicomponent model. Fig.: Results from the estimation with the stochastic algorithm. Left : component 1. Right : component 2.
32 Experiments Robustness of the algorithm : same hyper-parameters as for the previous gray level images.
33 Outline Three generative statistical models and stochastic algorithms 1 Bayesian Mixed Effect (BME) gray level Template 1.1 Mathematical framework for deformable models 1.2 Past approaches to compute a population average 1.3 Generative statistical models 1.4 Statistical estimation of the model parameters 1.5 Experiments on USPS database and 2D medical images 2 Bayesian Mixed Effect DTI Template 3 Noisy ICA
34 BME-DTI Template Bayesian Mixed Effect DTI Template : goals of our approach : * Estimate a template Diffusion Tensor Image on a given region of the anatomy * Use only the Diffusion Weighted Images [DWIs] (real vectors corresponding to the responses to several different gradients)
35 BME-DTI Template Previous approach : for each subject i = 1, …, n, the DWIs (one per gradient G_1, …, G_m) are first reduced to a tensor image DTI_i by a least-squares approximation; the DTI template is then obtained from DTI_1, …, DTI_n by minimizing an energy.
36 BME-DTI Template Our approach : the DWIs are the only observed data, the subject tensor images (DTI_1, …, DTI_n) remain hidden, and the DTI template is estimated directly by maximum likelihood.
37 BME-DTI Template Results on synthetic data : 2 different template tensors, FA_1 = 0.6791, FA_2 = 0.6918, ADC_1 = 0.538, ADC_2 = … ; random samples with 15 subjects each. [Table : bias, variance and mse of the FA and ADC estimates for the LS, FAM-EM and SAEM estimators.]
38 BME-DTI Template Fig.: oriented ellipsoids : true ellipsoid, least-squares estimate, mode estimate, stochastic estimate.
39 Outline Three generative statistical models and stochastic algorithms 1 Bayesian Mixed Effect (BME) gray level Template 1.1 Mathematical framework for deformable models 1.2 Past approaches to compute a population average 1.3 Generative statistical models 1.4 Statistical estimation of the model parameters 1.5 Experiments on USPS database and 2D medical images 2 Bayesian Mixed Effect DTI Template 3 Noisy ICA
40 Noisy ICA Model Observations : X_1^n such that X_i = A Y_i + σ ε_i, where A is the source (mixing) matrix, σ² the variance of the Gaussian noise, and Y_1^n are hidden variables. Model :
Y_1^n ~ ⊗_{i=1}^n ⊗_{j=1}^p ν_η | η,
X_1^n ~ ⊗_{i=1}^n N(A Y_i, σ² Id) | A, σ², Y_1^n.
Various choices of the distribution ν_η are possible, and the same MCMC-SAEM algorithm treats this estimation problem.
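Sampling from the noisy ICA generative model is straightforward; the Laplace source law below is one common heavy-tailed choice for ν_η, an assumption of this sketch rather than the talk's choice:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_noisy_ica(A, sigma, n, source_sampler):
    """Draw X_i = A Y_i + sigma * eps_i from the noisy ICA model; the
    sources Y_i are i.i.d. from a caller-chosen law nu_eta and the noise
    eps_i is standard Gaussian."""
    p = A.shape[1]
    Y = source_sampler((n, p))
    X = Y @ A.T + sigma * rng.normal(size=(n, A.shape[0]))
    return X, Y

A = np.array([[1.0, 0.5],
              [0.0, 1.0]])
X, Y = sample_noisy_ica(A, 0.1, 1000, lambda shape: rng.laplace(size=shape))
```

Estimating (A, σ², η) then falls back on the same pattern as before: the Y_i are the hidden variables simulated by MCMC, and the parameter update is the SAEM step.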
41 Noisy ICA Experiments : 11 subjects, 2 independent components.
42 Conclusion Generative statistical model = proper statistical framework for designing and inferring population averages. Stochastic algorithms of multiple uses, even in difficult conditions. Thank you!
44 MCMC-SAEM : The Geometric Distribution : Between 2 classes : Between 2 components in the same class :
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll
More informationBayesian Registration of Functions with a Gaussian Process Prior
Bayesian Registration of Functions with a Gaussian Process Prior RELATED PROBLEMS Functions Curves Images Surfaces FUNCTIONAL DATA Electrocardiogram Signals (Heart Disease) Gait Pressure Measurements (Parkinson
More informationAdvances and Applications in Perfect Sampling
and Applications in Perfect Sampling Ph.D. Dissertation Defense Ulrike Schneider advisor: Jem Corcoran May 8, 2003 Department of Applied Mathematics University of Colorado Outline Introduction (1) MCMC
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationNotes on Machine Learning for and
Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Choosing Hypotheses Generally want the most probable hypothesis given the training data Maximum a posteriori
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationGAUSSIAN PROCESS REGRESSION
GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More informationUncertainty Quantification for Inverse Problems. November 7, 2011
Uncertainty Quantification for Inverse Problems November 7, 2011 Outline UQ and inverse problems Review: least-squares Review: Gaussian Bayesian linear model Parametric reductions for IP Bias, variance
More informationLearning Energy-Based Models of High-Dimensional Data
Learning Energy-Based Models of High-Dimensional Data Geoffrey Hinton Max Welling Yee-Whye Teh Simon Osindero www.cs.toronto.edu/~hinton/energybasedmodelsweb.htm Discovering causal structure as a goal
More informationMCMC for big data. Geir Storvik. BigInsight lunch - May Geir Storvik MCMC for big data BigInsight lunch - May / 17
MCMC for big data Geir Storvik BigInsight lunch - May 2 2018 Geir Storvik MCMC for big data BigInsight lunch - May 2 2018 1 / 17 Outline Why ordinary MCMC is not scalable Different approaches for making
More informationVariational inference
Simon Leglaive Télécom ParisTech, CNRS LTCI, Université Paris Saclay November 18, 2016, Télécom ParisTech, Paris, France. Outline Introduction Probabilistic model Problem Log-likelihood decomposition EM
More informationStein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Qiang Liu and Dilin Wang NIPS 2016 Discussion by Yunchen Pu March 17, 2017 March 17, 2017 1 / 8 Introduction Let x R d
More informationVariational Inference via Stochastic Backpropagation
Variational Inference via Stochastic Backpropagation Kai Fan February 27, 2016 Preliminaries Stochastic Backpropagation Variational Auto-Encoding Related Work Summary Outline Preliminaries Stochastic Backpropagation
More informationMS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari
MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind
More informationPILCO: A Model-Based and Data-Efficient Approach to Policy Search
PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol
More informationLecture 2: Statistical Decision Theory (Part I)
Lecture 2: Statistical Decision Theory (Part I) Hao Helen Zhang Hao Helen Zhang Lecture 2: Statistical Decision Theory (Part I) 1 / 35 Outline of This Note Part I: Statistics Decision Theory (from Statistical
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Prediction Instructor: Yizhou Sun yzsun@ccs.neu.edu September 14, 2014 Today s Schedule Course Project Introduction Linear Regression Model Decision Tree 2 Methods
More informationWeb Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.
Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we
More information