Machine Learning, T. Schön (Chapter 11) — Automatic Control (Reglerteknik), Linköpings Universitet
Lecture 7: MCMC and Sampling Methods
Thomas Schön, Division of Automatic Control, Linköping University, Linköping, Sweden. schon@isy.liu.se

About the Exam (I/II)
If you have followed the course and completed the exercises you will not be surprised when you see the exam. You will learn new things during the exam. Practicalities: Time frame: … days (… h), somewhere during week 3. Within … hours after you have collected the exam, you put your solutions in an envelope (seal it) and hand it in.

About the Exam (II/II)
As usual the graduate exam honor code applies. This means:
- The course books, other books and MATLAB are all allowed aids.
- Internet services such as e-mail, web browsers and other communication with the surrounding world concerning the exam are NOT allowed.
- You are NOT allowed to actively search for the solutions in books, papers, the Internet or anywhere else.
- You are NOT allowed to talk to others (save for the responsible teachers) about the exam at all.
- If anything is unclear concerning what is allowed and not, just ask any of the responsible teachers.
- You are not allowed to look at exams from earlier versions of the course (obviously hard in this course anyway...).

Outline of Lecture 7
1. Summary of the previous lecture
2. Motivation for Monte Carlo methods
3. Basic sampling methods: transformation methods, rejection sampling, importance sampling
4. Markov chain Monte Carlo (MCMC): general properties, the Metropolis-Hastings algorithm, Gibbs sampling
(Chapter 11)
Summary of the Previous Lecture (I/III)
In boosting we train a sequence of M models y_m(x), where the error function used to train a certain model depends on the performance of the previous models. The models are then combined to produce the resulting classifier (for the two-class problem) according to

  Y_M(x) = sign( Σ_{m=1}^{M} α_m y_m(x) ).

We saw that the AdaBoost algorithm can be interpreted as a sequential minimization of an exponential cost function.

Graphical models: a graphical description of a probabilistic model where variables are represented by nodes and the relationships between variables correspond to edges.

Summary of the Previous Lecture (II/III)
We started introducing some basic concepts for probabilistic graphical models G = (V, L), consisting of
1. a set of nodes V (a.k.a. vertices) representing the random variables, and
2. a set of links L (a.k.a. edges or arcs) containing elements (i, j) ∈ L connecting a pair of nodes i, j ∈ V and thereby encoding the probabilistic relations between nodes.

Summary of the Previous Lecture (III/III)
The set of parents of node j is defined as P(j) = {i ∈ V : (i, j) ∈ L}. The directed graph describes how the joint distribution factorizes into a product of factors p(x_i | x_{P(i)}), each depending only on a subset of the variables:

  p(x_V) = ∏_{i∈V} p(x_i | x_{P(i)}).

Hence, for the state-space model on the previous slide, we have

  p(X, Y) = p(x_1) ∏_{t=2}^{T} p(x_t | x_{t−1}) ∏_{t=1}^{T} p(y_t | x_t).
Motivation for Monte Carlo (I/III)
Probabilistic inference obviously depends on probability density functions. We have two important problems with probabilistic inference:
1. Computing integrals. Examples:
   Bayesian inference: p(x | y) = p(y | x) p(x) / ∫ p(y | x) p(x) dx
   Marginalization: p(x_1) = ∫ p(x_1, x_2) dx_2
2. Optimization. Examples:
   Maximum likelihood: x̂_ML = arg max_x p(y | x)
   Maximum a posteriori: x̂_MAP = arg max_x p(x | y)

Motivation for Monte Carlo (II/III)
The models of reality are becoming too complicated for us to be able to perform these operations exactly. The standard way of handling this is to make analytical approximations either in the model or in the solution. Examples: use conjugate priors to make analytical calculation of the integrals and optimization possible; variational approximations in the solutions. Analytical approximations change either the problem or the solution that we are trying to obtain, and the effects are not always predictable.

Motivation for Monte Carlo (III/III)
Another framework is to use numerical methods, called Monte Carlo methods, to solve these problems. With Monte Carlo, no sacrifice in the model or in the solution is made; the accuracy is limited only by our computational resources. Both integration and optimization can be done with these methods. If one can represent a complicated density p(x) with samples as

  p(x) ≈ (1/N) Σ_{i=1}^{N} δ(x − x^(i)),

any integral can be calculated by

  ∫ f(x) p(x) dx ≈ (1/N) Σ_{i=1}^{N} f(x^(i)).

Basic Sampling Algorithms — Transformation Method (I/II)
Most processing environments have built-in random number generators for the uniform distribution. Assume that we would like to generate samples from a general univariate density p(x). The following scheme provides the way.

Generating samples from the density p(x):
1. Generate u ~ U(0, 1).
2. Calculate x^(i) = cdf⁻¹(u), where cdf(x) = ∫_{−∞}^{x} p(x′) dx′ is the cumulative distribution function of x.

More generally, suppose we have a random variable z ~ p_z(z) from which we can obtain samples. If we transform the samples of z with an invertible function f(·) as y = f(z), the density of the samples we obtain is

  p_y(y) = p_z(z) |dz/dy| = p_z(f⁻¹(y)) |d f⁻¹(y)/dy|.

The second factor on the r.h.s. is absolutely necessary; without it the r.h.s. might not even be a proper density. Example: let z ~ U(0, 1) and f(z) = z². Then f⁻¹(y) = √y, and p(y) = p_z(√y) · 1/(2√y) = 1/(2√y) for 0 < y ≤ 1 and 0 otherwise.
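The inverse-CDF scheme above can be sketched in a few lines. The exponential target with rate λ, whose inverse CDF is −ln(1 − u)/λ, and all parameter values are illustrative assumptions, not from the lecture:

```python
import numpy as np

# Inverse-CDF (transformation) sampling: draw u ~ U(0, 1) and return
# cdf^{-1}(u). Sketched for the exponential density p(x) = lam * exp(-lam*x),
# whose inverse CDF is -ln(1 - u) / lam (an illustrative choice).
def sample_exponential(lam, n, rng):
    u = rng.uniform(0.0, 1.0, size=n)   # step 1: uniform random numbers
    return -np.log1p(-u) / lam          # step 2: x = cdf^{-1}(u)

rng = np.random.default_rng(0)
x = sample_exponential(lam=2.0, n=100_000, rng=rng)
# The sample mean should be close to the true mean 1/lam = 0.5.
```

Any density with a computable inverse CDF can be sampled the same way; for densities without one, the methods on the following slides apply.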
Transformation Method (II/II)
For generating Gaussian random variables, the following transformation method can be used.

Box-Muller method (polar form):
1. Generate z_1, z_2 ~ U(−1, 1).
2. If r² = z_1² + z_2² > 1, discard the samples and go to 1.
3. If r² ≤ 1, calculate
   y_1 = z_1 √(−2 ln r² / r²) and y_2 = z_2 √(−2 ln r² / r²),
   for which y_1, y_2 ~ N(0, 1).

Random Number Generation — Perfect Sampling
Assumption: we have access to samples from the density p(x) we seek to estimate,
  p̂(x) = (1/N) Σ_{i=1}^{N} δ(x − x^(i)).
We seek an estimate according to

  Î_N(g(x)) = ∫ g(x) p̂(x) dx = (1/N) Σ_{i=1}^{N} g(x^(i)).

The strong law of large numbers: Î_N(g(x)) → I(g(x)) almost surely as N → ∞.
The central limit theorem: √N (Î_N(g(x)) − I(g(x))) converges in distribution to N(0, σ²).

Perfect Sampling — Example and Problem
Consider sampling from the following Gaussian mixture:
  p(x) = 0.3 N(x; ·, ·) + 0.7 N(x; 9, 9).

Random Number Generation
Target density p(x): we seek samples distributed according to this density.
Proposal density q(x): a density that is simple to generate samples from.
Acceptance probability w(x): used to decide whether a proposed sample is acceptable, by comparing p(x) against w(x) q(x).
Obvious problem: in general we are NOT able to sample directly from the density we are interested in! Three common algorithms are based on this idea:
1. Rejection sampling
2. Importance sampling
3. Metropolis-Hastings
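The three Box-Muller steps above can be sketched directly; the sample size and seed are arbitrary choices for illustration:

```python
import numpy as np

# Polar Box-Muller, following the three steps on the slide: draw
# z1, z2 ~ U(-1, 1), reject pairs with r2 = z1^2 + z2^2 > 1 (or r2 == 0),
# and transform each accepted pair into two independent N(0, 1) samples.
def box_muller_polar(n_pairs, rng):
    out = []
    while len(out) < 2 * n_pairs:
        z1, z2 = rng.uniform(-1.0, 1.0, size=2)   # step 1
        r2 = z1 * z1 + z2 * z2
        if r2 >= 1.0 or r2 == 0.0:
            continue                              # step 2: discard and retry
        scale = np.sqrt(-2.0 * np.log(r2) / r2)   # step 3
        out.extend([z1 * scale, z2 * scale])
    return np.array(out)

rng = np.random.default_rng(1)
y = box_muller_polar(20_000, rng)   # 40,000 samples, approximately N(0, 1)
```

The rejection in step 2 wastes a fraction 1 − π/4 ≈ 21% of the uniform pairs, but avoids evaluating trigonometric functions, which is why the polar form is often preferred over the basic Box-Muller transform.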
Rejection Sampling (I/IV)
Suppose p(x) is too complicated to sample from directly. Let us introduce a latent variable u and consider the joint distribution p(x, u) = p(x) p(u | x), where we define p(u | x) = U(u; 0, p(x)). Then

  p(x, u) = 1 if 0 ≤ u ≤ p(x), and 0 otherwise.

Hence, we must sample uniformly over the area under p(x).

Rejection Sampling (II/IV)
For this purpose, we use a proposal density q(x) that is easy to sample from, such that p(x) ≤ K q(x) for all x.

Rejection sampling:
1. Sample x^(i) ~ q(x).
2. Sample u ~ U(0, K q(x^(i))).
3. If u ≤ p(x^(i)), accept x^(i) as a valid sample from p(x) and go to 1. Otherwise, discard x^(i) and go to 1.

Rejection Sampling (III/IV)
This procedure does not depend on p(x) being normalized, i.e., all p(x) terms can be replaced by an unnormalized version p̃(x) such that Z = ∫ p̃(x) dx. The procedure can be used with multivariate densities in the same way. The rejection rate is governed by K: the larger K, the larger the percentage of samples we waste. Therefore, one must select K as small as possible while still satisfying p(x) ≤ K q(x) for all x. There are adaptive versions in which one tries to obtain better proposals during the sampling process. Even with the optimal K, the rejection rate generally grows exponentially as the dimension increases.

Rejection Sampling (IV/IV)
Example: kernel-based density estimates from samples obtained with rejection sampling, improving as the number of samples N grows.
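A minimal sketch of the three-step rejection sampler, assuming an illustrative Beta(2, 2) target p(x) = 6x(1 − x) on [0, 1] with uniform proposal q(x) = 1 and envelope constant K = 1.5 (none of which are from the lecture; K = 1.5 is the maximum of p, so p(x) ≤ K q(x) everywhere):

```python
import numpy as np

# Rejection sampling for p(x) = 6 x (1 - x) on [0, 1] (a Beta(2, 2) density)
# with uniform proposal q(x) = 1 and K = 1.5.
def rejection_sample(n, rng):
    p = lambda x: 6.0 * x * (1.0 - x)
    K = 1.5
    samples = []
    while len(samples) < n:
        x = rng.uniform(0.0, 1.0)      # step 1: x ~ q(x)
        u = rng.uniform(0.0, K)        # step 2: u ~ U(0, K q(x))
        if u <= p(x):                  # step 3: accept or discard
            samples.append(x)
    return np.array(samples)

rng = np.random.default_rng(2)
x = rejection_sample(50_000, rng)
# Beta(2, 2) has mean 1/2 and variance 1/20 = 0.05.
```

With this K the acceptance rate is 1/K = 2/3; a tighter envelope would waste fewer proposals, illustrating why K should be chosen as small as possible.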
Importance Sampling (I/V)
So far, we have presented sampling approaches where each sample has equal contribution (importance) in the approximating particle sum

  p(x) ≈ (1/N) Σ_{i=1}^{N} δ(x − x^(i)),

which gave

  ∫ f(x) p(x) dx ≈ (1/N) Σ_{i=1}^{N} f(x^(i)).

In importance sampling, one samples from a proposal density, x^(i) ~ q(x), and uses a weighted approximation for p(x).

Importance Sampling (II/V)
We have the integral

  ∫ f(x) p(x) dx = ∫ f(x) (p(x)/q(x)) q(x) dx ≈ (1/N) Σ_{i=1}^{N} w^(i) f(x^(i)),

where w^(i) = p(x^(i)) / q(x^(i)). This approximation procedure is equivalent to approximating p(x) as

  p(x) ≈ (1/N) Σ_{i=1}^{N} w^(i) δ(x − x^(i)).

When the density p(x) is not normalized, one uses the approximation

  p(x) ≈ Σ_{i=1}^{N} w̃^(i) δ(x − x^(i)), where w̃^(i) = w^(i) / Σ_{j=1}^{N} w^(j).

Importance Sampling (III/V)
Algorithm (Importance sampling):
1. Generate i.i.d. samples {x^i}_{i=1}^{N} from the proposal density q(x) and compute the importance weights w^i = p(x^i)/q(x^i), i = 1, …, N.
2. Form the normalized weights, w̃^i = w^i / Σ_{j=1}^{N} w^j, i = 1, …, N.
This results in the following approximation of the target density:

  p(x) ≈ Σ_{i=1}^{N} w̃^i δ(x − x^i).

Importance Sampling (IV/V)
In rejection sampling and other such methods, only the particle positions (their denseness) carry information. In importance sampling, the weights also carry important information. Note that a large weight does not necessarily mean that the density value there is high; the particle density is still important.
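The self-normalized estimate can be sketched as follows. The unnormalized Gaussian target p̃(x) ∝ exp(−(x − 2)²/2) (i.e. N(2, 1) up to its normalizing constant) and the wider N(0, 3²) proposal are illustrative assumptions, not the lecture's example:

```python
import numpy as np

# Self-normalized importance sampling: weight proposal samples by
# w = p_tilde / q, normalize, and form a weighted mean estimate.
rng = np.random.default_rng(3)
N = 200_000
sigma_q = 3.0
x = rng.normal(0.0, sigma_q, size=N)             # x^i ~ q(x)
p_tilde = np.exp(-0.5 * (x - 2.0) ** 2)          # unnormalized target N(2, 1)
q = np.exp(-0.5 * (x / sigma_q) ** 2) / (sigma_q * np.sqrt(2.0 * np.pi))
w = p_tilde / q                                  # raw importance weights
w_norm = w / w.sum()                             # normalized weights
mean_est = float(np.sum(w_norm * x))             # estimate of E_p[x] = 2
```

Because the weights are normalized, the unknown normalizing constant of p̃ cancels, which is exactly why the method works for unnormalized targets.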
Importance Sampling (V/V)
Proposal selection is very important. Proposals narrower than the target density can cause poor representation of the density in some parts of the space. It is, in general, a good idea to choose wide proposals, keeping in mind that a too-wide proposal results in too many samples with tiny weights, which is a waste of computation.

Sampling Importance Resampling
Given the weighted approximation p(x) ≈ Σ_{i=1}^{N} w̃^i δ(x − x^i), we can make use of resampling in order to generate an unweighted set of samples. This is done by drawing new samples x̃^i with replacement according to

  P(x̃^i = x^j) = w̃^j, j = 1, …, N,

resulting in the following unweighted approximation:

  p̂(x) = (1/N) Σ_{i=1}^{N} δ(x − x̃^i).

Sampling Importance Resampling — Algorithm
Algorithm (Sampling Importance Resampling, SIR):
1. Generate i.i.d. samples {x^i}_{i=1}^{N} from the proposal density q(x) and compute the importance weights w^i = p(x^i)/q(x^i), i = 1, …, N.
2. Form the normalized weights, w̃^i = w^i / Σ_{j=1}^{N} w^j, i = 1, …, N.
3. For each i = 1, …, N, draw a new particle x̃^i with replacement (resample) according to P(x̃^i = x^j) = w̃^j, j = 1, …, N.

The Importance of a Good Importance Density
Example: two Gaussian proposals q_1(x) and q_2(x) of different widths, with the same number of samples used in both experiments, give very different approximation quality. Lesson learned: it is very important to be careful when selecting the importance density.
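The resampling step of SIR reduces to one categorical draw with replacement. A minimal sketch; the particle values and weights below are made up for illustration:

```python
import numpy as np

# SIR resampling step: draw N new particles with replacement, choosing
# particle j with probability w_tilde^j. The result is an unweighted set.
def resample(particles, norm_weights, rng):
    idx = rng.choice(len(particles), size=len(particles), p=norm_weights)
    return particles[idx]

# Tiny illustration: three particles, one carrying most of the weight.
rng = np.random.default_rng(4)
particles = np.array([-1.0, 0.0, 5.0])
w = np.array([0.1, 0.1, 0.8])

# Repeat the draw many times to see that the heavy particle is selected
# with frequency close to its weight 0.8.
many = np.concatenate([resample(particles, w, rng) for _ in range(4000)])
frac_heavy = float(np.mean(many == 5.0))
```

Note that resampling trades weight degeneracy for duplication: low-weight particles tend to disappear and high-weight particles are copied, which is precisely what the particle filter on the next slides exploits at every time step.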
State Estimation
For a nonlinear state-space model

  x_{t+1} ~ p(x_{t+1} | x_t),
  y_t ~ p(y_t | x_t),

we can show that the filtering and one-step-ahead prediction densities are

  p(x_t | y_{1:t}) = p(y_t | x_t) p(x_t | y_{1:t−1}) / p(y_t | y_{1:t−1}),
  p(x_{t+1} | y_{1:t}) = ∫ p(x_{t+1} | x_t) p(x_t | y_{1:t}) dx_t.

The Particle Filter
Idea: make use of SIR in order to find a first particle filter, i.e., an algorithm that provides approximations of p(x_t | y_{1:t}). Recall from the previous slide that

  p(x_t | y_{1:t}) = p(y_t | x_t) p(x_t | y_{1:t−1}) / p(y_t | y_{1:t−1}) ∝ p(y_t | x_t) p(x_t | y_{1:t−1}),

where p(x_t | y_{1:t}) plays the role of the target p(x_t), p(y_t | x_t) of the weight function a(x_t), and p(x_t | y_{1:t−1}) of the proposal q(x_t). This implies that SIR can be used to produce estimates of p(x_t | y_{1:t}).

A First Particle Filter
Algorithm (A first particle filter):
1. Initialize the particles, {x_0^i}_{i=1}^{N} ~ p(x_0), and let t := 1.
2. Predict the particles by drawing i.i.d. samples, x_t^i ~ p(x_t | x_{t−1}^i), i = 1, …, N.
3. Compute the importance weights {w_t^i}_{i=1}^{N}, w_t^i = p(y_t | x_t^i), i = 1, …, N, and normalize w̃_t^i = w_t^i / Σ_j w_t^j.
4. For each i = 1, …, N, draw a new particle x̃_t^i with replacement, P(x̃_t^i = x_t^j) = w̃_t^j, j = 1, …, N.
5. Set t := t + 1 and repeat from step 2.

IS Example — Linear System Identification (I/III)
Consider the following linear scalar state-space model:

  x_{k+1} = θ x_k + v_k,
  y_k = x_k + e_k,
  (v_k, e_k)ᵀ ~ N(0, diag(σ_v², σ_e²)).

The initial state is x_0 ~ N(x_0; 0, Σ_0), and θ has the prior distribution p(θ) = N(θ; 0, σ_θ²). The identification problem is now to determine the posterior p(θ | y_{1:N}) using importance sampling. As usual, note the difference in notation compared to Bishop: the observations are denoted Y and the latent variables are given by X.
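The five-step particle filter above can be sketched on a scalar model of the same form as in the identification example. The parameter values (θ = 0.8, σ_v = 1, σ_e = 0.5, N = 2000 particles, T = 50 steps) are illustrative assumptions, not from the lecture:

```python
import numpy as np

# Bootstrap particle filter for x_{t+1} = theta*x_t + v_t, y_t = x_t + e_t.
rng = np.random.default_rng(5)
theta, sig_v, sig_e, N, T = 0.8, 1.0, 0.5, 2000, 50

# Simulate a trajectory and measurements from the model.
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = theta * x_true[t - 1] + sig_v * rng.normal()
y = x_true + sig_e * rng.normal(size=T)

# Step 1: initialize particles from p(x_0).
particles = rng.normal(0.0, 1.0, size=N)
estimates = np.zeros(T)
for t in range(T):
    if t > 0:  # step 2: predict by sampling the dynamics
        particles = theta * particles + sig_v * rng.normal(size=N)
    # step 3: weight by the likelihood p(y_t | x_t^i) and normalize
    w = np.exp(-0.5 * ((y[t] - particles) / sig_e) ** 2)
    w /= w.sum()
    estimates[t] = np.sum(w * particles)        # filtered mean E[x_t | y_{1:t}]
    # step 4: resample with replacement
    particles = particles[rng.choice(N, size=N, p=w)]
    # step 5: next t

rmse = float(np.sqrt(np.mean((estimates - x_true) ** 2)))
```

For this linear-Gaussian model the Kalman filter is exact, so the particle filter's RMSE should approach the Kalman filter's steady-state error as N grows; the point of the sketch is only to show the predict-weight-resample loop.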
IS Example — Linear System Identification (II/III)
We have solved this problem with both EM and VB, using the latent variables X = {x_1, …, x_N}. The main equation for importance sampling in this example is

  p(θ | y_{1:N}) ∝ p(y_{1:N} | θ) p(θ).

The problem here is that we cannot normalize this density analytically, since p(y_{1:N} | θ) is too complicated. We can, however, still evaluate p(y_{1:N} | θ) for different values of θ.

IS Example — Linear System Identification (III/III)
We would like to sample from p(θ | y_{1:N}) ∝ p(y_{1:N} | θ) p(θ).
1. Choose the proposal as the prior, q(θ) = p(θ).
2. Sample θ^(i) ~ q(θ).
3. Set the weights

  w^(i) = p(y_{1:N} | θ^(i)) p(θ^(i)) / q(θ^(i)) = p(y_{1:N} | θ^(i)).

4. Normalize the weights, w̃^(i) = w^(i) / Σ_j w^(j), giving

  p(θ | y_{1:N}) ≈ Σ_i w̃^(i) δ(θ − θ^(i)).

The resulting estimate agrees well with the true posterior and with the EM and VB estimates. The likelihood p(y_{1:N} | θ^(i)) required for the weights above is given by a Kalman filter:

  p(y_{1:N} | θ^(i)) = p(y_1) ∏_{k=2}^{N} p(y_k | y_{1:k−1}, θ^(i)) = ∏_{k=1}^{N} N(y_k; ŷ_{k|k−1}(θ^(i)), S_{k|k−1}(θ^(i))).

Markov Chain Monte Carlo
Importance sampling is also bound to fail in high dimensions. This is due to the fact that, in high dimensions, the support of the density to be sampled from is only a tiny region of the overall space, and in this case it is very difficult to find a good proposal without knowing the actual density. Markov chain Monte Carlo (MCMC) is proposed to overcome this problem. Whereas in the standard sampling methods the samples are independent of each other, MCMC uses dependent samples: due to the Markov property, each sample depends on the previous sample, i.e., each x^(i) depends on x^(i−1). In general x^(i) is generated as x^(i) ~ q(x | x^(i−1)). Obviously, we want the overall behavior of the generated samples to be similar to that of samples from p(x). MCMC methods provide a way to do this with an arbitrary proposal q(x | x′).

Metropolis-Hastings algorithm:
1. Generate an initial sample x^(0) ~ q_0(x).
2. For i = 1, 2, …:
   - Sample x* ~ q(x | x^(i−1)).
   - Sample u ~ U(0, 1).
   - Set the new sample as

     x^(i) = x*, if u < min(1, [p(x*) q(x^(i−1) | x*)] / [p(x^(i−1)) q(x* | x^(i−1))]);
     x^(i) = x^(i−1), otherwise.
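The Metropolis-Hastings iteration above can be sketched for a 1-D target. The unnormalized two-component Gaussian mixture and the random-walk proposal width are illustrative assumptions; with a symmetric proposal the q-ratio in the acceptance probability cancels:

```python
import numpy as np

# Metropolis-Hastings with a Gaussian random-walk proposal
# q(x | x') = N(x; x', sigma^2), targeting an unnormalized 1-D density.
def p_tilde(x):
    # unnormalized two-component Gaussian mixture (illustrative target)
    return 0.3 * np.exp(-0.5 * x ** 2) + 0.7 * np.exp(-0.5 * (x - 4.0) ** 2)

def metropolis_hastings(n_iter, sigma, rng):
    x = 0.0                                  # initial sample x^(0)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        x_star = x + sigma * rng.normal()    # propose x* ~ q(x | x^(i-1))
        u = rng.uniform()
        # symmetric proposal: the q-ratio cancels in the acceptance test
        if u < min(1.0, p_tilde(x_star) / p_tilde(x)):
            x = x_star                       # accept
        chain[i] = x                         # otherwise keep x^(i-1)
    return chain

rng = np.random.default_rng(6)
chain = metropolis_hastings(100_000, sigma=2.5, rng=rng)
chain = chain[5_000:]   # discard a burn-in period
# The mixture weights are 0.3 and 0.7 with component means 0 and 4,
# so the target mean is 0.7 * 4 = 2.8.
```

Note that p_tilde is never normalized; only ratios of the target enter the algorithm, which is one of the main practical attractions of MCMC.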
M-H Example: Sampling from a Gaussian (I/II)
Suppose the target density over R² is a zero-mean Gaussian, p(x) = N(x; 0, Σ), with a strong correlation ρ between the two components. Choose the proposal density as a random walk around the previous sample,

  q(x | x^(i−1)) = N(x; x^(i−1), σ² I).

(The slides show the resulting sample paths for increasing numbers of samples.)

M-H Example: Sampling from a Gaussian (II/II)
Noticing that q(x | x^(i−1)) = q(x^(i−1) | x) for all x (the proposal is symmetric), the acceptance probability reduces to

  min(1, p(x*) / p(x^(i−1))).

This version of the Metropolis-Hastings algorithm is called the Metropolis algorithm.

M-H — Some General Issues (I/II)
M-H makes the samples converge to samples from a stationary distribution, which is the target distribution p(x). The time that passes before the samples start to represent the target density is called the burn-in period. We generally have to use only the samples obtained after the burn-in period. After the burn-in period is over, the Markov chain is said to have mixed.

M-H — Some General Issues (II/II)
Proposal selection is still an important problem. If the proposal is selected too narrow, the step sizes get smaller and the burn-in period becomes longer. If the proposal is too wide, the burn-in gets shorter, but the acceptance rate decreases significantly. Diagnosing convergence to the target distribution with MCMC algorithms is still an active area of research.
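The narrow-versus-wide trade-off can be measured directly by tracking the acceptance rate of a random-walk Metropolis chain. The N(0, 1) target and the two step sizes below are illustrative choices:

```python
import numpy as np

# Acceptance rate of random-walk Metropolis on a standard 1-D Gaussian
# target, for a given proposal standard deviation sigma.
def acceptance_rate(sigma, n_iter, rng):
    x, accepted = 0.0, 0
    log_p = lambda x: -0.5 * x * x        # log N(0, 1) up to a constant
    for _ in range(n_iter):
        x_star = x + sigma * rng.normal()
        if np.log(rng.uniform()) < log_p(x_star) - log_p(x):
            x, accepted = x_star, accepted + 1
    return accepted / n_iter

rng = np.random.default_rng(7)
rate_narrow = acceptance_rate(sigma=0.1, n_iter=20_000, rng=rng)  # tiny steps
rate_wide = acceptance_rate(sigma=25.0, n_iter=20_000, rng=rng)   # huge steps
# Narrow proposals accept almost everything but explore slowly;
# wide proposals are rejected most of the time.
```

Neither extreme is good: the narrow chain has a high acceptance rate but long burn-in and strong autocorrelation, while the wide chain barely moves because most proposals are rejected.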
[The references slide reproduces the first pages of two survey articles: J. C. Spall, "Estimation via Markov chain Monte Carlo," IEEE Control Systems Magazine, 2003, and A. Doucet and X. Wang, "Monte Carlo methods for signal processing: a review in the statistical signal processing context," IEEE Signal Processing Magazine, 2005. Full citations are given in the references below.]
Gibbs Sampling
Gibbs sampling is a special case of the Metropolis-Hastings algorithm in which the proposal is set to be the conditional distribution of the variables. It is especially useful when the dimension of the space to sample from is very large, e.g. for images. Suppose we are sampling in a two-dimensional space, x = [x_1, x_2]ᵀ. Then the Gibbs sampler works as follows.

Gibbs sampler for 2-D:
1. Sample x^(0) ~ q_0(x).
2. For i = 1, 2, …:
   - Sample x_1^(i) ~ p(x_1 | x_2^(i−1)).
   - Sample x_2^(i) ~ p(x_2 | x_1^(i)).
   - Set x^(i) = [x_1^(i), x_2^(i)]ᵀ.

Note that, due to the special proposal, a Gibbs sampler does not have an accept-reject step as M-H does; every new sample is accepted.

Optimization with MCMC
Maximum a posteriori estimation requires

  x̂_MAP = arg max_x p(x | y) = arg max_x p(x, y).

Such densities in general have multiple modes and local optima. MCMC methods can be used to find the global optimum of such densities. For this purpose, a time-varying target density is used in the MCMC iterations.

References
- Spall, J. C., "Estimation via Markov chain Monte Carlo," IEEE Control Systems Magazine, Apr. 2003.
- Doucet, A. and Wang, X., "Monte Carlo methods for signal processing: a review in the statistical signal processing context," IEEE Signal Processing Magazine, Nov. 2005.
- Andrieu, C., de Freitas, N., Doucet, A. and Jordan, M. I., "An introduction to MCMC for machine learning," Machine Learning, vol. 50, pp. 5-43, 2003.
- Robert, C. P. and Casella, G., Monte Carlo Statistical Methods, Springer. (Some density functions in this lecture came from this book!)
- Gilks, W. R., Richardson, S. and Spiegelhalter, D. J., Markov Chain Monte Carlo in Practice, Chapman & Hall, 1996.
- Bishop, C., Pattern Recognition and Machine Learning, Springer, 2006.
- MacKay, D. J. C., Information Theory, Inference and Learning Algorithms, Cambridge University Press, 2003 (available online).

A Few Concepts to Summarize Lecture 7
- Monte Carlo methods: approximate inference tools that use samples from the target densities.
- Basic sampling methods: methods for obtaining independent samples from target densities. Though quite powerful, these give poor results in high dimensions.
- MCMC: Monte Carlo methods that produce dependent samples but are more robust in high dimensions.
- Metropolis-Hastings algorithm: the best-known MCMC algorithm, which works with arbitrary proposal densities.
- Gibbs sampler: a special case of the M-H algorithm that samples from the conditionals iteratively and always accepts the new sample.
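The two-variable Gibbs sampler described in the lecture can be sketched for a bivariate Gaussian with correlation ρ, an illustrative target chosen because its full conditionals are available in closed form: x_1 | x_2 ~ N(ρ x_2, 1 − ρ²), and symmetrically for x_2.

```python
import numpy as np

# Gibbs sampler for a zero-mean bivariate Gaussian with unit marginal
# variances and correlation rho, alternating draws from the two
# conditionals. No accept-reject step: every sample is kept.
def gibbs_2d(n_iter, rho, rng):
    x1, x2 = 0.0, 0.0                       # initial sample
    chain = np.empty((n_iter, 2))
    s = np.sqrt(1.0 - rho ** 2)             # conditional std dev
    for i in range(n_iter):
        x1 = rho * x2 + s * rng.normal()    # sample x1 ~ p(x1 | x2)
        x2 = rho * x1 + s * rng.normal()    # sample x2 ~ p(x2 | x1)
        chain[i] = (x1, x2)
    return chain

rng = np.random.default_rng(8)
chain = gibbs_2d(50_000, rho=0.8, rng=rng)[1_000:]   # drop burn-in
corr = float(np.corrcoef(chain[:, 0], chain[:, 1])[0, 1])  # should be near rho
```

The stronger the correlation ρ, the more slowly the component-wise updates traverse the target, which is the Gibbs analogue of the proposal-width trade-off in Metropolis-Hastings.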
Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution
More informationSampling Methods (11/30/04)
CS281A/Stat241A: Statistical Learning Theory Sampling Methods (11/30/04) Lecturer: Michael I. Jordan Scribe: Jaspal S. Sandhu 1 Gibbs Sampling Figure 1: Undirected and directed graphs, respectively, with
More informationComputer Vision Group Prof. Daniel Cremers. 14. Sampling Methods
Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric
More informationSampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo. Sampling Methods. Oliver Schulte - CMPT 419/726. Bishop PRML Ch.
Sampling Methods Oliver Schulte - CMP 419/726 Bishop PRML Ch. 11 Recall Inference or General Graphs Junction tree algorithm is an exact inference method for arbitrary graphs A particular tree structure
More informationBAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA
BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci
More informationBayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence
Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture Notes Fall 2009 November, 2009 Byoung-Ta Zhang School of Computer Science and Engineering & Cognitive Science, Brain Science, and Bioinformatics Seoul National University
More informationA Review of Pseudo-Marginal Markov Chain Monte Carlo
A Review of Pseudo-Marginal Markov Chain Monte Carlo Discussed by: Yizhe Zhang October 21, 2016 Outline 1 Overview 2 Paper review 3 experiment 4 conclusion Motivation & overview Notation: θ denotes the
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationExercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters
Exercises Tutorial at ICASSP 216 Learning Nonlinear Dynamical Models Using Particle Filters Andreas Svensson, Johan Dahlin and Thomas B. Schön March 18, 216 Good luck! 1 [Bootstrap particle filter for
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More information16 : Approximate Inference: Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution
More informationMarkov Chain Monte Carlo (MCMC)
School of Computer Science 10-708 Probabilistic Graphical Models Markov Chain Monte Carlo (MCMC) Readings: MacKay Ch. 29 Jordan Ch. 21 Matt Gormley Lecture 16 March 14, 2016 1 Homework 2 Housekeeping Due
More informationMCMC for big data. Geir Storvik. BigInsight lunch - May Geir Storvik MCMC for big data BigInsight lunch - May / 17
MCMC for big data Geir Storvik BigInsight lunch - May 2 2018 Geir Storvik MCMC for big data BigInsight lunch - May 2 2018 1 / 17 Outline Why ordinary MCMC is not scalable Different approaches for making
More informationMarkov chain Monte Carlo
1 / 26 Markov chain Monte Carlo Timothy Hanson 1 and Alejandro Jara 2 1 Division of Biostatistics, University of Minnesota, USA 2 Department of Statistics, Universidad de Concepción, Chile IAP-Workshop
More informationMARKOV CHAIN MONTE CARLO
MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with
More informationBayesian Approach 2. CSC412 Probabilistic Learning & Reasoning
CSC412 Probabilistic Learning & Reasoning Lecture 12: Bayesian Parameter Estimation February 27, 2006 Sam Roweis Bayesian Approach 2 The Bayesian programme (after Rev. Thomas Bayes) treats all unnown quantities
More informationOutline. Spring It Introduction Representation. Markov Random Field. Conclusion. Conditional Independence Inference: Variable elimination
Probabilistic Graphical Models COMP 790-90 Seminar Spring 2011 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Outline It Introduction ti Representation Bayesian network Conditional Independence Inference:
More informationParticle-Based Approximate Inference on Graphical Model
article-based Approimate Inference on Graphical Model Reference: robabilistic Graphical Model Ch. 2 Koller & Friedman CMU, 0-708, Fall 2009 robabilistic Graphical Models Lectures 8,9 Eric ing attern Recognition
More information2 Statistical Estimation: Basic Concepts
Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 2 Statistical Estimation:
More informationRobert Collins CSE586, PSU Intro to Sampling Methods
Intro to Sampling Methods CSE586 Computer Vision II Penn State Univ Topics to be Covered Monte Carlo Integration Sampling and Expected Values Inverse Transform Sampling (CDF) Ancestral Sampling Rejection
More informationApproximate Inference using MCMC
Approximate Inference using MCMC 9.520 Class 22 Ruslan Salakhutdinov BCS and CSAIL, MIT 1 Plan 1. Introduction/Notation. 2. Examples of successful Bayesian models. 3. Basic Sampling Algorithms. 4. Markov
More informationMarkov Networks.
Markov Networks www.biostat.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts Markov network syntax Markov network semantics Potential functions Partition function
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationIntroduction to Bayesian inference
Introduction to Bayesian inference Thomas Alexander Brouwer University of Cambridge tab43@cam.ac.uk 17 November 2015 Probabilistic models Describe how data was generated using probability distributions
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationLecture 8: Bayesian Estimation of Parameters in State Space Models
in State Space Models March 30, 2016 Contents 1 Bayesian estimation of parameters in state space models 2 Computational methods for parameter estimation 3 Practical parameter estimation in state space
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation
More informationIntroduction to Markov Chain Monte Carlo & Gibbs Sampling
Introduction to Markov Chain Monte Carlo & Gibbs Sampling Prof. Nicholas Zabaras Sibley School of Mechanical and Aerospace Engineering 101 Frank H. T. Rhodes Hall Ithaca, NY 14853-3801 Email: zabaras@cornell.edu
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationCS281A/Stat241A Lecture 22
CS281A/Stat241A Lecture 22 p. 1/4 CS281A/Stat241A Lecture 22 Monte Carlo Methods Peter Bartlett CS281A/Stat241A Lecture 22 p. 2/4 Key ideas of this lecture Sampling in Bayesian methods: Predictive distribution
More informationMonte Carlo Methods. Leon Gu CSD, CMU
Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte
More informationBayesian Phylogenetics:
Bayesian Phylogenetics: an introduction Marc A. Suchard msuchard@ucla.edu UCLA Who is this man? How sure are you? The one true tree? Methods we ve learned so far try to find a single tree that best describes
More informationMachine Learning Lecture 3
Announcements Machine Learning Lecture 3 Eam dates We re in the process of fiing the first eam date Probability Density Estimation II 9.0.207 Eercises The first eercise sheet is available on L2P now First
More informationRobert Collins CSE586, PSU Intro to Sampling Methods
Intro to Sampling Methods CSE586 Computer Vision II Penn State Univ Topics to be Covered Monte Carlo Integration Sampling and Expected Values Inverse Transform Sampling (CDF) Ancestral Sampling Rejection
More informationCS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling
CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationSampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo. Sampling Methods. Machine Learning. Torsten Möller.
Sampling Methods Machine Learning orsten Möller Möller/Mori 1 Recall Inference or General Graphs Junction tree algorithm is an exact inference method for arbitrary graphs A particular tree structure defined
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing
More informationControlled sequential Monte Carlo
Controlled sequential Monte Carlo Jeremy Heng, Department of Statistics, Harvard University Joint work with Adrian Bishop (UTS, CSIRO), George Deligiannidis & Arnaud Doucet (Oxford) Bayesian Computation
More informationIntroduction to Probability and Statistics (Continued)
Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:
More informationSession 3A: Markov chain Monte Carlo (MCMC)
Session 3A: Markov chain Monte Carlo (MCMC) John Geweke Bayesian Econometrics and its Applications August 15, 2012 ohn Geweke Bayesian Econometrics and its Session Applications 3A: Markov () chain Monte
More informationIntroduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak
Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,
More informationBayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference
1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationThe Origin of Deep Learning. Lili Mou Jan, 2015
The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets
More informationMobile Robot Localization
Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations
More informationBayesian GLMs and Metropolis-Hastings Algorithm
Bayesian GLMs and Metropolis-Hastings Algorithm We have seen that with conjugate or semi-conjugate prior distributions the Gibbs sampler can be used to sample from the posterior distribution. In situations,
More informationMachine Learning Lecture 2
Machine Perceptual Learning and Sensory Summer Augmented 6 Computing Announcements Machine Learning Lecture 2 Course webpage http://www.vision.rwth-aachen.de/teaching/ Slides will be made available on
More informationAdaptive Rejection Sampling with fixed number of nodes
Adaptive Rejection Sampling with fixed number of nodes L. Martino, F. Louzada Institute of Mathematical Sciences and Computing, Universidade de São Paulo, São Carlos (São Paulo). Abstract The adaptive
More informationSampling Methods. Bishop PRML Ch. 11. Alireza Ghane. Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo
Sampling Methods Bishop PRML h. 11 Alireza Ghane Sampling Methods A. Ghane /. Möller / G. Mori 1 Recall Inference or General Graphs Junction tree algorithm is an exact inference method for arbitrary graphs
More informationeqr094: Hierarchical MCMC for Bayesian System Reliability
eqr094: Hierarchical MCMC for Bayesian System Reliability Alyson G. Wilson Statistical Sciences Group, Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545 USA Phone: 505-667-9167
More informationOutline lecture 4 2(26)
Outline lecture 4 2(26), Lecture 4 eural etworks () and Introduction to Kernel Methods Thomas Schön Division of Automatic Control Linköping University Linköping, Sweden. Email: schon@isy.liu.se, Phone:
More informationMACHINE LEARNING ADVANCED MACHINE LEARNING
MACHINE LEARNING ADVANCED MACHINE LEARNING Recap of Important Notions on Estimation of Probability Density Functions 22 MACHINE LEARNING Discrete Probabilities Consider two variables and y taking discrete
More information