Cheng Soon Ong & Christian Walder. Canberra, February - June 2018
1 Cheng Soon Ong & Christian Walder
Research Group and College of Engineering and Computer Science, Canberra, February - June 2018

Outline: Overview; Introduction; Linear Algebra; Probability; Linear Regression 1; Linear Regression 2; Linear Classification 1; Linear Classification 2; Kernel Methods; Sparse Kernel Methods; Mixture Models and EM 1; Mixture Models and EM 2; Neural Networks 1; Neural Networks 2; Principal Component Analysis; Autoencoders; Graphical Models 1; Graphical Models 2; Graphical Models 3; Sequential Data 1; Sequential Data 2

(Many figures from C. M. Bishop, "Pattern Recognition and Machine Learning")
2 Part XVI: Sampling Methods
Sampling from the Uniform Distribution; Sampling from Standard Distributions; Rejection Sampling; Importance Sampling; Markov Chain Monte Carlo; Gibbs Sampling
3 Approximating π - Buffon's Needle (1777)
Drop a needle of length c onto a floor ruled with parallel lines a distance d apart (c ≤ d).
(i) Needle falls perpendicular to the lines ($a = \pi/2$): the probability of crossing a line is $c/d$.
(ii) Needle falls at an arbitrary angle a: the probability of crossing is $c \sin(a) / d$.
Every angle is equally probable, so average over the angle:
    $p(\text{crossing}) = \int_0^\pi \frac{c \sin a}{d}\, p(a)\, da = \frac{1}{\pi} \frac{c}{d} \int_0^\pi \sin a \, da = \frac{2}{\pi} \frac{c}{d}$
Observing n crossings in N experiments gives $n/N \approx \frac{2}{\pi} \frac{c}{d}$, and hence the estimate $\pi \approx \frac{2cN}{dn}$.
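The experiment above is easy to simulate. The sketch below (function name and the choice c = 1, d = 2 are illustrative, not from the slides) drops needles by sampling the distance of the needle's centre to the nearest line and the needle's angle, then inverts $n/N \approx 2c/(\pi d)$:

```python
import random
import math

def buffon_pi(n_drops, c=1.0, d=2.0, seed=0):
    """Estimate pi by simulating Buffon's needle (length c <= line spacing d)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_drops):
        # distance from the needle's centre to the nearest line, uniform on [0, d/2]
        x = rng.uniform(0.0, d / 2.0)
        # needle angle relative to the lines, uniform on [0, pi/2]
        a = rng.uniform(0.0, math.pi / 2.0)
        # the needle crosses a line iff the projected half-length reaches the line
        if x <= (c / 2.0) * math.sin(a):
            crossings += 1
    # p(crossing) = 2c / (pi d)  =>  pi ~ 2 c N / (d n)
    return 2.0 * c * n_drops / (d * crossings)

print(buffon_pi(100_000))  # close to 3.14 for large n_drops
```

The estimate converges slowly, as with all plain Monte Carlo methods: the error shrinks like $1/\sqrt{N}$.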
4 Monte Carlo Methods
For most probabilistic models of practical interest, exact inference is intractable, so we need approximation: numerical sampling (Monte Carlo methods).
Fundamental problem: find the expectation of some function f(z) with respect to a probability distribution p(z),
    $E[f] = \int f(z)\, p(z)\, dz$
Key idea: draw independent samples $z^{(l)}$, $l = 1, \dots, L$, from p(z) and approximate the expectation by
    $\hat{f} = \frac{1}{L} \sum_{l=1}^{L} f(z^{(l)})$
Problem: how do we obtain independent samples from p(z)?
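The estimator $\hat{f}$ is a one-liner once a sampler for p(z) exists. A minimal sketch (the function names and the N(0, 1) example are illustrative):

```python
import random

def mc_expectation(f, sampler, L, seed=0):
    """Approximate E_p[f] by (1/L) * sum f(z_l), with z_l ~ p drawn via `sampler`."""
    rng = random.Random(seed)
    return sum(f(sampler(rng)) for _ in range(L)) / L

# Example: E[z^2] for z ~ N(0, 1) is the variance, i.e. 1.
est = mc_expectation(lambda z: z * z, lambda rng: rng.gauss(0.0, 1.0), L=100_000)
print(est)
```

Everything that follows in this part is about supplying the `sampler` argument for distributions we cannot sample directly.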
5 Approximating the Expectation of f(z)
Samples must be independent; otherwise the effective sample size is much smaller than the apparent sample size.
If f(z) is small in regions where p(z) is large (or vice versa), large sample sizes are needed to catch contributions from all regions.
[Figure: p(z) and f(z) plotted over z.]
6 Sampling from the Uniform Distribution
In a computer, sampling is usually done via a pseudorandom number generator: an algorithm generating a sequence of numbers that approximates the properties of random numbers. Use a mathematically well-crafted pseudorandom number generator.
From now on we will assume that a good pseudorandom number generator for uniformly distributed data is available.
If you don't trust any algorithm: three carefully adjusted radio receivers picking up atmospheric noise can provide real random numbers.
7 Sampling from Standard Distributions
Goal: sample from p(y), which is given in analytical form.
Suppose uniformly distributed samples z in the interval (0, 1) are available.
Calculate the cumulative distribution function
    $h(y) = \int_{-\infty}^{y} p(x)\, dx$
Transform the samples $z \sim U(0, 1)$ by $y = h^{-1}(z)$ to obtain samples y distributed according to p(y).
It is easy to check that h is then the CDF of y:
    $p(y < x) = p(h^{-1}(z) < x) = p(z < h(x)) = h(x)$
8 Sampling from Standard Distributions
Goal: sample from p(y), which is given in analytical form. If a uniformly distributed random variable z is transformed using $y = h^{-1}(z)$, then y will be distributed according to p(y).
[Figure: the CDF h(y) rises from 0 to 1; uniform samples on the vertical axis map through $h^{-1}$ to samples from p(y) on the horizontal axis.]
9 Sampling from the Exponential Distribution
Goal: sample from the exponential distribution
    $p(y) = \begin{cases} \lambda e^{-\lambda y} & y \geq 0 \\ 0 & y < 0 \end{cases}$
with rate parameter λ > 0. Suppose uniformly distributed samples z in the interval (0, 1) are available. Calculate the cumulative distribution function
    $h(y) = \int_0^y \lambda e^{-\lambda x}\, dx = 1 - e^{-\lambda y}$
Transform the samples $z \sim U(0, 1)$ by
    $y = h^{-1}(z) = -\frac{1}{\lambda} \ln(1 - z)$
to obtain samples y distributed according to the exponential distribution.
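The inverse-CDF transform above translates directly into code. A minimal sketch (the function name is illustrative):

```python
import random
import math

def sample_exponential(lam, n, seed=0):
    """Inverse-CDF sampling: y = -ln(1 - z) / lam with z ~ U(0, 1)."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

samples = sample_exponential(lam=2.0, n=100_000)
print(sum(samples) / len(samples))  # sample mean, close to 1/lam = 0.5
```

Since 1 - z is itself uniform on (0, 1), `-math.log(rng.random()) / lam` would work equally well.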
10 Sampling from Multivariate Distributions
The generalisation to multiple variables is straightforward. Consider a change of variables via the Jacobian:
    $p(y_1, \dots, y_M) = p(z_1, \dots, z_M) \left| \frac{\partial(z_1, \dots, z_M)}{\partial(y_1, \dots, y_M)} \right|$
Technical challenges: multiple integrals; inverting nonlinear functions of multiple variables.
11 Sampling from the Gaussian Distribution - Box-Muller
1. Generate pairs of uniformly distributed random numbers $z_1, z_2 \in (-1, 1)$ (e.g. $z_i = 2z - 1$ for z from $U(0, 1)$).
2. Discard any pair $(z_1, z_2)$ unless $z_1^2 + z_2^2 \leq 1$. This results in a uniform distribution inside the unit circle, $p(z_1, z_2) = 1/\pi$.
3. Evaluate $r^2 = z_1^2 + z_2^2$ and
    $y_1 = z_1 \left( \frac{-2 \ln r^2}{r^2} \right)^{1/2}, \quad y_2 = z_2 \left( \frac{-2 \ln r^2}{r^2} \right)^{1/2}$
4. $y_1$ and $y_2$ are independent with joint distribution
    $p(y_1, y_2) = p(z_1, z_2) \left| \frac{\partial(z_1, z_2)}{\partial(y_1, y_2)} \right| = \frac{1}{\sqrt{2\pi}} e^{-y_1^2/2} \cdot \frac{1}{\sqrt{2\pi}} e^{-y_2^2/2}$
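The four steps above can be sketched as follows (the function name is illustrative; this is the polar form of Box-Muller described on the slide, with the rejection loop implementing step 2):

```python
import random
import math

def box_muller_pair(rng):
    """One Box-Muller (polar form) draw: two independent N(0, 1) samples."""
    while True:
        z1 = rng.uniform(-1.0, 1.0)
        z2 = rng.uniform(-1.0, 1.0)
        r2 = z1 * z1 + z2 * z2
        if 0.0 < r2 <= 1.0:  # keep only points inside the unit circle
            scale = math.sqrt(-2.0 * math.log(r2) / r2)
            return z1 * scale, z2 * scale

rng = random.Random(0)
ys = [y for _ in range(50_000) for y in box_muller_pair(rng)]
mean = sum(ys) / len(ys)
var = sum(y * y for y in ys) / len(ys) - mean ** 2
print(mean, var)  # close to 0 and 1
```

On average a fraction $\pi/4 \approx 0.785$ of the proposed pairs survive the unit-circle test.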
12 Sampling from the Multivariate Gaussian
Generate $x \sim \mathcal{N}(0, I)$ and let $y = Ax + b$. Then
    $E[y] = b$
    $E[(y - E[y])(y - E[y])^T] = A\, E[x x^T]\, A^T = A A^T$
We know y is Gaussian (why? — a linear transformation of a Gaussian random vector is again Gaussian). So to sample $y \sim \mathcal{N}(m, \Sigma)$, put
    $b = m, \quad A A^T = \Sigma$
where A is the Cholesky factor of Σ.
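A plain-Python sketch of this recipe, with a small hand-rolled Cholesky factorisation (function names and the example mean/covariance are illustrative; in practice one would use `numpy.linalg.cholesky`):

```python
import random
import math

def cholesky(sigma):
    """Lower-triangular A with A A^T = sigma (plain Python, small matrices)."""
    n = len(sigma)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(a[i][k] * a[j][k] for k in range(j))
            if i == j:
                a[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                a[i][j] = (sigma[i][j] - s) / a[j][j]
    return a

def sample_mvn(m, sigma, rng):
    """y = A x + m with x ~ N(0, I) gives y ~ N(m, sigma)."""
    a = cholesky(sigma)
    x = [rng.gauss(0.0, 1.0) for _ in m]
    return [m[i] + sum(a[i][k] * x[k] for k in range(len(m))) for i in range(len(m))]

rng = random.Random(0)
ys = [sample_mvn([1.0, -1.0], [[2.0, 0.6], [0.6, 1.0]], rng) for _ in range(50_000)]
mean0 = sum(y[0] for y in ys) / len(ys)
cov01 = sum((y[0] - 1.0) * (y[1] + 1.0) for y in ys) / len(ys)
print(mean0, cov01)  # close to 1.0 and 0.6
```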
13 Rejection Sampling
Assumption 1: sampling directly from p(z) is difficult, but we can evaluate p(z) up to some unknown normalisation constant $Z_p$:
    $p(z) = \frac{1}{Z_p} \tilde{p}(z)$
Assumption 2: we can draw samples from a simpler distribution q(z), and for some constant k and all z,
    $k q(z) \geq \tilde{p}(z)$
[Figure: the scaled proposal $k q(z)$ envelopes $\tilde{p}(z)$; a sample $z_0$ and a uniform value $u_0 \in [0, k q(z_0)]$.]
14 Rejection Sampling
1. Generate a random number $z_0$ from the distribution q(z).
2. Generate a number $u_0$ from the uniform distribution over $[0, k q(z_0)]$.
3. If $u_0 > \tilde{p}(z_0)$, reject the pair $(z_0, u_0)$.
4. The remaining pairs are uniformly distributed under the curve $\tilde{p}(z)$.
5. The z values are distributed according to p(z).
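A direct translation of these five steps (the function names and the Beta(2, 2)-shaped target are illustrative, not from the slides):

```python
import random

def rejection_sample(p_tilde, q_sample, q_pdf, k, n, seed=0):
    """Rejection sampling: accept z0 ~ q if u0 ~ U[0, k q(z0)] falls under p_tilde(z0)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        z0 = q_sample(rng)                     # step 1
        u0 = rng.uniform(0.0, k * q_pdf(z0))   # step 2
        if u0 <= p_tilde(z0):                  # step 3: keep only pairs under p_tilde
            out.append(z0)
    return out

# Unnormalised target on [0, 1]: p_tilde(z) = z(1 - z), a Beta(2, 2) shape.
# Proposal q = U(0, 1); k = 0.25 suffices since max z(1 - z) = 0.25.
zs = rejection_sample(lambda z: z * (1 - z),
                      lambda rng: rng.random(),
                      lambda z: 1.0,
                      k=0.25, n=50_000)
print(sum(zs) / len(zs))  # mean of Beta(2, 2) is 0.5
```

Note that only the ratio $\tilde{p}(z_0) / (k q(z_0))$ matters: the normalisation constant $Z_p$ is never needed.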
15 Rejection Sampling - Example
Consider the Gamma distribution for a > 1:
    $\mathrm{Gam}(z \mid a, b) = \frac{b^a z^{a-1} \exp(-bz)}{\Gamma(a)}$
A suitable comparison function is based on the Cauchy distribution,
    $q(z) = \frac{k}{1 + (z - c)^2 / b^2}$
Samples z from q(z) are obtained from uniformly distributed y via the transformation $z = b \tan y + c$, with $c = a - 1$, $b^2 = 2a - 1$, and k chosen as small as possible subject to $q(z) \geq \tilde{p}(z)$.
[Figure: the Gamma density p(z) under its scaled Cauchy envelope.]
16 Adaptive Rejection Sampling
A suitable form for the proposal distribution q(z) might be difficult to find. If p(z) is log-concave (ln p(z) has nonincreasing derivatives), use the derivatives to construct an envelope:
1. Start with an initial grid of points $z_1, \dots, z_M$ and construct the envelope from the tangents to $\ln p(z)$ at the points $z_i$, $i = 1, \dots, M$.
2. Draw a sample from the envelope function; if it is accepted, use it to calculate p(z). Otherwise, use it to refine the grid.
[Figure: $\ln p(z)$ with its piecewise-linear tangent envelope at grid points $z_1, z_2, z_3$.]
17 Adaptive Rejection Sampling - Example
The envelope is a piecewise exponential distribution
    $p(z) = k_m \lambda_m e^{-\lambda_m (z - z_{m-1})}, \quad \hat{z}_{m-1,m} < z \leq \hat{z}_{m,m+1}$
where $\hat{z}_{m-1,m}$ is the point of intersection of the tangent lines at $z_{m-1}$ and $z_m$, $\lambda_m$ is the slope of the tangent at $z_m$, and $k_m$ accounts for the corresponding offset.
[Figure: $\ln p(z)$ with tangent lines at grid points $z_1, z_2, z_3$.]
18 Rejection Sampling - Problems
We need a proposal distribution q(z) that is a close upper bound to $\tilde{p}(z)$; otherwise many samples are rejected.
Rejection sampling suffers from the curse of dimensionality for multivariate distributions: the acceptance rate typically decays exponentially with the dimension.
19 Importance Sampling
Importance sampling directly approximates $E_p[f(z)]$ with respect to some p(z); it does not produce samples from p(z) as an intermediate step. Again use samples $z^{(l)}$ from some proposal q(z):
    $E[f] = \int f(z)\, p(z)\, dz = \int f(z) \frac{p(z)}{q(z)} q(z)\, dz \approx \frac{1}{L} \sum_{l=1}^{L} \frac{p(z^{(l)})}{q(z^{(l)})} f(z^{(l)})$
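The weighted average above is a sketch away (the function names and the choice of target N(0, 1) with a wider proposal N(0, 2) are illustrative, not from the slides):

```python
import random
import math

def normal_pdf(z, mu, sigma):
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def importance_expectation(f, p_pdf, q_pdf, q_sample, L, seed=0):
    """E_p[f] ~ (1/L) * sum (p(z_l)/q(z_l)) f(z_l), with z_l ~ q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(L):
        z = q_sample(rng)
        total += (p_pdf(z) / q_pdf(z)) * f(z)  # importance weight times f
    return total / L

# Target p = N(0, 1), proposal q = N(0, 2); estimate E_p[z^2] = 1.
est = importance_expectation(lambda z: z * z,
                             lambda z: normal_pdf(z, 0.0, 1.0),
                             lambda z: normal_pdf(z, 0.0, 2.0),
                             lambda rng: rng.gauss(0.0, 2.0),
                             L=200_000)
print(est)  # close to 1
```

Choosing a proposal wider than the target keeps the weights $p(z)/q(z)$ bounded; a proposal narrower than the target can make the estimator's variance explode.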
20 Importance Sampling - Unnormalised Distributions
Consider unnormalised distributions $p(z) = \tilde{p}(z) / Z_p$ and $q(z) = \tilde{q}(z) / Z_q$. With the importance weights $r_l = \tilde{p}(z^{(l)}) / \tilde{q}(z^{(l)})$,
    $E[f] = \int f(z)\, p(z)\, dz = \frac{Z_q}{Z_p} \int f(z) \frac{\tilde{p}(z)}{\tilde{q}(z)} q(z)\, dz \approx \frac{Z_q}{Z_p} \frac{1}{L} \sum_{l=1}^{L} r_l f(z^{(l)})$
Using the same samples (with f(z) = 1) gives
    $\frac{Z_p}{Z_q} \approx \frac{1}{L} \sum_{l=1}^{L} r_l$
and therefore
    $E[f] \approx \sum_{l=1}^{L} \frac{r_l}{\sum_{m=1}^{L} r_m} f(z^{(l)})$
21 Importance Sampling - Key Points
The importance weights $r_l$ correct the bias introduced by sampling from the proposal distribution q(z) instead of the wanted distribution p(z).
Success depends on how well q(z) approximates p(z).
Wherever p(z) > 0, it is necessary that q(z) > 0 as well.
22 Markov Chain Monte Carlo
Goal: sample from p(z) by generating a sequence using a Markov chain.
1. Generate a new sample $z^\star \sim q(z \mid z^{(l)})$, conditional on the previous sample $z^{(l)}$.
2. Accept or reject the new sample according to some appropriate criterion:
    $z^{(l+1)} = \begin{cases} z^\star & \text{if accepted} \\ z^{(l)} & \text{if rejected} \end{cases}$
3. For an appropriate proposal and corresponding acceptance criterion, as $l \to \infty$, $z^{(l)}$ approaches an independent sample from p(z).
23 Metropolis Algorithm
1. Choose a symmetric proposal distribution: $q(z_A \mid z_B) = q(z_B \mid z_A)$.
2. Accept the new sample $z^\star$ with probability
    $A(z^\star, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^\star)}{\tilde{p}(z^{(\tau)})}\right)$
e.g., draw $u \sim \mathrm{Uniform}(0, 1)$ and accept if $\tilde{p}(z^\star) / \tilde{p}(z^{(\tau)}) > u$.
3. Unlike rejection sampling, on rejection of the proposal we keep the previous sample:
    $z^{(\tau+1)} = \begin{cases} z^\star & \text{if accepted} \\ z^{(\tau)} & \text{if rejected} \end{cases}$
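The Metropolis algorithm with a Gaussian random-walk proposal fits in a few lines (the function name and the unnormalised N(3, 1) target are illustrative, not from the slides):

```python
import random
import math

def metropolis(p_tilde, z0, n_steps, step=1.0, seed=0):
    """Metropolis with symmetric Gaussian proposal q(z*|z) = N(z* | z, step^2)."""
    rng = random.Random(seed)
    z = z0
    chain = []
    for _ in range(n_steps):
        z_star = z + rng.gauss(0.0, step)
        # symmetric proposal => accept with probability min(1, p~(z*)/p~(z))
        if rng.random() < min(1.0, p_tilde(z_star) / p_tilde(z)):
            z = z_star
        chain.append(z)  # on rejection the old state is repeated
    return chain

# Unnormalised N(3, 1) target; any constant factor in p_tilde is irrelevant.
chain = metropolis(lambda z: math.exp(-0.5 * (z - 3.0) ** 2), z0=0.0, n_steps=200_000)
burned = chain[10_000:]  # discard burn-in
print(sum(burned) / len(burned))  # close to 3
```

Note that rejected proposals still produce a chain entry (the previous state repeated), in contrast to rejection sampling where rejected candidates are discarded entirely.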
24 Metropolis Algorithm - Illustration
Sampling from a Gaussian distribution (the black contour shows one standard deviation), using the proposal $q(z \mid z^{(l)}) = \mathcal{N}(z \mid z^{(l)}, \sigma^2 I)$ with σ = 0.2. Of 150 candidates generated, 43 were rejected.
[Figure: accepted steps and rejected steps of the random walk.]
25 Markov Chain Monte Carlo - Why it works
A first-order Markov chain is a series of random variables $z^{(1)}, \dots, z^{(M)}$ such that the following property holds:
    $p(z^{(m+1)} \mid z^{(1)}, \dots, z^{(m)}) = p(z^{(m+1)} \mid z^{(m)})$
Marginal probability:
    $p(z^{(m+1)}) = \sum_{z^{(m)}} p(z^{(m+1)} \mid z^{(m)})\, p(z^{(m)}) = \sum_{z^{(m)}} T_m(z^{(m)}, z^{(m+1)})\, p(z^{(m)})$
where $T_m(z^{(m)}, z^{(m+1)})$ are the transition probabilities.
[Figure: chain $z_1 \to z_2 \to \dots \to z_M \to z_{M+1}$.]
26 Markov Chain Monte Carlo - Why it works
Marginal probability:
    $p(z^{(m+1)}) = \sum_{z^{(m)}} T_m(z^{(m)}, z^{(m+1)})\, p(z^{(m)})$
A Markov chain is called homogeneous if the transition probabilities are the same for all m, denoted $T(z', z)$.
A distribution is invariant, or stationary, with respect to a Markov chain if each step leaves the distribution unchanged. For a homogeneous Markov chain, the distribution $p^\star(z)$ is invariant if
    $p^\star(z) = \sum_{z'} T(z', z)\, p^\star(z')$
(Note: there can be many invariant distributions. If T is the identity matrix, every distribution is invariant.)
27 Markov Chain Monte Carlo - Why it works
Detailed balance,
    $p^\star(z)\, T(z, z') = p^\star(z')\, T(z', z)$
is sufficient (but not necessary) for $p^\star(z)$ to be invariant: summing both sides over $z'$ gives $\sum_{z'} T(z', z)\, p^\star(z') = p^\star(z) \sum_{z'} T(z, z') = p^\star(z)$. (A Markov chain that respects detailed balance is called reversible.)
A Markov chain is ergodic if it converges to the invariant distribution irrespective of the choice of the initial conditions; the invariant distribution is then called the equilibrium distribution. An ergodic Markov chain can have only one equilibrium distribution.
Why is this working? Choose the transition probabilities T to satisfy detailed balance for our goal distribution p(z).
28 Markov Chain Monte Carlo - Metropolis-Hastings
A generalisation of the Metropolis algorithm to nonsymmetric proposal distributions $q_k$. At step τ, draw a sample $z^\star$ from the distribution $q_k(z \mid z^{(\tau)})$, where k labels the set of possible transitions. Accept with probability
    $A_k(z^\star, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^\star)\, q_k(z^{(\tau)} \mid z^\star)}{\tilde{p}(z^{(\tau)})\, q_k(z^\star \mid z^{(\tau)})}\right)$
The choice of proposal distribution is critical. A common choice is a Gaussian centered on the current state:
- small variance: high acceptance rate, but a slow walk through the state space; samples are not independent,
- large variance: high rejection rate.
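A sketch of Metropolis-Hastings with a genuinely asymmetric proposal, so the Hastings correction $q(z^{(\tau)} \mid z^\star) / q(z^\star \mid z^{(\tau)})$ actually matters (function names, the log-normal proposal, and the Exp(1) target are illustrative, not from the slides):

```python
import random
import math

def lognormal_q(z_new, z_old, s):
    """Density of the asymmetric proposal z_new ~ LogNormal(ln z_old, s^2)."""
    return math.exp(-0.5 * ((math.log(z_new) - math.log(z_old)) / s) ** 2) / (
        z_new * s * math.sqrt(2 * math.pi))

def metropolis_hastings(p_tilde, z0, n_steps, s=0.8, seed=0):
    """MH with a log-normal random-walk proposal (nonsymmetric on z > 0)."""
    rng = random.Random(seed)
    z = z0
    chain = []
    for _ in range(n_steps):
        z_star = z * math.exp(rng.gauss(0.0, s))
        # Hastings correction: the ratio includes q(z | z*) / q(z* | z)
        a = (p_tilde(z_star) * lognormal_q(z, z_star, s)) / (
            p_tilde(z) * lognormal_q(z_star, z, s))
        if rng.random() < min(1.0, a):
            z = z_star
        chain.append(z)
    return chain

# Target: unnormalised Exp(1) density on z > 0; its mean is 1.
chain = metropolis_hastings(lambda z: math.exp(-z), z0=1.0, n_steps=200_000)
print(sum(chain[10_000:]) / len(chain[10_000:]))  # close to 1
```

Dropping the correction factor here would bias the chain, since the log-normal proposal is more likely to move toward larger z in absolute terms.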
29 Markov Chain Monte Carlo - Metropolis-Hastings
The transition probability of this Markov chain is
    $T(z, z') = q_k(z' \mid z)\, A_k(z', z)$
To prove that p(z) is the invariant distribution, it suffices to show detailed balance, $p(z)\, T(z, z') = p(z')\, T(z', z)$. Using the symmetry $\min(a, b) = \min(b, a)$:
    $p(z)\, q_k(z' \mid z)\, A_k(z', z) = \min\left(p(z)\, q_k(z' \mid z),\; p(z')\, q_k(z \mid z')\right)$
    $= \min\left(p(z')\, q_k(z \mid z'),\; p(z)\, q_k(z' \mid z)\right)$
    $= p(z')\, q_k(z \mid z')\, A_k(z, z')$
30 Markov Chain Monte Carlo - Metropolis-Hastings
Consider an isotropic Gaussian proposal distribution (blue) for a strongly correlated multivariate Gaussian target (red). To keep the rejection rate low, the proposal scale should be of the order of the smallest standard deviation $\sigma_{\min}$ of the target. This leads to random-walk behaviour and slow exploration of the state space: the number of steps separating states that are approximately independent is of order $(\sigma_{\max} / \sigma_{\min})^2$.
[Figure: elongated Gaussian with length scales $\sigma_{\min}$ and $\sigma_{\max}$ and correlation ρ.]
31 Gibbs Sampling
Goal: sample from a joint distribution $p(z) = p(z_1, \dots, z_M)$.
How? Sample one variable at a time from its distribution conditioned on all the other variables.
Example: given $p(z_1, z_2, z_3)$, at step τ we have samples $z_1^{(\tau)}, z_2^{(\tau)}, z_3^{(\tau)}$. Get the samples for step τ + 1 as
    $z_1^{(\tau+1)} \sim p(z_1 \mid z_2^{(\tau)}, z_3^{(\tau)})$
    $z_2^{(\tau+1)} \sim p(z_2 \mid z_1^{(\tau+1)}, z_3^{(\tau)})$
    $z_3^{(\tau+1)} \sim p(z_3 \mid z_1^{(\tau+1)}, z_2^{(\tau+1)})$
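Gibbs sampling needs the full conditionals in closed form. For a standard bivariate Gaussian with correlation ρ they are Gaussian, $z_1 \mid z_2 \sim \mathcal{N}(\rho z_2, 1 - \rho^2)$ (and symmetrically for $z_2$), which gives a compact sketch (the function name and the choice ρ = 0.8 are illustrative):

```python
import random
import math

def gibbs_bivariate_gaussian(rho, n_steps, seed=0):
    """Gibbs sampling for a standard bivariate Gaussian with correlation rho."""
    rng = random.Random(seed)
    z1, z2 = 0.0, 0.0
    sd = math.sqrt(1.0 - rho * rho)
    chain = []
    for _ in range(n_steps):
        z1 = rng.gauss(rho * z2, sd)  # sample z1 from p(z1 | z2)
        z2 = rng.gauss(rho * z1, sd)  # sample z2 from p(z2 | z1)
        chain.append((z1, z2))
    return chain

chain = gibbs_bivariate_gaussian(rho=0.8, n_steps=100_000)
corr = sum(a * b for a, b in chain) / len(chain)
print(corr)  # close to 0.8
```

The stronger the correlation ρ, the smaller the axis-aligned steps relative to the long axis of the distribution, and the slower the chain mixes.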
32 Gibbs Sampling - Why does it work?
1. p(z) is invariant under each of the Gibbs sampling steps, and hence under the whole Markov chain: (a) the marginal distribution $p(z_{\setminus i})$ does not change, and (b) by definition each step samples from the correct conditional $p(z_i \mid z_{\setminus i})$.
2. Ergodicity: it is sufficient that none of the conditional distributions is zero anywhere.
3. Gibbs sampling is a Metropolis-Hastings sampler in which every step is accepted (the acceptance probability is exactly 1).
[Figure: Gibbs sampling path with axis-aligned steps of scale l inside a correlated Gaussian of overall scale L.]
1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More information6 Markov Chain Monte Carlo (MCMC)
6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution
More informationIntelligent Systems I
1, Intelligent Systems I 12 SAMPLING METHODS THE LAST THING YOU SHOULD EVER TRY Philipp Hennig & Stefan Harmeling Max Planck Institute for Intelligent Systems 23. January 2014 Dptmt. of Empirical Inference
More informationMCMC 2: Lecture 3 SIR models - more topics. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham
MCMC 2: Lecture 3 SIR models - more topics Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham Contents 1. What can be estimated? 2. Reparameterisation 3. Marginalisation
More information27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling
10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel
More informationToday: Fundamentals of Monte Carlo
Today: Fundamentals of Monte Carlo What is Monte Carlo? Named at Los Alamos in 1940 s after the casino. Any method which uses (pseudo)random numbers as an essential part of the algorithm. Stochastic -
More informationCS281A/Stat241A Lecture 22
CS281A/Stat241A Lecture 22 p. 1/4 CS281A/Stat241A Lecture 22 Monte Carlo Methods Peter Bartlett CS281A/Stat241A Lecture 22 p. 2/4 Key ideas of this lecture Sampling in Bayesian methods: Predictive distribution
More informationMetropolis-Hastings Algorithm
Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to
More informationParametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a
Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a
More informationBayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence
Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns
More information( ) ( ) Monte Carlo Methods Interested in. E f X = f x d x. Examples:
Monte Carlo Methods Interested in Examples: µ E f X = f x d x Type I error rate of a hypothesis test Mean width of a confidence interval procedure Evaluating a likelihood Finding posterior mean and variance
More informationLecture 15: MCMC Sanjeev Arora Elad Hazan. COS 402 Machine Learning and Artificial Intelligence Fall 2016
Lecture 15: MCMC Sanjeev Arora Elad Hazan COS 402 Machine Learning and Artificial Intelligence Fall 2016 Course progress Learning from examples Definition + fundamental theorem of statistical learning,
More information18 : Advanced topics in MCMC. 1 Gibbs Sampling (Continued from the last lecture)
10-708: Probabilistic Graphical Models 10-708, Spring 2014 18 : Advanced topics in MCMC Lecturer: Eric P. Xing Scribes: Jessica Chemali, Seungwhan Moon 1 Gibbs Sampling (Continued from the last lecture)
More informationPart IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015
Part IA Probability Definitions Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.
More informationLinear Classification
Linear Classification Lili MOU moull12@sei.pku.edu.cn http://sei.pku.edu.cn/ moull12 23 April 2015 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative
More informationInverse Problems in the Bayesian Framework
Inverse Problems in the Bayesian Framework Daniela Calvetti Case Western Reserve University Cleveland, Ohio Raleigh, NC, July 2016 Bayes Formula Stochastic model: Two random variables X R n, B R m, where
More informationClustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.
Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)
More informationMonte Carlo integration (naive Monte Carlo)
Monte Carlo integration (naive Monte Carlo) Example: Calculate π by MC integration [xpi] 1/21 INTEGER n total # of points INTEGER i INTEGER nu # of points in a circle REAL x,y coordinates of a point in
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationCS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling
CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More informationMarkov Chain Monte Carlo
Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).
More informationPCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities
PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationECE 302 Division 2 Exam 2 Solutions, 11/4/2009.
NAME: ECE 32 Division 2 Exam 2 Solutions, /4/29. You will be required to show your student ID during the exam. This is a closed-book exam. A formula sheet is provided. No calculators are allowed. Total
More informationA quick introduction to Markov chains and Markov chain Monte Carlo (revised version)
A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to
More information