Simulated Stochastic Approximation Annealing for Global Optimization with a Square-Root Cooling Schedule. Faming Liang


1 Simulated Stochastic Approximation Annealing for Global Optimization with a Square-Root Cooling Schedule

2 Abstract Simulated annealing has been widely used to solve optimization problems. It is well known that simulated annealing cannot be guaranteed to locate the global optima unless a logarithmic cooling schedule is used. However, the logarithmic cooling schedule is so slow that the required CPU time is unaffordable in practice. We propose a new stochastic optimization algorithm, the simulated stochastic approximation annealing algorithm. Under the framework of stochastic approximation Markov chain Monte Carlo, we show that the new algorithm can work with a cooling schedule in which the temperature decreases much faster than in the logarithmic cooling schedule, e.g., a square-root cooling schedule, while still guaranteeing that the global optima are reached as the temperature tends to zero. The new algorithm has been tested on a few benchmark optimization problems, including feed-forward neural network training and protein folding. The numerical results indicate that the new algorithm can significantly outperform simulated annealing and other competitors.

3 The problem The optimization problem can be stated as a minimization problem:
$$\min_{x \in \mathcal{X}} U(x),$$
where $\mathcal{X}$ is the domain of $U(x)$. Minimizing $U(x)$ is equivalent to sampling from the Boltzmann distribution
$$f_\tau(x) \propto \exp(-U(x)/\tau)$$
at a very small value of $\tau$ (close to 0).

4 Simulated Annealing (Kirkpatrick et al., 1983) It simulates from a sequence of Boltzmann distributions, $f_{\tau_1}(x), f_{\tau_2}(x), \ldots, f_{\tau_m}(x)$, in a sequential manner, where the temperatures $\tau_1, \ldots, \tau_m$ form a decreasing ladder $\tau_1 > \tau_2 > \cdots > \tau_m = \tau_* > 0$, with $\tau_* \approx 0$ and $\tau_1$ reasonably large such that most uphill Metropolis-Hastings (MH) moves at that level can be accepted.

5 Simulated Annealing: Algorithm
1. Initialize the simulation at temperature $\tau_1$ and an arbitrary sample $x_0 \in \mathcal{X}$.
2. At each temperature $\tau_i$, simulate the distribution $f_{\tau_i}(x)$ for $n_i$ iterations using the MH sampler. Pass the final sample to the next lower temperature level as the initial sample.
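The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the reported experiments: the Gaussian random-walk proposal, the one-dimensional test function `U_demo`, the geometric ladder, and all parameter values are assumptions chosen only to make the example run.

```python
import math
import random


def simulated_annealing(U, x0, temperatures, n_iter, step=0.1, seed=0):
    """Run an MH sampler at each level of a decreasing temperature ladder."""
    rng = random.Random(seed)
    x, ux = x0, U(x0)
    best_x, best_u = x, ux
    for tau in temperatures:
        for _ in range(n_iter):
            y = x + rng.gauss(0.0, step)  # symmetric random-walk proposal (assumed)
            uy = U(y)
            # Metropolis rule for the target f_tau(x) proportional to exp(-U(x)/tau)
            if uy <= ux or rng.random() < math.exp(-(uy - ux) / tau):
                x, ux = y, uy
                if ux < best_u:
                    best_x, best_u = x, ux
        # the final sample x is passed on as the start of the next level
    return best_x, best_u


def U_demo(x):
    """Hypothetical double-well energy; the lower well is near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


ladder = [0.9 ** i for i in range(50)]  # tau_1 = 1, geometric decrease (assumed)
x_best, u_best = simulated_annealing(U_demo, x0=1.0, temperatures=ladder, n_iter=200)
```

The function tracks the best point visited, which is a common practical addition and not part of the two steps themselves.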

6 Simulated Annealing: Difficulty The major difficulty with simulated annealing is in choosing the cooling schedule.
Logarithmic cooling schedule, $O(1/\log(t))$: It ensures that the simulation converges to the global minima of $U(x)$ with probability 1. However, it is so slow that the required running time is unaffordable in practice.
Linear or geometric cooling schedule: These schedules are commonly used but, as shown in Holley et al. (1989), they can no longer guarantee that the global minima are reached.

7 Stochastic Approximation Monte Carlo (SAMC) SAMC is a general-purpose MCMC algorithm. To be precise, it is an adaptive MCMC algorithm and also a dynamic importance sampling algorithm. Its self-adjusting mechanism enables it to be immune to local traps. Let $E_1, \ldots, E_m$ denote a partition of the sample space $\mathcal{X}$, made according to the energy function as follows:
$$E_1 = \{x : U(x) \le u_1\}, \ E_2 = \{x : u_1 < U(x) \le u_2\}, \ \ldots, \ E_{m-1} = \{x : u_{m-2} < U(x) \le u_{m-1}\}, \ E_m = \{x : U(x) > u_{m-1}\}, \qquad (1)$$
where $u_1 < u_2 < \cdots < u_{m-1}$ are prespecified numbers. Let $\{\gamma_t\}$ be a positive, non-increasing sequence satisfying the condition
$$\sum_{t=1}^{\infty} \gamma_t = \infty, \qquad \sum_{t=1}^{\infty} \gamma_t^2 < \infty.$$
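As a small illustration of partition (1), the subregion index of a sample can be looked up from its energy with a binary search over the cutpoints. The cutpoint values below are hypothetical, and the gain sequence $\gamma_t = t_0/\max(t_0, t)$ is only one assumed choice that satisfies the two summability conditions; the slides do not prescribe it.

```python
import numpy as np


def subregion_index(energy, cutpoints):
    """Return the 0-based index i such that the energy lies in E_{i+1} of (1)."""
    # side="left" returns i with cutpoints[i-1] < energy <= cutpoints[i],
    # so energies above the last cutpoint map to the final subregion E_m.
    return int(np.searchsorted(cutpoints, energy, side="left"))


def gain(t, t0=1000.0):
    """Assumed gain sequence gamma_t = t0 / max(t0, t)."""
    return t0 / max(t0, t)


cutpoints = np.array([-8.0, -6.0, -4.0, -2.0])  # u_1 < ... < u_{m-1}, so m = 5
indices = [subregion_index(u, cutpoints) for u in (-9.0, -8.0, -7.0, 0.0)]
```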

8 Stochastic Approximation Monte Carlo: Algorithm
1. (Sampling) Simulate a sample $X_{t+1}$ with a single MH update, which starts with $X_t$ and leaves the following distribution invariant:
$$f_{\theta_t,\tau}(x) \propto \sum_{i=1}^{m} \exp\{-U(x)/\tau - \theta_t^{(i)}\} I(x \in E_i), \qquad (2)$$
where $I(\cdot)$ is the indicator function.
2. ($\theta$-updating) Set
$$\theta_{t+\frac{1}{2}} = \theta_t + \gamma_{t+1} H_{\tau_{t+1}}(\theta_t, x_{t+1}), \qquad (3)$$
where $H_{\tau_{t+1}}(\theta_t, x_{t+1}) = e_{t+1} - \pi$, $e_{t+1} = (I(x_{t+1} \in E_1), \ldots, I(x_{t+1} \in E_m))$, and $\pi = (\pi_1, \ldots, \pi_m)$.
Obviously, it is difficult to mix over the domain $\mathcal{X}$ if the temperature $\tau$ is very low. In this case, only very few points will be sampled from each subregion.

9 Space Annealing SAMC (Liang, 2007) Suppose that the sample space has been partitioned as in (1) with $u_1, \ldots, u_{m-1}$ arranged in ascending order. Let $\kappa(u)$ denote the index of the subregion that a sample $x$ with energy $u$ belongs to. For example, if $x \in E_j$, then $\kappa(U(x)) = j$. Let $\mathcal{X}^{(t)}$ denote the sample space at iteration $t$. Space annealing SAMC starts with $\mathcal{X}^{(1)} = \bigcup_{i=1}^{m} E_i$, and then iteratively shrinks the sample space by setting
$$\mathcal{X}^{(t)} = \bigcup_{i=1}^{\kappa(u_{\min}^{(t)} + \aleph)} E_i, \qquad (4)$$
where $u_{\min}^{(t)}$ is the minimum energy value obtained by iteration $t$, and $\aleph$ is a user-specified parameter. A major shortcoming of this algorithm is that it tends to get trapped in local energy minima when $\aleph$ is small and the proposal is relatively local.

10 SAA Algorithm Simulated Stochastic Approximation Annealing, or SAA for short, is a combination of simulated annealing and stochastic approximation. Notation:
$\{M_k, k = 0, 1, \ldots\}$ is a sequence of positive numbers increasingly diverging to infinity, which work as truncation bounds of $\{\theta_t\}$.
$\sigma_t$ is a counter for the number of truncations up to iteration $t$, with $\sigma_0 = 0$.
$\theta_0$ is a fixed point in $\Theta$.
$E_1, \ldots, E_m$ is the partition of the sample space.
$\pi = (\pi_1, \ldots, \pi_m)$ is the desired sampling distribution of the $m$ subregions.
$\{\gamma_t\}$ is a gain factor sequence.
$\{\tau_t\}$ is a temperature sequence.

11 SAA Algorithm
1. (Sampling) Simulate a sample $X_{t+1}$ with a single MH update, which starts with $X_t$ and leaves the following distribution invariant:
$$f_{\theta_t,\tau_{t+1}}(x) \propto \sum_{i=1}^{m} \exp\{-U(x)/\tau_{t+1} - \theta_t^{(i)}\} I(x \in E_i), \qquad (5)$$
where $I(\cdot)$ is the indicator function.
2. ($\theta$-updating) Set
$$\theta_{t+\frac{1}{2}} = \theta_t + \gamma_{t+1} H_{\tau_{t+1}}(\theta_t, x_{t+1}), \qquad (6)$$
where $H_{\tau_{t+1}}(\theta_t, x_{t+1}) = e_{t+1} - \pi$, $e_{t+1} = (I(x_{t+1} \in E_1), \ldots, I(x_{t+1} \in E_m))$, and $\pi = (\pi_1, \ldots, \pi_m)$.
3. (Truncation) If $\|\theta_{t+\frac{1}{2}}\| \le M_{\sigma_t}$, set $\theta_{t+1} = \theta_{t+\frac{1}{2}}$; otherwise, set $\theta_{t+1} = \theta_0$ and $\sigma_{t+1} = \sigma_t + 1$.
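The three steps above can be sketched as follows for a one-dimensional problem. This is an illustrative sketch under stated assumptions rather than the implementation behind the reported results: the random-walk proposal on a bounded interval `[lo, hi]`, the uniform $\pi$, the truncation bounds $M_k = 10^{3+k}$, the starting point $\theta_0 = 0$, the schedules $\gamma_t = C_1/t^{\varsigma}$ and $\tau_t = C_2/\sqrt{t} + \tau_*$, and the test function `U_demo` are all choices made here for the example.

```python
import math
import random

import numpy as np


def saa(U, x0, cutpoints, n_iter, C1=1.0, C2=1.0, tau_star=0.01,
        varsigma=0.6, step=0.2, lo=-2.0, hi=2.0, seed=0):
    """One-dimensional SAA sketch: sampling, theta-updating, truncation."""
    rng = random.Random(seed)
    cutpoints = np.asarray(cutpoints, dtype=float)
    m = len(cutpoints) + 1
    pi = np.full(m, 1.0 / m)       # desired sampling distribution (assumed uniform)
    theta0 = np.zeros(m)           # fixed restart point theta_0 (assumed)
    theta = theta0.copy()
    sigma = 0                      # number of truncations so far

    def region(u):
        return int(np.searchsorted(cutpoints, u, side="left"))

    x, ux = x0, U(x0)
    best_x, best_u = x, ux
    for t in range(n_iter):
        gamma = C1 / (t + 1) ** varsigma            # gamma_{t+1}
        tau = C2 / math.sqrt(t + 1) + tau_star      # tau_{t+1}
        # Step 1: a single MH update targeting (5); proposals outside [lo, hi] are rejected
        y = x + rng.gauss(0.0, step)
        if lo <= y <= hi:
            uy = U(y)
            log_ratio = -(uy - ux) / tau - (theta[region(uy)] - theta[region(ux)])
            if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
                x, ux = y, uy
                if ux < best_u:
                    best_x, best_u = x, ux
        # Step 2: theta_{t+1/2} = theta_t + gamma_{t+1} (e_{t+1} - pi)
        e = np.zeros(m)
        e[region(ux)] = 1.0
        theta_half = theta + gamma * (e - pi)
        # Step 3: truncation against the bound M_sigma = 1e3 * 10**sigma (assumed)
        if np.linalg.norm(theta_half) <= 1e3 * 10.0 ** sigma:
            theta = theta_half
        else:
            theta, sigma = theta0.copy(), sigma + 1
    return best_x, best_u, theta, sigma


def U_demo(x):
    """Hypothetical double-well energy; the lower well is near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


best_x, best_u, theta, n_trunc = saa(
    U_demo, x0=1.0, cutpoints=[-0.2, 0.0, 0.2, 0.5, 1.0], n_iter=20000)
```

Because each update adds a multiple of $e_{t+1} - \pi$, whose components sum to zero, the components of $\theta_t$ keep summing to zero in this sketch; that is a convenient sanity check when experimenting with the parameters.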

12 Features of SAA
Self-adjusting mechanism: This distinguishes the SAA algorithm from simulated annealing. For simulated annealing, the change of the invariant distribution is determined solely by the temperature ladder, whereas for SAA it is determined by both the temperature ladder and the past samples. As a result, SAA can converge with a much faster cooling schedule.
Sample space shrinkage: Like space annealing SAMC, SAA shrinks its sample space with iterations, but in a soft way: it gradually biases sampling toward the local energy minima of each subregion by lowering the temperature with iterations. This strategy of sample space shrinkage reduces the risk of getting trapped in local energy minima.
Convergence: From the perspective of practical applications, SAA can achieve essentially the same convergence toward global energy minima as simulated annealing.

13 Formulation of SAA The SAA algorithm can be formulated as a SAMCMC algorithm with the goal of solving the integration equation
$$h_{\tau_*}(\theta) = \int H_{\tau_*}(\theta, x) f_{\theta,\tau_*}(x)\,dx = 0, \qquad (7)$$
where $f_{\theta,\tau_*}(x)$ denotes a density function dependent on $\theta$ and the limiting temperature $\tau_*$, and $h$ is called the mean field function. SAA works through solving a system of equations defined along the temperature sequence $\{\tau_t\}$:
$$h_{\tau_t}(\theta) = \int H_{\tau_t}(\theta, x) f_{\theta,\tau_t}(x)\,dx = 0, \quad t = 1, 2, \ldots, \qquad (8)$$
where $f_{\theta,\tau_t}(x)$ is a density function dependent on $\theta$ and the temperature $\tau_t$.

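14 Conditions on mean field function For SAA, the mean field function is given by
$$h_\tau(\theta) = \int_{\mathcal{X}} H_\tau(\theta, x) f_{\theta,\tau}(x)\,dx = \left( \frac{S_\tau^{(1)}(\theta)}{S_\tau(\theta)} - \pi_1, \ldots, \frac{S_\tau^{(m)}(\theta)}{S_\tau(\theta)} - \pi_m \right), \qquad (9)$$
for any fixed value of $\theta \in \Theta$ and $\tau \in \mathcal{T}$, where $S_\tau^{(i)}(\theta) = \int_{E_i} e^{-U(x)/\tau}\,dx / e^{\theta^{(i)}}$ and $S_\tau(\theta) = \sum_{i=1}^{m} S_\tau^{(i)}(\theta)$. Further, we define
$$v_\tau(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( \frac{S_\tau^{(i)}(\theta)}{S_\tau(\theta)} - \pi_i \right)^2, \qquad (10)$$
which is the so-called Lyapunov function in the literature of stochastic approximation. Then it is easy to verify that SAA satisfies the stability condition.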
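To make (9) and (10) concrete, the sketch below evaluates the mean field function and the Lyapunov function on a small discrete state space, where the integrals over $E_i$ become sums. The six energy values, the three-subregion partition, the uniform $\pi$, and $\tau = 0.5$ are hypothetical. It also checks the direct algebraic consequence of (9) that $\theta^{(i)} = \log \sum_{x \in E_i} e^{-U(x)/\tau} - \log \pi_i$ is a root of $h_\tau$ when no subregion is empty.

```python
import numpy as np


def mean_field(theta, U_vals, labels, pi, tau):
    """h_tau(theta) of (9) on a finite state space (sums replace integrals)."""
    weights = np.exp(-U_vals / tau)
    m = len(pi)
    S = np.array([weights[labels == i].sum() for i in range(m)]) / np.exp(theta)
    return S / S.sum() - pi


def lyapunov(theta, U_vals, labels, pi, tau):
    """v_tau(theta) of (10): half the squared norm of the mean field."""
    h = mean_field(theta, U_vals, labels, pi, tau)
    return 0.5 * float(np.sum(h ** 2))


U_vals = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical energies
labels = np.array([0, 0, 1, 1, 2, 2])              # subregion of each state
pi = np.full(3, 1.0 / 3.0)
tau = 0.5

# Candidate root: theta_i = log(mass of E_i) - log(pi_i)
mass = np.array([np.exp(-U_vals[labels == i] / tau).sum() for i in range(3)])
theta_star = np.log(mass) - np.log(pi)
h_star = mean_field(theta_star, U_vals, labels, pi, tau)
v_zero = lyapunov(np.zeros(3), U_vals, labels, pi, tau)
```

Here `h_star` vanishes up to rounding, while `v_zero` is strictly positive because $\theta = 0$ leaves the subregion masses unbalanced.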

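15 Stability Condition: $(A_1)$ The function $h_\tau(\theta)$ is bounded and continuously differentiable with respect to both $\theta$ and $\tau$, and there exists a non-negative, upper bounded, and continuously differentiable function $v_\tau(\theta)$ such that for any $\Delta > \delta > 0$,
$$\sup_{\delta \le d((\theta,\tau), \mathcal{L}) \le \Delta} \nabla_\theta^T v_\tau(\theta)\, h_\tau(\theta) < 0, \qquad (11)$$
where $\mathcal{L} = \{(\theta, \tau) : h_\tau(\theta) = 0, \theta \in \Theta, \tau \in \mathcal{T}\}$ is the zero set of $h_\tau(\theta)$, and $d(z, S) = \inf_y \{\|z - y\| : y \in S\}$. Further, the set $v(\mathcal{L}) = \{v_\tau(\theta) : (\theta, \tau) \in \mathcal{L}\}$ is nowhere dense.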

16 Conditions on observation noise Observation noise: $\xi_{t+1} = H_{\tau_{t+1}}(\theta_t, x_{t+1}) - h_{\tau_{t+1}}(\theta_t)$. One can directly impose some conditions on the observation noise; see, e.g., Kushner and Clark (1978), Kulkarni and Horn (1995), and Chen (2002). These conditions are usually very weak, but difficult to verify. Alternatively, one can impose some conditions on the Markov transition kernel, which lead to the required conditions on the observation noise.

17 Doeblin condition: $(A_2)$ For any given $\theta \in \Theta$ and $\tau \in \mathcal{T}$, the Markov transition kernel $P_{\theta,\tau}$ is irreducible and aperiodic. In addition, there exist an integer $l$, a constant $0 < \delta < 1$, and a probability measure $\nu$ such that for any compact subset $K \subset \Theta$,
$$\inf_{\theta \in K, \tau \in \mathcal{T}} P^l_{\theta,\tau}(x, A) \ge \delta \nu(A), \quad \forall x \in \mathcal{X}, \ \forall A \in \mathcal{B}_{\mathcal{X}},$$
where $\mathcal{B}_{\mathcal{X}}$ denotes the Borel $\sigma$-field on $\mathcal{X}$; that is, the whole support $\mathcal{X}$ is a small set for each kernel $P_{\theta,\tau}$, $\theta \in K$ and $\tau \in \mathcal{T}$. Uniform ergodicity is slightly stronger than $V$-uniform ergodicity (if the drift function $V(x) \equiv 1$, then $V$-uniform ergodicity reduces to uniform ergodicity), but it suits SAA well: the function $H_\tau(\theta, X)$ is bounded, and thus the mean field function and the observation noise are bounded as well.

18 Doeblin condition To verify $(A_2)$, one may assume that $\mathcal{X}$ is compact, $U(x)$ is bounded in $\mathcal{X}$, and the proposal distribution $q(x, y)$ satisfies the local positive condition:
(Q) There exist $\delta_q > 0$ and $\epsilon_q > 0$ such that, for every $x \in \mathcal{X}$, $\|x - y\| \le \delta_q \Rightarrow q(x, y) \ge \epsilon_q$.

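19 Conditions on $\{\gamma_t\}$ and $\{\tau_t\}$: $(A_3)$
(i) The sequence $\{\gamma_t\}$ is positive, non-increasing, and satisfies the following conditions:
$$\sum_{t=1}^{\infty} \gamma_t = \infty, \qquad \frac{\gamma_{t+1} - \gamma_t}{\gamma_t} = O(\gamma_{t+1}^{\iota}), \qquad \sum_{t=1}^{\infty} \frac{\gamma_t^{(1+\iota')/2}}{\sqrt{t}} < \infty, \qquad (12)$$
for some $\iota \in [1, 2)$ and $\iota' \in (0, 1)$.
(ii) The sequence $\{\tau_t\}$ is positive and non-increasing and satisfies the following conditions:
$$\lim_{t \to \infty} \tau_t = \tau_*, \qquad \tau_t - \tau_{t+1} = o(\gamma_t), \qquad \sum_{t=1}^{\infty} \gamma_t |\tau_t - \tau_{t-1}|^{\iota''} < \infty, \qquad (13)$$
for some $\iota'' \in (0, 1)$, and
$$\sum_{t=1}^{\infty} \gamma_t |\tau_t - \tau_*| < \infty. \qquad (14)$$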

20 Conditions on $\{\gamma_t\}$ and $\{\tau_t\}$ For the sequences $\{\gamma_t\}$ and $\{\tau_t\}$, one can typically set
$$\gamma_t = \frac{C_1}{t^{\varsigma}}, \qquad \tau_t = \frac{C_2}{\sqrt{t}} + \tau_*, \qquad (15)$$
for some constants $C_1 > 0$, $C_2 > 0$, and $\varsigma \in (0.5, 1]$. Then it is easy to verify that (15) satisfies $(A_3)$.
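A short numerical sketch may help to see how much faster the square-root schedule in (15) cools than a logarithmic one. The constants below and the specific logarithmic form $C/\log(t+1)$ are assumptions for illustration; the slides only state the rate $O(1/\log(t))$.

```python
import math


def gain(t, C1=1.0, varsigma=0.6):
    """gamma_t = C1 / t**varsigma with varsigma in (0.5, 1]."""
    return C1 / t ** varsigma


def sqrt_temperature(t, C2=1.0, tau_star=0.01):
    """Square-root cooling: tau_t = C2 / sqrt(t) + tau_star."""
    return C2 / math.sqrt(t) + tau_star


def log_temperature(t, C=1.0):
    """An assumed logarithmic schedule, tau_t = C / log(t + 1)."""
    return C / math.log(t + 1.0)


def cooling_to_gain_ratio(t):
    """(tau_t - tau_{t+1}) / gamma_t, which should vanish as t grows."""
    return (sqrt_temperature(t) - sqrt_temperature(t + 1)) / gain(t)


t = 10 ** 6
tau_sqrt = sqrt_temperature(t)   # 1e-3 + 0.01
tau_log = log_temperature(t)     # roughly 1 / 13.8
```

With these constants the square-root schedule is already within $10^{-3}$ of $\tau_*$ after $10^6$ iterations, while the logarithmic schedule is still several times hotter, and the ratio $(\tau_t - \tau_{t+1})/\gamma_t$ shrinks with $t$ as the condition $\tau_t - \tau_{t+1} = o(\gamma_t)$ requires.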

21 Convergence Theorem Theorem 1. Assume that $\mathcal{T}$ is compact and the conditions $(A_1)$-$(A_3)$ hold. If $\theta_0$ used in the SAA algorithm is such that $\sup_{\tau \in \mathcal{T}} v_\tau(\theta_0) < \inf_{\|\theta\| = c_0, \tau \in \mathcal{T}} v_\tau(\theta)$ for some $c_0 > 0$ and $\|\theta_0\| < c_0$, then the number of truncations in SAA is almost surely finite; that is, $\{\theta_t\}$ remains in a compact subset of $\Theta$ almost surely.

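22 Convergence Theorem Theorem 2. Assume the conditions of Theorem 1 hold. Then, as $t \to \infty$,
$$d(\theta_t, \mathcal{L}_{\tau_*}) \to 0, \quad \text{a.s.},$$
where $\mathcal{L}_{\tau_*} = \{\theta \in \Theta : h_{\tau_*}(\theta) = 0\}$ and $d(z, S) = \inf_y \{\|z - y\| : y \in S\}$. That is,
$$\theta_t^{(i)} \to \begin{cases} C + \log\left(\int_{E_i} f_{\tau_*}(x)\,dx\right) - \log(\pi_i + \pi_e), & \text{if } E_i \ne \emptyset, \\ -\infty, & \text{if } E_i = \emptyset, \end{cases}$$
where $C$ is a constant, $\pi_e = \sum_{j : E_j = \emptyset} \pi_j / (m - m_0)$, and $m_0$ is the number of empty subregions.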

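23 Strong Law of Large Numbers (SLLN) Theorem 3. Assume the conditions of Theorem 1 hold. Let $x_1, \ldots, x_n$ denote a set of samples simulated by SAA in $n$ iterations. Let $g: \mathcal{X} \to \mathbb{R}$ be a measurable function such that it is bounded and integrable with respect to $f_{\theta_*,\tau_*}(x)$, where $\theta_*$ denotes a limit point of $\{\theta_t\}$ in $\mathcal{L}_{\tau_*}$. Then
$$\frac{1}{n} \sum_{k=1}^{n} g(x_k) \to \int_{\mathcal{X}} g(x) f_{\theta_*,\tau_*}(x)\,dx, \quad \text{a.s.}$$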

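24 Convergence to Global Minima Corollary. Assume the conditions of Theorem 1 hold. Let $x_1, \ldots, x_t$ denote a set of samples simulated by SAA in $t$ iterations, let $J(x)$ denote the index of the subregion that $x$ belongs to, and let $u_i^*$ denote the minimum energy in $E_i$. Then, for any $\epsilon > 0$, as $t \to \infty$,
$$\frac{\sum_{k=1}^{t} I(U(x_k) \le u_i^* + \epsilon \ \&\ J(x_k) = i)}{\sum_{k=1}^{t} I(J(x_k) = i)} \to \frac{\int_{\{x : U(x) \le u_i^* + \epsilon\} \cap E_i} e^{-U(x)/\tau_*}\,dx}{\int_{E_i} e^{-U(x)/\tau_*}\,dx}, \quad \text{a.s.}, \qquad (16)$$
for $i = 1, \ldots, m$, where $I(\cdot)$ denotes an indicator function. Moreover, if $\tau_*$ goes to 0, then
$$P\left(U(X_t) \le u_i^* + \epsilon \mid J(X_t) = i\right) \to 1, \quad i = 1, \ldots, m. \qquad (17)$$
For simulated annealing, as shown in Haario and Saksman (1991), the following convergence can be achieved with a logarithmic cooling schedule: for any $\epsilon > 0$,
$$P(U(X_t) \le u_1^* + \epsilon) \to 1, \quad \text{a.s.}, \qquad (18)$$
as $t \to \infty$.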

25 Comparison with Simulated Annealing Simulated annealing can achieve a stronger convergence mode than SAA. As a trade-off, SAA can work with a cooling schedule in which the temperature decreases much faster than in the logarithmic cooling schedule, such as the square-root cooling schedule. From the perspective of practical applications, (17) and (18) are almost equivalent: both allow one to identify a sequence of samples that converge to the global energy minima of $U(x)$. In practice, SAA can often work better than simulated annealing, because its self-adjusting mechanism enables it to be immune to local traps.

26 A 10-state Distribution Table: The unnormalized mass function $P(x)$ of the 10-state distribution, $x = 1, \ldots, 10$. The sample space $\mathcal{X} = \{1, 2, \ldots, 10\}$ was partitioned according to the mass function into five subregions: $E_1 = \{8\}$, $E_2 = \{2, 5\}$, $E_3 = \{6, 9\}$, $E_4 = \{3\}$, and $E_5 = \{1, 4, 7, 10\}$.

27 A 10-state Distribution Convergence of $\theta_t$ for the 10-state distribution: the true value $\theta_n$ is calculated at the end temperature, $\hat{\theta}_n$ is the average of $\theta_n$ over 5 independent runs, s.d. is the standard deviation of $\hat{\theta}_n$, and freq is the averaged relative sampling frequency of each subregion. The standard deviation of freq is nearly 0.
Subregion   E_1      E_2      E_3      E_4      E_5
freq        20.29%   20.23%   20.05%   19.84%   19.6%

28 A 10-state Distribution Figure: A thinned sample path of SAA for the 10-state distribution (horizontal axis: iteration; vertical axis: state).

29 A function with multiple local minima Consider minimizing the function
$$U(x) = -\{x_1 \sin(20 x_2) + x_2 \sin(20 x_1)\}^2 \cosh\{\sin(10 x_1) x_1\} - \{x_1 \cos(10 x_2) - x_2 \sin(10 x_1)\}^2 \cosh\{\cos(20 x_2) x_2\},$$
where $x = (x_1, x_2) \in [-1.1, 1.1]^2$.
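For readers who want to experiment with this test function, here is a minimal NumPy sketch that evaluates $U(x)$ and scans it on a grid over $[-1.1, 1.1]^2$. The grid resolution is an arbitrary choice, and the grid scan is only a quick way to look at the landscape, not one of the optimizers compared in the slides.

```python
import numpy as np


def U(x):
    """Multi-modal test function; accepts scalars or arrays for x1 and x2."""
    x1, x2 = x
    a = (x1 * np.sin(20 * x2) + x2 * np.sin(20 * x1)) ** 2 * np.cosh(np.sin(10 * x1) * x1)
    b = (x1 * np.cos(10 * x2) - x2 * np.sin(10 * x1)) ** 2 * np.cosh(np.cos(20 * x2) * x2)
    return -a - b


# Evaluate on a regular grid with spacing 0.01 (assumed resolution)
grid = np.linspace(-1.1, 1.1, 221)
X1, X2 = np.meshgrid(grid, grid, indexing="ij")
Z = U((X1, X2))
i, j = np.unravel_index(np.argmin(Z), Z.shape)
grid_minimizer = (grid[i], grid[j])
```

The function is even in $x_1$, so minima come in mirrored pairs, which matches the two white circles described for the contour plot.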

30 A function with multiple local minima Table: Comparison of SAA and simulated annealing for the multi-modal example, reporting the average of the minimum energy values (with standard deviations in parentheses) for three methods: SAA, SA with a square-root cooling schedule (sr), and SA with a geometric cooling schedule (geo).

31 A function with multiple local minima Figure (axes $x_1$, $x_2$): (a) Contour of $U(x)$, (b) sample path of SAA, (c) sample path of simulated annealing with a square-root cooling schedule, and (d) sample path of simulated annealing with a geometric cooling schedule. The white circles show the global minima of $U(x)$.

32 Feed-forward Neural Networks Figure: A fully connected one-hidden-layer MLP network with four input units ($I_1, I_2, I_3, I_4$), one bias unit ($B$), three hidden units ($H_1, H_2, H_3$), and one output unit ($O$). The arrows show the direction of data feeding.

33 Two-spiral Problem The two-spiral problem is to train a feed-forward neural network that distinguishes between points on two intertwined spirals. This is a benchmark feed-forward neural network training problem. The objective function is high-dimensional, highly nonlinear, and consists of a multitude of local energy minima separated by high energy barriers.

34 Two-spiral Problem Figure (axes $x$, $y$): Classification maps learned by SAA with an MLP of 30 hidden units. The black and white points show the training data for two intertwined spirals. (a) Classification map learned in one run of SAA. (b) Classification map averaged over 20 runs. This figure shows the success of SAA in optimization of complex functions.

35 Two-spiral Problem Table: Comparison of SAA, space annealing SAMC, simulated annealing, and BFGS for the two-spiral example. Notation: $v_i$ denotes the minimum energy value obtained in the $i$th run for $i = 1, \ldots, 20$; Mean is the average of the $v_i$; SD is the standard deviation of the mean; Min $= \min_{i=1}^{20} v_i$; Max $= \max_{i=1}^{20} v_i$; Prop $= \#\{i : v_i \le 0.21\}$; Iter is the average number of iterations performed in each run. SA-1 employs the linear cooling schedule, and SA-2 employs the geometric cooling schedule with a decreasing rate of
Columns: Algorithm, Mean, SD, Min, Max, Prop, Iter ($\times 10^6$). Rows: SAA, space annealing SAMC, simulated annealing (SA-1), simulated annealing (SA-2), BFGS.

36 Protein Folding The AB model consists of only two types of monomers, A and B, which behave as hydrophobic ($\sigma_i = +1$) and hydrophilic ($\sigma_i = -1$) monomers, respectively. The monomers are linked by rigid bonds of unit length to form linear chains living in two- or three-dimensional space. For the 2D case, the energy function consists of two types of contributions, bond angle and Lennard-Jones, and is given by
$$U(x) = \sum_{i=1}^{N-2} \frac{1}{4}(1 - \cos x_{i,i+1}) + 4 \sum_{i=1}^{N-2} \sum_{j=i+2}^{N} \left[ r_{ij}^{-12} - C_2(\sigma_i, \sigma_j) r_{ij}^{-6} \right], \qquad (19)$$
where $x = (x_{1,2}, \ldots, x_{N-2,N-1})$, $x_{i,j} \in [-\pi, \pi]$ is the angle between the $i$th and $j$th bond vectors, and $r_{ij}$ is the distance between monomers $i$ and $j$. The constant $C_2(\sigma_i, \sigma_j)$ is $+1$, $+\frac{1}{2}$, and $-\frac{1}{2}$ for AA, BB, and AB pairs, respectively.
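The energy (19) can be evaluated directly once the bend angles are turned into monomer coordinates. The sketch below is one way to do that, assuming the first monomer sits at the origin and the first bond points along the horizontal axis; the function name `ab_energy` and the encoding of A and B as `+1` and `-1` follow the $\sigma_i$ convention above but are otherwise choices made for this example.

```python
import numpy as np


def ab_energy(angles, sigma):
    """2D AB-model energy (19) for N monomers described by N-2 bend angles."""
    sigma = np.asarray(sigma)
    angles = np.asarray(angles, dtype=float)
    N = len(sigma)
    if len(angles) != N - 2:
        raise ValueError("expected N - 2 bend angles")
    # Headings of the N-1 unit bonds; each bend angle rotates the next bond
    headings = np.concatenate(([0.0], np.cumsum(angles)))
    bonds = np.stack([np.cos(headings), np.sin(headings)], axis=1)
    pos = np.vstack([np.zeros((1, 2)), np.cumsum(bonds, axis=0)])
    # Bond-angle contribution
    bend = 0.25 * np.sum(1.0 - np.cos(angles))
    # Lennard-Jones contribution over non-adjacent pairs (j >= i + 2)
    lj = 0.0
    for i in range(N - 2):
        for j in range(i + 2, N):
            r = np.linalg.norm(pos[i] - pos[j])
            if sigma[i] == 1 and sigma[j] == 1:
                c2 = 1.0      # AA pair
            elif sigma[i] == -1 and sigma[j] == -1:
                c2 = 0.5      # BB pair
            else:
                c2 = -0.5     # AB pair
            lj += r ** -12 - c2 * r ** -6
    return float(bend + 4.0 * lj)


# A straight three-monomer chain: the only non-adjacent pair is at distance 2
e_aaa = ab_energy([0.0], [1, 1, 1])
e_aab = ab_energy([0.0], [1, 1, -1])
```

For the straight all-A trimer the bend term vanishes and the energy reduces to $4(2^{-12} - 2^{-6})$, which is a convenient hand check.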

37 Protein Folding Table: Comparison of SAA and simulated annealing for the 2D-AB models. Columns: N; SAA (Post, Average, Best); Simulated Annealing (Average, Best). Post: the minimum energy value obtained by SAA (subject to a post conjugate gradient minimization procedure starting from the best configurations found in each run). Average: the averaged minimum energy value sampled by the algorithm, with the standard deviation of the average in parentheses. Best: the minimum energy value sampled by the algorithm in all runs. Standard deviations of the averages: SAA (0.0011), (0.0018), (0.0020); simulated annealing (0.0463), (0.0555), (0.1260).

38 Protein Folding Figure: Minimum energy configurations produced by SAA (subject to post conjugate gradient optimization) for (a) the 13-mer sequence, (b) the 21-mer sequence, and (c) the 34-mer sequence. The solid and open circles indicate the hydrophobic and hydrophilic monomers, respectively.

39 Summary We have developed the SAA algorithm for global optimization. Under the framework of stochastic approximation, we show that SAA can work with a cooling schedule in which the temperature decreases much faster than in the logarithmic cooling schedule, e.g., a square-root cooling schedule, while guaranteeing that the global energy minima are reached as the temperature tends to 0.
Compared to simulated annealing, an added advantage of SAA is its self-adjusting mechanism, which enables it to be immune to local traps.
Compared to space annealing SAMC, SAA shrinks its sample space in a soft way, gradually biasing the sampling toward the local energy minima of each subregion by lowering the temperature with iterations. This strategy of sample space shrinkage reduces the risk of SAA getting trapped in local energy minima.
SAA provides a more general framework of stochastic approximation than the current stochastic approximation MCMC algorithms. By including an additional control parameter $\tau_t$, stochastic approximation may find new applications or improve its performance in existing applications.

40 Acknowledgments Collaborators: Yichen Cheng and Guang Lin. NSF grants KAUST grant


More information

Stochastic Proximal Gradient Algorithm

Stochastic Proximal Gradient Algorithm Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind

More information

Markov Chain Monte Carlo Methods

Markov Chain Monte Carlo Methods Markov Chain Monte Carlo Methods p. /36 Markov Chain Monte Carlo Methods Michel Bierlaire michel.bierlaire@epfl.ch Transport and Mobility Laboratory Markov Chain Monte Carlo Methods p. 2/36 Markov Chains

More information

Introduction to Restricted Boltzmann Machines

Introduction to Restricted Boltzmann Machines Introduction to Restricted Boltzmann Machines Ilija Bogunovic and Edo Collins EPFL {ilija.bogunovic,edo.collins}@epfl.ch October 13, 2014 Introduction Ingredients: 1. Probabilistic graphical models (undirected,

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Stochastic Networks Variations of the Hopfield model

Stochastic Networks Variations of the Hopfield model 4 Stochastic Networks 4. Variations of the Hopfield model In the previous chapter we showed that Hopfield networks can be used to provide solutions to combinatorial problems that can be expressed as the

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Chapter 5 Markov Chain Monte Carlo MCMC is a kind of improvement of the Monte Carlo method By sampling from a Markov chain whose stationary distribution is the desired sampling distributuion, it is possible

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Stochastic Approximation in Monte Carlo Computation

Stochastic Approximation in Monte Carlo Computation Stochastic Approximation in Monte Carlo Computation Faming Liang, Chuanhai Liu and Raymond J. Carroll 1 June 26, 2006 Abstract The Wang-Landau algorithm is an adaptive Markov chain Monte Carlo algorithm

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Stochastic optimization Markov Chain Monte Carlo

Stochastic optimization Markov Chain Monte Carlo Stochastic optimization Markov Chain Monte Carlo Ethan Fetaya Weizmann Institute of Science 1 Motivation Markov chains Stationary distribution Mixing time 2 Algorithms Metropolis-Hastings Simulated Annealing

More information

MCMC and Gibbs Sampling. Kayhan Batmanghelich

MCMC and Gibbs Sampling. Kayhan Batmanghelich MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction

More information

Session 3A: Markov chain Monte Carlo (MCMC)

Session 3A: Markov chain Monte Carlo (MCMC) Session 3A: Markov chain Monte Carlo (MCMC) John Geweke Bayesian Econometrics and its Applications August 15, 2012 ohn Geweke Bayesian Econometrics and its Session Applications 3A: Markov () chain Monte

More information

Variational Inference via Stochastic Backpropagation

Variational Inference via Stochastic Backpropagation Variational Inference via Stochastic Backpropagation Kai Fan February 27, 2016 Preliminaries Stochastic Backpropagation Variational Auto-Encoding Related Work Summary Outline Preliminaries Stochastic Backpropagation

More information

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric

More information

Advances and Applications in Perfect Sampling

Advances and Applications in Perfect Sampling and Applications in Perfect Sampling Ph.D. Dissertation Defense Ulrike Schneider advisor: Jem Corcoran May 8, 2003 Department of Applied Mathematics University of Colorado Outline Introduction (1) MCMC

More information

MCMC Sampling for Bayesian Inference using L1-type Priors

MCMC Sampling for Bayesian Inference using L1-type Priors MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Neural Networks for Machine Learning. Lecture 11a Hopfield Nets

Neural Networks for Machine Learning. Lecture 11a Hopfield Nets Neural Networks for Machine Learning Lecture 11a Hopfield Nets Geoffrey Hinton Nitish Srivastava, Kevin Swersky Tijmen Tieleman Abdel-rahman Mohamed Hopfield Nets A Hopfield net is composed of binary threshold

More information

Sampling multimodal densities in high dimensional sampling space

Sampling multimodal densities in high dimensional sampling space Sampling multimodal densities in high dimensional sampling space Gersende FORT LTCI, CNRS & Telecom ParisTech Paris, France Journées MAS Toulouse, Août 4 Introduction Sample from a target distribution

More information

Development of Stochastic Artificial Neural Networks for Hydrological Prediction

Development of Stochastic Artificial Neural Networks for Hydrological Prediction Development of Stochastic Artificial Neural Networks for Hydrological Prediction G. B. Kingston, M. F. Lambert and H. R. Maier Centre for Applied Modelling in Water Engineering, School of Civil and Environmental

More information

Lecture 10. Neural networks and optimization. Machine Learning and Data Mining November Nando de Freitas UBC. Nonlinear Supervised Learning

Lecture 10. Neural networks and optimization. Machine Learning and Data Mining November Nando de Freitas UBC. Nonlinear Supervised Learning Lecture 0 Neural networks and optimization Machine Learning and Data Mining November 2009 UBC Gradient Searching for a good solution can be interpreted as looking for a minimum of some error (loss) function

More information

You submitted this quiz on Wed 16 Apr :18 PM IST. You got a score of 5.00 out of 5.00.

You submitted this quiz on Wed 16 Apr :18 PM IST. You got a score of 5.00 out of 5.00. Feedback IX. Neural Networks: Learning Help You submitted this quiz on Wed 16 Apr 2014 10:18 PM IST. You got a score of 5.00 out of 5.00. Question 1 You are training a three layer neural network and would

More information

Adaptive HMC via the Infinite Exponential Family

Adaptive HMC via the Infinite Exponential Family Adaptive HMC via the Infinite Exponential Family Arthur Gretton Gatsby Unit, CSML, University College London RegML, 2017 Arthur Gretton (Gatsby Unit, UCL) Adaptive HMC via the Infinite Exponential Family

More information

Afternoon Meeting on Bayesian Computation 2018 University of Reading

Afternoon Meeting on Bayesian Computation 2018 University of Reading Gabriele Abbati 1, Alessra Tosi 2, Seth Flaxman 3, Michael A Osborne 1 1 University of Oxford, 2 Mind Foundry Ltd, 3 Imperial College London Afternoon Meeting on Bayesian Computation 2018 University of

More information

Propp-Wilson Algorithm (and sampling the Ising model)

Propp-Wilson Algorithm (and sampling the Ising model) Propp-Wilson Algorithm (and sampling the Ising model) Danny Leshem, Nov 2009 References: Haggstrom, O. (2002) Finite Markov Chains and Algorithmic Applications, ch. 10-11 Propp, J. & Wilson, D. (1996)

More information

Neural Networks and the Back-propagation Algorithm

Neural Networks and the Back-propagation Algorithm Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely

More information

Winter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo

Winter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo Winter 2019 Math 106 Topics in Applied Mathematics Data-driven Uncertainty Quantification Yoonsang Lee (yoonsang.lee@dartmouth.edu) Lecture 9: Markov Chain Monte Carlo 9.1 Markov Chain A Markov Chain Monte

More information

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions

More information

Adaptive Monte Carlo methods

Adaptive Monte Carlo methods Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert

More information

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy

More information

Monte Carlo Methods. Geoff Gordon February 9, 2006

Monte Carlo Methods. Geoff Gordon February 9, 2006 Monte Carlo Methods Geoff Gordon ggordon@cs.cmu.edu February 9, 2006 Numerical integration problem 5 4 3 f(x,y) 2 1 1 0 0.5 0 X 0.5 1 1 0.8 0.6 0.4 Y 0.2 0 0.2 0.4 0.6 0.8 1 x X f(x)dx Used for: function

More information

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods Prof. Daniel Cremers 11. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric

More information

Multilayer Perceptron

Multilayer Perceptron Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4

More information

Walsh Diffusions. Andrey Sarantsev. March 27, University of California, Santa Barbara. Andrey Sarantsev University of Washington, Seattle 1 / 1

Walsh Diffusions. Andrey Sarantsev. March 27, University of California, Santa Barbara. Andrey Sarantsev University of Washington, Seattle 1 / 1 Walsh Diffusions Andrey Sarantsev University of California, Santa Barbara March 27, 2017 Andrey Sarantsev University of Washington, Seattle 1 / 1 Walsh Brownian Motion on R d Spinning measure µ: probability

More information

Gatsby Theoretical Neuroscience Lectures: Non-Gaussian statistics and natural images Parts III-IV

Gatsby Theoretical Neuroscience Lectures: Non-Gaussian statistics and natural images Parts III-IV Gatsby Theoretical Neuroscience Lectures: Non-Gaussian statistics and natural images Parts III-IV Aapo Hyvärinen Gatsby Unit University College London Part III: Estimation of unnormalized models Often,

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

RANDOM TOPICS. stochastic gradient descent & Monte Carlo

RANDOM TOPICS. stochastic gradient descent & Monte Carlo RANDOM TOPICS stochastic gradient descent & Monte Carlo MASSIVE MODEL FITTING nx minimize f(x) = 1 n i=1 f i (x) Big! (over 100K) minimize 1 least squares 2 kax bk2 = X i 1 2 (a ix b i ) 2 minimize 1 SVM

More information

MSc MT15. Further Statistical Methods: MCMC. Lecture 5-6: Markov chains; Metropolis Hastings MCMC. Notes and Practicals available at

MSc MT15. Further Statistical Methods: MCMC. Lecture 5-6: Markov chains; Metropolis Hastings MCMC. Notes and Practicals available at MSc MT15. Further Statistical Methods: MCMC Lecture 5-6: Markov chains; Metropolis Hastings MCMC Notes and Practicals available at www.stats.ox.ac.uk\ nicholls\mscmcmc15 Markov chain Monte Carlo Methods

More information

Markov Chain Monte Carlo, Numerical Integration

Markov Chain Monte Carlo, Numerical Integration Markov Chain Monte Carlo, Numerical Integration (See Statistics) Trevor Gallen Fall 2015 1 / 1 Agenda Numerical Integration: MCMC methods Estimating Markov Chains Estimating latent variables 2 / 1 Numerical

More information

12. LOCAL SEARCH. gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria

12. LOCAL SEARCH. gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria 12. LOCAL SEARCH gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison Wesley h ttp://www.cs.princeton.edu/~wayne/kleinberg-tardos

More information

Annealing Between Distributions by Averaging Moments

Annealing Between Distributions by Averaging Moments Annealing Between Distributions by Averaging Moments Chris J. Maddison Dept. of Comp. Sci. University of Toronto Roger Grosse CSAIL MIT Ruslan Salakhutdinov University of Toronto Partition Functions We

More information

LECTURE # - NEURAL COMPUTATION, Feb 04, Linear Regression. x 1 θ 1 output... θ M x M. Assumes a functional form

LECTURE # - NEURAL COMPUTATION, Feb 04, Linear Regression. x 1 θ 1 output... θ M x M. Assumes a functional form LECTURE # - EURAL COPUTATIO, Feb 4, 4 Linear Regression Assumes a functional form f (, θ) = θ θ θ K θ (Eq) where = (,, ) are the attributes and θ = (θ, θ, θ ) are the function parameters Eample: f (, θ)

More information

An introduction to adaptive MCMC

An introduction to adaptive MCMC An introduction to adaptive MCMC Gareth Roberts MIRAW Day on Monte Carlo methods March 2011 Mainly joint work with Jeff Rosenthal. http://www2.warwick.ac.uk/fac/sci/statistics/crism/ Conferences and workshops

More information

Neural Network Training

Neural Network Training Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

COMP 551 Applied Machine Learning Lecture 14: Neural Networks

COMP 551 Applied Machine Learning Lecture 14: Neural Networks COMP 551 Applied Machine Learning Lecture 14: Neural Networks Instructor: Ryan Lowe (ryan.lowe@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted,

More information

Mark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer.

Mark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer. University of Cambridge Engineering Part IIB & EIST Part II Paper I0: Advanced Pattern Processing Handouts 4 & 5: Multi-Layer Perceptron: Introduction and Training x y (x) Inputs x 2 y (x) 2 Outputs x

More information

4. Multilayer Perceptrons

4. Multilayer Perceptrons 4. Multilayer Perceptrons This is a supervised error-correction learning algorithm. 1 4.1 Introduction A multilayer feedforward network consists of an input layer, one or more hidden layers, and an output

More information

Computer Practical: Metropolis-Hastings-based MCMC

Computer Practical: Metropolis-Hastings-based MCMC Computer Practical: Metropolis-Hastings-based MCMC Andrea Arnold and Franz Hamilton North Carolina State University July 30, 2016 A. Arnold / F. Hamilton (NCSU) MH-based MCMC July 30, 2016 1 / 19 Markov

More information

Estimating Unnormalized models. Without Numerical Integration

Estimating Unnormalized models. Without Numerical Integration Estimating Unnormalized Models Without Numerical Integration Dept of Computer Science University of Helsinki, Finland with Michael Gutmann Problem: Estimation of unnormalized models We want to estimate

More information

CS 781 Lecture 9 March 10, 2011 Topics: Local Search and Optimization Metropolis Algorithm Greedy Optimization Hopfield Networks Max Cut Problem Nash

CS 781 Lecture 9 March 10, 2011 Topics: Local Search and Optimization Metropolis Algorithm Greedy Optimization Hopfield Networks Max Cut Problem Nash CS 781 Lecture 9 March 10, 2011 Topics: Local Search and Optimization Metropolis Algorithm Greedy Optimization Hopfield Networks Max Cut Problem Nash Equilibrium Price of Stability Coping With NP-Hardness

More information

Bayesian Methods with Monte Carlo Markov Chains II

Bayesian Methods with Monte Carlo Markov Chains II Bayesian Methods with Monte Carlo Markov Chains II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm 1 Part 3

More information

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to

More information

Learning MN Parameters with Alternative Objective Functions. Sargur Srihari

Learning MN Parameters with Alternative Objective Functions. Sargur Srihari Learning MN Parameters with Alternative Objective Functions Sargur srihari@cedar.buffalo.edu 1 Topics Max Likelihood & Contrastive Objectives Contrastive Objective Learning Methods Pseudo-likelihood Gradient

More information

Lecture 26: Neural Nets

Lecture 26: Neural Nets Lecture 26: Neural Nets ECE 417: Multimedia Signal Processing Mark Hasegawa-Johnson University of Illinois 11/30/2017 1 Intro 2 Knowledge-Based Design 3 Error Metric 4 Gradient Descent 5 Simulated Annealing

More information