Learning the hyper-parameters. Luca Martino
2 Parameters and hyper-parameters
All the methods described so far depend on some choice of hyper-parameters. For instance, do you recall λ (the bandwidth of the kernel/basis functions) and σ_e (the standard deviation of the noise)?
$$\psi(x \mid x_n) = \exp\left(-\frac{\|x - x_n\|^2}{2\lambda^2}\right), \qquad y = f(x) + e, \qquad \text{where } e \sim \mathcal{N}(e; 0, \sigma_e^2).$$
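A minimal NumPy sketch of the two ingredients above, assuming inputs are stored as rows of an array. The helper names `gaussian_kernel` and `simulate_data` and the toy function f(x) = sin(x) are hypothetical choices for illustration:

```python
import numpy as np

def gaussian_kernel(x, xn, lam):
    """Matrix of psi(x_i | x_n) = exp(-||x_i - x_n||^2 / (2 lam^2)).

    x has shape (M, d), xn has shape (N, d); the result has shape (M, N).
    """
    sq_dist = np.sum((x[:, None, :] - xn[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * lam ** 2))

def simulate_data(f, x, sigma_e, rng):
    """Observations y = f(x) + e with independent e ~ N(0, sigma_e^2)."""
    return f(x) + sigma_e * rng.standard_normal(x.shape[0])

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 20).reshape(-1, 1)
y = simulate_data(lambda z: np.sin(z[:, 0]), x, sigma_e=0.1, rng=rng)
Psi = gaussian_kernel(x, x, lam=0.5)   # N x N, symmetric, ones on the diagonal
```

A larger λ makes ψ decay more slowly with distance (smoother fits); σ_e sets how far the observations scatter around f.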
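3 Cross Validation (CV)
Split the dataset $D = \{x_i, y_i\}_{i=1}^{N}$ into two sets, $D_{\text{train}} = \{x_i^{(TR)}, y_i^{(TR)}\}_{i=1}^{N_{TR}}$ and $D_{\text{test}} = \{x_i^{(V)}, y_i^{(V)}\}_{i=1}^{N_V}$ (or $D_{\text{validation}}$), so that $D = D_{\text{train}} \cup D_{\text{test}}$. Then:
1. Given some values of the hyper-parameters $\theta = [\lambda, \sigma_e]$, compute the estimator $\hat{f}(x \mid \theta)$ using $D_{\text{train}}$.
2. Validate how good the solution $\hat{f}(x \mid \theta)$ is using $D_{\text{test}}$. For instance, we can minimize the MSE in prediction (equivalently, the sum of squared errors, since $N_V$ is fixed):
$$\hat{\theta} = \arg\min_{\theta \in \Theta} \sum_{n=1}^{N_V} \left( y_n^{(V)} - \hat{f}(x_n^{(V)} \mid \theta) \right)^2.$$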
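A hedged sketch of hold-out CV over a finite grid Θ. It assumes (the slide does not say so) that the estimator is the kernel ridge / Gaussian-process posterior mean $\hat{f}(x \mid \theta) = \Psi(x, X_{TR})(\Psi_{TR} + \sigma_e^2 I)^{-1} y^{(TR)}$; the function names, grid values and data are hypothetical:

```python
import numpy as np

def gaussian_kernel(x, xn, lam):
    # psi(x | x_n) = exp(-||x - x_n||^2 / (2 lam^2)); rows of x and xn are inputs
    sq = np.sum((x[:, None, :] - xn[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * lam ** 2))

def fit_predict(x_tr, y_tr, x_new, lam, sigma_e):
    # Assumed estimator: f(x|theta) = Psi(x, X_tr) (Psi_tr + sigma_e^2 I)^{-1} y_tr
    K = gaussian_kernel(x_tr, x_tr, lam) + sigma_e ** 2 * np.eye(len(x_tr))
    weights = np.linalg.solve(K, y_tr)
    return gaussian_kernel(x_new, x_tr, lam) @ weights

def holdout_cv(x, y, lams, sigmas, n_train, rng):
    # Randomly split D into D_train (n_train points) and D_test (the rest)
    perm = rng.permutation(len(x))
    tr, va = perm[:n_train], perm[n_train:]
    best_theta, best_err = None, np.inf
    for lam in lams:                 # grid Theta = lams x sigmas
        for s in sigmas:
            pred = fit_predict(x[tr], y[tr], x[va], lam, s)
            err = np.sum((y[va] - pred) ** 2)   # squared prediction error on D_test
            if err < best_err:
                best_theta, best_err = (lam, s), err
    return best_theta, best_err

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-3, 3, 60)).reshape(-1, 1)
y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(60)
theta_hat, err = holdout_cv(x, y, lams=[0.1, 0.5, 1.0, 2.0],
                            sigmas=[0.01, 0.1, 1.0], n_train=40, rng=rng)
```

An exhaustive grid is the simplest choice for two hyper-parameters; its cost grows with the product of the grid sizes.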
4 Cross Validation (CV)
Note that, since $\exp(-u)$ is monotonically decreasing, the previous procedure is equivalent to
$$\hat{\theta} = \arg\max_{\theta \in \Theta} \exp\left( -\sum_{n=1}^{N_V} \left( y_n^{(V)} - \hat{f}(x_n^{(V)} \mid \theta) \right)^2 \right).$$
However, we can also minimize or maximize other cost or pay-off functions (not only the error in prediction), or use other estimators of θ, considering, e.g., the mean or the median instead of the maximum.
5 Other estimators for CV
Denote as
$$p(y^{(V)} \mid \theta) \propto \exp\left( -\sum_{n=1}^{N_V} \left( y_n^{(V)} - \hat{f}(x_n^{(V)} \mid \theta) \right)^2 \right)$$
the CV error-in-prediction likelihood, as $p(\theta)$ the prior over the hyper-parameters θ, and as $p(\theta \mid y^{(V)})$ the corresponding posterior. Then we can also define other estimators, for instance the Minimum Mean Square Error (MMSE) estimator,
$$\hat{\theta}_{MMSE} = \int_{\Theta} \theta \, p(\theta \mid y^{(V)}) \, d\theta, \tag{1}$$
instead of using the maximum, $\hat{\theta}_{MAP}$.
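On a finite 1-D grid the integral in (1) becomes a weighted sum. A minimal sketch, assuming the validation squared errors for each grid value have already been computed (the function name `grid_estimators` and the numbers below are hypothetical) and a uniform prior unless `log_prior` is given:

```python
import numpy as np

def grid_estimators(theta_grid, sse, log_prior=None):
    """MAP, MMSE and median of a scalar hyper-parameter on an increasing grid.

    sse[i] is the validation squared error obtained with theta_grid[i]; the CV
    likelihood is taken as p(y^V | theta) proportional to exp(-sse), as above.
    """
    theta_grid = np.asarray(theta_grid, dtype=float)
    log_post = -np.asarray(sse, dtype=float)
    if log_prior is not None:
        log_post = log_post + log_prior
    w = np.exp(log_post - log_post.max())   # subtract the max for numerical stability
    w /= w.sum()                            # normalized posterior weights on the grid
    theta_map = theta_grid[np.argmax(w)]
    theta_mmse = np.sum(w * theta_grid)     # discrete version of eq. (1)
    theta_median = theta_grid[np.searchsorted(np.cumsum(w), 0.5)]
    return theta_map, theta_mmse, theta_median

# Symmetric errors: all three estimators coincide at 1.0
m, e, md = grid_estimators([0.0, 1.0, 2.0], [1.0, 0.0, 1.0])
# Asymmetric errors: MAP = 0.0, median = 1.0, and the MMSE lies in between
m2, e2, md2 = grid_estimators([0.0, 1.0, 2.0], [0.0, 0.5, 0.6])
```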
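6 K-fold CV
Split the dataset $D = \{x_i, y_i\}_{i=1}^{N}$ into K disjoint sets $D^{(1)}, \dots, D^{(K)}$. For $k = 1, \dots, K$:
1. Given some hyper-parameters $\theta = [\lambda, \sigma_e]$, compute the estimator $\hat{f}_k(x \mid \theta)$ using as training set the union of the other K − 1 sets (all the data except $D^{(k)}$).
2. Obtain $\hat{\theta}^{(k)}$ by using $D^{(k)}$ as the validation set.
Finally, compute
$$\hat{\theta} = \frac{1}{K} \sum_{k=1}^{K} \hat{\theta}^{(k)}.$$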
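A sketch of K-fold CV that trains on all folds except $D^{(k)}$, validates on $D^{(k)}$ (the usual K-fold convention), picks $\hat{\theta}^{(k)}$ per fold, and averages. As in the hold-out case, the kernel ridge / GP-mean estimator and every name and value here are assumptions for illustration:

```python
import numpy as np
from itertools import product

def gaussian_kernel(x, xn, lam):
    sq = np.sum((x[:, None, :] - xn[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * lam ** 2))

def fit_predict(x_tr, y_tr, x_new, lam, sigma_e):
    # Assumed estimator (kernel ridge / GP posterior mean)
    K = gaussian_kernel(x_tr, x_tr, lam) + sigma_e ** 2 * np.eye(len(x_tr))
    return gaussian_kernel(x_new, x_tr, lam) @ np.linalg.solve(K, y_tr)

def kfold_theta(x, y, grid, K, rng):
    """Best theta^(k) on each fold, then theta_hat = (1/K) sum_k theta^(k)."""
    folds = np.array_split(rng.permutation(len(x)), K)   # disjoint D^(1), ..., D^(K)
    thetas = []
    for k in range(K):
        va = folds[k]                                     # validation fold D^(k)
        tr = np.concatenate([folds[j] for j in range(K) if j != k])
        errs = [np.sum((y[va] - fit_predict(x[tr], y[tr], x[va], lam, s)) ** 2)
                for lam, s in grid]
        thetas.append(grid[int(np.argmin(errs))])
    thetas = np.array(thetas)
    return thetas.mean(axis=0), thetas

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 50).reshape(-1, 1)
y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(50)
grid = list(product([0.25, 0.5, 1.0, 2.0], [0.05, 0.1, 0.5]))   # (lambda, sigma_e) pairs
theta_hat, per_fold = kfold_theta(x, y, grid, K=5, rng=rng)
```

A common alternative is to average the validation errors over the K folds for each θ and then minimize that average once.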
7 Leave-One-Out and All-in
Leave-One-Out: in this case we consider exactly K = N sets, so each training set is formed by N − 1 data and only one datum is left out for validation.
All-in: all the data are used for training. This is not CV (it corresponds to K = 1, with N data). Let us look at the marginal likelihood approach to clarify this point.
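For the kernel ridge / GP-mean estimator (an assumption, as before), Leave-One-Out does not require N refits: with $C = \Psi + \sigma_e^2 I_N$, the residual at $x_i$ when $x_i$ is left out equals $[C^{-1}y]_i / [C^{-1}]_{ii}$. The sketch below (hypothetical names and data) checks this identity against the brute-force N refits:

```python
import numpy as np

def gaussian_kernel(x, xn, lam):
    sq = np.sum((x[:, None, :] - xn[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * lam ** 2))

def loo_residuals(x, y, lam, sigma_e):
    # Closed form: y_i - f_{-i}(x_i) = [C^{-1} y]_i / [C^{-1}]_{ii}
    C = gaussian_kernel(x, x, lam) + sigma_e ** 2 * np.eye(len(x))
    C_inv = np.linalg.inv(C)
    return (C_inv @ y) / np.diag(C_inv)

def loo_residuals_brute_force(x, y, lam, sigma_e):
    # Refit N times, each time leaving exactly one datum out
    res = np.empty(len(x))
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        C = gaussian_kernel(x[keep], x[keep], lam) + sigma_e ** 2 * np.eye(len(x) - 1)
        pred = gaussian_kernel(x[i:i + 1], x[keep], lam) @ np.linalg.solve(C, y[keep])
        res[i] = y[i] - pred[0]
    return res

rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, 15).reshape(-1, 1)
y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(15)
fast = loo_residuals(x, y, lam=0.8, sigma_e=0.2)
slow = loo_residuals_brute_force(x, y, lam=0.8, sigma_e=0.2)
loo_mse = np.mean(fast ** 2)   # LOO error in prediction for this theta
```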
8 Alternative to the Error in Prediction: Marginal Likelihood
Given the models studied so far, the marginal likelihood has the form (or a similar one)
$$p(y \mid \theta) = \mathcal{N}(y \mid 0, \Psi + \sigma_e^2 I_N),$$
where λ affects the construction of Ψ (recall that $\theta = [\lambda, \sigma_e]$). We can try to maximize the marginal likelihood,
$$\hat{\theta} = \arg\max_{\theta \in \Theta} p(y \mid \theta).$$
It can be used inside CV or without CV ("All-in").
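9 Marginal Likelihood
Recall that
$$\log p(y \mid \theta) = -\frac{1}{2} y^{\top} (\Psi + \sigma_e^2 I_N)^{-1} y - \frac{1}{2} \log \det(\Psi + \sigma_e^2 I_N) + \text{const}.$$
With a uniform prior density $p(\theta) = \mathbb{I}(\theta)$, where $\mathbb{I}(\theta) = 1$ if $\theta \in \Theta$ and $\mathbb{I}(\theta) = 0$ otherwise, the posterior density is
$$p(\theta \mid y) \propto p(y \mid \theta) \, p(\theta) = p(y \mid \theta) \, \mathbb{I}(\theta). \tag{2}$$
Maximum a Posteriori (MAP) estimator:
$$\hat{\theta}_{MAP} = \arg\max_{\theta \in \Theta} p(\theta \mid y). \tag{3}$$
Minimum Mean Square Error (MMSE) estimator:
$$\hat{\theta}_{MMSE} = \int_{\Theta} \theta \, p(\theta \mid y) \, d\theta. \tag{4}$$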
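A minimal sketch of the log marginal likelihood $\log \mathcal{N}(y \mid 0, \Psi + \sigma_e^2 I_N)$, computed through a Cholesky factor $C = LL^{\top}$ (so that $\log\det C = 2\sum_i \log L_{ii}$), and maximized over a hypothetical grid using all the data ("All-in"). The Gaussian kernel, grid and data are assumptions:

```python
import numpy as np

def gaussian_kernel(x, xn, lam):
    sq = np.sum((x[:, None, :] - xn[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * lam ** 2))

def log_marginal_likelihood(x, y, lam, sigma_e):
    """log N(y | 0, Psi + sigma_e^2 I_N) via a Cholesky factorization."""
    N = len(y)
    C = gaussian_kernel(x, x, lam) + sigma_e ** 2 * np.eye(N)
    L = np.linalg.cholesky(C)                              # C = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # C^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                   # = -0.5 log det C
            - 0.5 * N * np.log(2.0 * np.pi))               # the "const" term

rng = np.random.default_rng(4)
x = rng.uniform(-3, 3, 40).reshape(-1, 1)
y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(40)
grid = [(lam, s) for lam in [0.2, 0.5, 1.0, 2.0] for s in [0.05, 0.1, 0.3, 1.0]]
scores = [log_marginal_likelihood(x, y, lam, s) for lam, s in grid]
theta_ml = grid[int(np.argmax(scores))]   # with a uniform prior, also the grid MAP (3)
```

The Cholesky route avoids forming an explicit inverse and gives the log-determinant for free.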
10 Global View
In general, the elements that must be analyzed/chosen are:
1. Different cost or pay-off functions (including Cross Validation (CV) and mini-batch approaches).
2. Different estimators (MAP, MMSE, median, etc.).
3. The choice of the prior pdfs (in a Bayesian framework).
4. Computational algorithms (for approximating the estimators).
Several combinations are possible, leading to different conclusions for different Machine Learning algorithms. Compare the methods in terms of complexity and number of parameters/hyper-parameters.
11 SECOND PART: Given a posterior, approximation of the estimators by Monte Carlo
12 Inference using Monte Carlo
Given a posterior $\pi(\theta) = p(\theta \mid y)$, we wish to obtain its maximum, expected value (mean), median, covariance matrix and other moments. Most of these quantities are integrals of the form
$$I = \int_{\Theta} h(\theta) \, \pi(\theta) \, d\theta,$$
where, for instance, $h(\theta) = \theta$ gives the mean. In general they cannot be computed analytically, so we compute them numerically. Deterministic (quadrature) methods fail in high dimensions and cannot be applied easily.
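13 Inference using Monte Carlo
Let us assume that we are able to evaluate point-wise the unnormalized posterior $\bar{\pi}(\theta) = p(y \mid \theta) p(\theta)$. Then
$$\pi(\theta) = p(\theta \mid y) = \frac{1}{Z} \bar{\pi}(\theta), \qquad Z = \int_{\Theta} \bar{\pi}(\theta) \, d\theta,$$
where $Z = p(y)$ is the marginal likelihood.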
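14 Monte Carlo approximation
Our problem is to compute numerically integrals of the type
\begin{align}
I &= \int_{\Theta} h(\theta) \, \pi(\theta) \, d\theta \tag{5} \\
  &= \frac{1}{Z} \int_{\Theta} h(\theta) \, \bar{\pi}(\theta) \, d\theta. \tag{6}
\end{align}
Monte Carlo approximation:
\begin{align}
I &= \int_{\Theta} h(\theta) \, \pi(\theta) \, d\theta \tag{7} \\
  &\approx \frac{1}{T} \sum_{t=1}^{T} h(\theta_t), \tag{8}
\end{align}
where $\theta_t \sim \pi(\theta)$.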
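A minimal sketch of the estimator (8), assuming a toy target $\pi(\theta) = \mathcal{N}(\theta; 2, 0.5^2)$ that we can sample directly (the target and the helper name `mc_estimate` are hypothetical). With $h(\theta) = \theta$ the exact value is 2, and with $h(\theta) = \theta^2$ it is $2^2 + 0.5^2 = 4.25$:

```python
import numpy as np

def mc_estimate(h, sampler, T, rng):
    """I_hat = (1/T) sum_t h(theta_t), with theta_t drawn from pi."""
    theta = sampler(T, rng)
    return np.mean(h(theta))

mu, sd = 2.0, 0.5
sampler = lambda T, rng: mu + sd * rng.standard_normal(T)   # exact draws from pi
rng = np.random.default_rng(5)
I_mean = mc_estimate(lambda t: t, sampler, 100_000, rng)         # exact value: 2
I_second = mc_estimate(lambda t: t ** 2, sampler, 100_000, rng)  # exact value: 4.25
```

For independent samples with finite variance, the error of (8) decreases as $O(1/\sqrt{T})$ regardless of the dimension of θ, which is the practical advantage over deterministic quadrature.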
15 Monte Carlo approximation: Sampling methods
Then the problem is to generate random vectors from π(θ).
Sampling methods: procedures to generate random vectors from a generic density.
Sampling methods are NOT RELATED to Nyquist and signal-processing sampling procedures: "to sample from" and "to draw from" mean to generate random vectors/numbers.
[Figure with other notation (θ = x = [x_1, x_2]).]
16 Proposal density and target density
Proposal density: q(θ), easy to sample from (we can easily draw random samples from q).
Target density: the posterior π(θ).
Sampling method: converts samples from q(θ) into samples distributed according to π(θ). A sampling method can be seen as a filter that takes random vectors/numbers distributed according to q(θ) and converts them into vectors distributed according to π(θ).
17 Proposal density and target density (2)
[Diagram: samples from proposal q(θ) → SAMPLING METHOD (Monte Carlo) → samples from target π(θ).]
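Rejection sampling is one concrete instance of this "filter": each draw from q is kept with probability $\bar{\pi}(\theta)/(M q(\theta))$, where $\bar{\pi}(\theta) \le M q(\theta)$ for all θ. A sketch on an assumed toy problem, $q = \mathcal{N}(0,1)$ and $\bar{\pi}(\theta) = e^{-\theta^2/2}(1 + \sin^2 3\theta)$, for which $M = 2\sqrt{2\pi}$ works and the acceptance rate is about 0.75; all names are hypothetical:

```python
import numpy as np

def rejection_sampler(log_target, draw_q, log_q, log_M, n_proposals, rng):
    """Keep theta' ~ q with probability pi_bar(theta') / (M q(theta')).

    Requires pi_bar(theta) <= M q(theta) for every theta; the accepted samples
    are independent and exactly distributed according to pi = pi_bar / Z.
    """
    theta = draw_q(n_proposals, rng)
    log_accept = log_target(theta) - log_M - log_q(theta)
    u = rng.uniform(size=n_proposals)
    return theta[np.log(u) < log_accept]

log_target = lambda t: -0.5 * t ** 2 + np.log1p(np.sin(3 * t) ** 2)   # log pi_bar
log_q = lambda t: -0.5 * t ** 2 - 0.5 * np.log(2 * np.pi)            # log N(t; 0, 1)
draw_q = lambda n, rng: rng.standard_normal(n)
log_M = np.log(2.0) + 0.5 * np.log(2 * np.pi)   # pi_bar / q <= 2 sqrt(2 pi)

rng = np.random.default_rng(6)
samples = rejection_sampler(log_target, draw_q, log_q, log_M, 20_000, rng)
acceptance_rate = len(samples) / 20_000
```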
18 Evaluating versus sampling a density
IMPORTANT: it is mandatory to distinguish between:
Evaluating a density (or a function): given an x, obtain the output y = π(x). Example:
$$\pi(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right).$$
Sampling (or drawing) from a density: generating vectors/numbers x according to π(x). Namely, if we generate several samples x', x'', ..., the histogram of these samples approximates the shape of π(x). Example: x = randn(1,1).
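The same distinction in NumPy, as a sketch with μ = 0 and σ = 1 (arbitrary choices). Evaluating returns a number for a given x; sampling returns random numbers whose normalized histogram approaches the curve. `rng.standard_normal` plays the role of MATLAB's `randn`:

```python
import numpy as np

mu, sigma = 0.0, 1.0

def gaussian_pdf(x):
    # Evaluating: given x, return pi(x) = N(x; mu, sigma^2)
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

rng = np.random.default_rng(7)
value = gaussian_pdf(0.0)                              # a single number, about 0.3989
samples = mu + sigma * rng.standard_normal(50_000)     # sampling: random draws from pi

# The normalized histogram of the samples approximates the shape of pi(x)
counts, edges = np.histogram(samples, bins=40, range=(-4, 4), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_gap = np.max(np.abs(counts - gaussian_pdf(centers)))
```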
19 Classification of sampling methods
Main families:
- Direct methods (based on random variable transformations): independent samples (almost the best); computational effort: lowest; applicability: low.
- Rejection sampling: independent samples (almost the best); computational effort: higher (depending on the acceptance rate); applicability: wider than direct methods, but in general low.
- Importance sampling (IS): weighted samples; computational effort: low; applicability: always.
- Markov Chain Monte Carlo (MCMC): positively correlated samples; computational effort: low; applicability: always.
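Importance sampling produces weighted samples $\{\theta_n, w_n\}$ with $w_n = \bar{\pi}(\theta_n)/q(\theta_n)$; the mean of the weights estimates Z, and the normalized weights give a self-normalized estimate of I. A sketch on an assumed toy problem, $\bar{\pi}(\theta) = e^{-(\theta-3)^2/2}$ (so $Z = \sqrt{2\pi}$ and $E_\pi[\theta] = 3$) with proposal $q = \mathcal{N}(0, 3^2)$; names are hypothetical:

```python
import numpy as np

def importance_sampling(log_target, draw_q, log_q, h, N, rng):
    """Weighted samples with w_n = pi_bar(theta_n) / q(theta_n)."""
    theta = draw_q(N, rng)
    log_w = log_target(theta) - log_q(theta)
    w = np.exp(log_w - log_w.max())              # rescaled for numerical stability
    Z_hat = np.exp(log_w.max()) * np.mean(w)     # unbiased estimate of Z
    w_bar = w / w.sum()                          # normalized weights
    I_hat = np.sum(w_bar * h(theta))             # self-normalized estimate of I
    return I_hat, Z_hat

log_target = lambda t: -0.5 * (t - 3.0) ** 2
log_q = lambda t: -0.5 * (t / 3.0) ** 2 - np.log(3.0 * np.sqrt(2 * np.pi))
draw_q = lambda n, rng: 3.0 * rng.standard_normal(n)

rng = np.random.default_rng(8)
I_hat, Z_hat = importance_sampling(log_target, draw_q, log_q, lambda t: t, 200_000, rng)
```

The proposal is deliberately wider than the target so that the weights have finite variance.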
20 Markov Chain Monte Carlo (MCMC)
MCMC: we generate a Markov chain $\theta_0 \to \theta_1 \to \theta_2 \to \dots \to \theta_t$ that has the posterior density π(θ) as invariant/stationary density. After a burn-in period (of length $t_b$), we have $\theta_t \sim \pi(\theta)$ (approximately) for $t \geq t_b$.
Problem: we do not know $t_b$. (We will use all the samples without discarding any of them, hoping that T is large enough.)
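21 Metropolis-Hastings (MH) algorithm
The Metropolis-Hastings (MH) sampler:
1. Choose $\theta_0$.
2. For $t = 1, \dots, T$:
2.1 Generate $\theta' \sim q(\theta \mid \theta_{t-1})$.
2.2 Set $\theta_t = \theta'$ with probability
$$\alpha = \min\left[1, \frac{\pi(\theta') \, q(\theta_{t-1} \mid \theta')}{\pi(\theta_{t-1}) \, q(\theta' \mid \theta_{t-1})}\right],$$
otherwise set $\theta_t = \theta_{t-1}$ (with probability $1 - \alpha$).
3. Outputs: $\{\theta_1, \dots, \theta_T\}$.
Note that Z cancels in α, so π can be replaced by the unnormalized $\bar{\pi}$.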
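A sketch of MH with a Gaussian random-walk proposal $q(\theta \mid \theta_{t-1}) = \mathcal{N}(\theta; \theta_{t-1}, \text{step}^2 I)$. This q is symmetric, so the q-ratio in α equals 1 (the Metropolis special case). The 2-D Gaussian target with mean [1, −1], the step size and all names are assumptions for illustration; as on the previous slide, no burn-in is discarded:

```python
import numpy as np

def metropolis_hastings(log_target, theta0, T, step, rng):
    """Random-walk MH; log_target may be the unnormalized log pi_bar."""
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    log_p = log_target(theta)
    chain = np.empty((T, theta.size))
    n_accept = 0
    for t in range(T):
        proposal = theta + step * rng.standard_normal(theta.size)   # theta' ~ q(.|theta_{t-1})
        log_p_prop = log_target(proposal)
        # Symmetric q: alpha = min[1, pi(theta') / pi(theta_{t-1})]
        if np.log(rng.uniform()) < log_p_prop - log_p:
            theta, log_p = proposal, log_p_prop
            n_accept += 1
        chain[t] = theta                                             # else theta_t = theta_{t-1}
    return chain, n_accept / T

log_target = lambda th: -0.5 * np.sum((th - np.array([1.0, -1.0])) ** 2)
rng = np.random.default_rng(9)
chain, acc_rate = metropolis_hastings(log_target, [0.0, 0.0], 50_000, 1.0, rng)
```

The step size trades acceptance rate against move length: tiny steps are almost always accepted but explore slowly, large steps are often rejected.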
22 From MH to Gibbs
In MH, we propose vectors/samples $\theta = [\theta_1, \dots, \theta_L] \in \mathbb{R}^L$ directly on the whole space of dimension L. There are also component-wise strategies that work component by component in order to construct a complete sample/vector $\theta = [\theta_1, \dots, \theta_L]$.
23 Bidimensional Gibbs Sampling (L = 2)
Consider $\pi(\theta_1, \theta_2)$, and note that
$$\pi_1(\theta_1 \mid \theta_2) \propto \pi(\theta_1, \theta_2), \qquad \pi_2(\theta_2 \mid \theta_1) \propto \pi(\theta_1, \theta_2).$$
Assume that we are able to draw from the conditionals $\pi_1$ and $\pi_2$ (a strong assumption).
The bidimensional Gibbs sampler:
1. Choose $\theta_0 = [\theta_{1,0}, \theta_{2,0}]$.
2. For $t = 1, \dots, T$:
2.1 Draw $\theta_{1,t} \sim \pi_1(\theta_1 \mid \theta_{2,t-1})$.
2.2 Draw $\theta_{2,t} \sim \pi_2(\theta_2 \mid \theta_{1,t})$.
2.3 Set $\theta_t = [\theta_{1,t}, \theta_{2,t}]$.
3. Outputs: $\{\theta_1, \dots, \theta_T\}$.
24 Bidimensional Gibbs Sampling (L = 2)
[Figure with other notation (θ = x = [x_1, x_2]).]
25 Bidimensional Gibbs Sampling (L = 2)
[Figure with other notation (θ = x = [x_1, x_2]).]
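A sketch of the bidimensional Gibbs sampler for an assumed target where both conditionals are known exactly: the standard bivariate Gaussian with correlation ρ, for which $\theta_1 \mid \theta_2 \sim \mathcal{N}(\rho\theta_2, 1-\rho^2)$ and symmetrically for $\theta_2$. The function name and settings are hypothetical:

```python
import numpy as np

def gibbs_bivariate_gaussian(rho, theta0, T, rng):
    """Two-component Gibbs sampler for pi = N(0, [[1, rho], [rho, 1]])."""
    sd = np.sqrt(1.0 - rho ** 2)       # conditional standard deviation
    th1, th2 = theta0
    chain = np.empty((T, 2))
    for t in range(T):
        th1 = rho * th2 + sd * rng.standard_normal()   # step 2.1: uses theta_{2,t-1}
        th2 = rho * th1 + sd * rng.standard_normal()   # step 2.2: uses the new theta_{1,t}
        chain[t] = (th1, th2)                          # step 2.3
    return chain

rng = np.random.default_rng(10)
chain = gibbs_bivariate_gaussian(rho=0.8, theta0=(3.0, -3.0), T=50_000, rng=rng)
```

The closer |ρ| is to 1, the more correlated consecutive samples become, because each coordinate-wise move along a narrow diagonal ridge is short.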
26 Gibbs Sampling
Assume that we are able to draw from the full conditionals $\pi_l$, $l = 1, \dots, L$ (a strong assumption).
The Gibbs sampler:
1. Choose $\theta_0 = [\theta_{1,0}, \theta_{2,0}, \dots, \theta_{l,0}, \dots, \theta_{L,0}]$.
2. For $t = 1, \dots, T$:
2.1 For $l = 1, \dots, L$: draw $\theta_{l,t} \sim \pi_l(\theta_l \mid \theta_{1:l-1,t}, \theta_{l+1:L,t-1})$.
2.2 Set $\theta_t = [\theta_{1,t}, \theta_{2,t}, \dots, \theta_{l,t}, \dots, \theta_{L,t}]$.
3. Outputs: $\{\theta_1, \dots, \theta_T\}$.
27 MH-within-Gibbs
If we are not able to draw from the full conditionals, what can we do? We use another MCMC method inside the Gibbs sampler, e.g., an MH method inside Gibbs.
The MH-within-Gibbs sampler:
1. Choose $\theta_0 = [\theta_{1,0}, \theta_{2,0}, \dots, \theta_{l,0}, \dots, \theta_{L,0}]$.
2. For $t = 1, \dots, T$:
2.1 For $l = 1, \dots, L$: draw $\theta_{l,t}$ from $\pi_l(\theta_l \mid \theta_{1:l-1,t}, \theta_{l+1:L,t-1})$ using an MH algorithm (for instance, running several MH steps).
2.2 Set $\theta_t = [\theta_{1,t}, \theta_{2,t}, \dots, \theta_{l,t}, \dots, \theta_{L,t}]$.
3. Outputs: $\{\theta_1, \dots, \theta_T\}$.
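A sketch of MH-within-Gibbs with a few random-walk MH steps per component. Since $\pi_l(\theta_l \mid \text{rest}) \propto \pi(\theta)$, each inner MH ratio only needs the unnormalized joint. The "banana" target below is an assumption chosen because its marginals are known: $\theta_1 \sim \mathcal{N}(0,1)$ and $\theta_2 \mid \theta_1 \sim \mathcal{N}(\theta_1^2, 1)$, so $E[\theta_2] = 1$, while $\pi_1(\theta_1 \mid \theta_2)$ is not a standard density. Names and tuning values are hypothetical:

```python
import numpy as np

def mh_within_gibbs(log_target, theta0, T, step, n_mh, rng):
    """Gibbs sweeps in which each full conditional is sampled by n_mh MH steps."""
    theta = np.asarray(theta0, dtype=float).copy()
    log_p = log_target(theta)
    chain = np.empty((T, theta.size))
    for t in range(T):
        for l in range(theta.size):          # step 2.1: l = 1, ..., L
            for _ in range(n_mh):            # random-walk MH targeting pi_l
                prop = theta.copy()
                prop[l] += step * rng.standard_normal()
                log_p_prop = log_target(prop)
                if np.log(rng.uniform()) < log_p_prop - log_p:
                    theta, log_p = prop, log_p_prop
        chain[t] = theta                     # step 2.2
    return chain

log_target = lambda th: -0.5 * th[0] ** 2 - 0.5 * (th[1] - th[0] ** 2) ** 2
rng = np.random.default_rng(11)
chain = mh_within_gibbs(log_target, [0.0, 0.0], T=30_000, step=1.0, n_mh=3, rng=rng)
```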
28 Questions? THANKS!
Bayesian Inference for DSGE Models Lawrence J. Christiano Outline State space-observer form. convenient for model estimation and many other things. Bayesian inference Bayes rule. Monte Carlo integation.
More informationCondensed Table of Contents for Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control by J. C.
Condensed Table of Contents for Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control by J. C. Spall John Wiley and Sons, Inc., 2003 Preface... xiii 1. Stochastic Search
More informationIntroduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak
Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,
More informationThe Origin of Deep Learning. Lili Mou Jan, 2015
The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets
More informationIntroduction to Markov Chain Monte Carlo & Gibbs Sampling
Introduction to Markov Chain Monte Carlo & Gibbs Sampling Prof. Nicholas Zabaras Sibley School of Mechanical and Aerospace Engineering 101 Frank H. T. Rhodes Hall Ithaca, NY 14853-3801 Email: zabaras@cornell.edu
More informationWho was Bayes? Bayesian Phylogenetics. What is Bayes Theorem?
Who was Bayes? Bayesian Phylogenetics Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison October 6, 2011 The Reverand Thomas Bayes was born in London in 1702. He was the
More informationStrong Lens Modeling (II): Statistical Methods
Strong Lens Modeling (II): Statistical Methods Chuck Keeton Rutgers, the State University of New Jersey Probability theory multiple random variables, a and b joint distribution p(a, b) conditional distribution
More informationBayesian Phylogenetics
Bayesian Phylogenetics Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison October 6, 2011 Bayesian Phylogenetics 1 / 27 Who was Bayes? The Reverand Thomas Bayes was born
More informationEco517 Fall 2013 C. Sims MCMC. October 8, 2013
Eco517 Fall 2013 C. Sims MCMC October 8, 2013 c 2013 by Christopher A. Sims. This document may be reproduced for educational and research purposes, so long as the copies contain this notice and are retained
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning
More information