Practical Numerical Methods in Physics and Astronomy. Lecture 5: Optimisation and Search Techniques


Practical Numerical Methods in Physics and Astronomy
Lecture 5: Optimisation and Search Techniques
Pat Scott, Department of Physics, McGill University
January 30, 2013
Slides available from http://www.physics.mcgill.ca/~patscott

Outline 1 General Considerations 2


General Considerations
Optimisation - the problem
Optimisation is finding global minima and maxima: for what x = x_needle does min[f_haystack(x)] = f(x_needle)?
- maximisation is just minimisation of -f_haystack(x)
- usually everything is posed as minimisation

The general strategy
To optimise, we always require an objective or fitness function f_haystack
Any search problem can be posed in terms of some sort of fitness function
We may care just about finding x_needle, or about mapping f_haystack in the region of x_needle
e.g. comparing theory to data:
- just best-fit parameters?
- or errors on the best fit?
- or just a good overall map, without even finding the exact best fit?
The goal is the global minimum, but often the result is a local minimum

Optimisation vs root finding
multi-d optimisation is usually easier than multi-d root finding
optimisation by root finding on ∇f_haystack = 0 doesn't work
- it makes all local minima and maxima (and points of inflection!) degenerate
- highly unlikely to find the global extremum
root finding for h(x) = 0 by minimisation of h²(x) is not enough
- generally runs into problems with local minima
- can be improved by combination with Newton's method in multi-d
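The last two points can be made concrete with a short Python sketch (the system h, the starting point, and the use of scipy are illustrative, not from the slides): minimising h² needs only function values but can in general be trapped by local minima where h ≠ 0, whereas a Newton-type root finder works on h directly.

```python
import numpy as np
from scipy.optimize import minimize, root

# Nonlinear system h(x) = 0, with roots at (1, 1) and (-1, -1).
def h(x):
    return np.array([x[0]**2 + x[1]**2 - 2.0, x[0] - x[1]])

# Route 1: root finding by minimisation of h^2. Works here, but in
# general any local minimum of h^2 with h != 0 can trap the search.
h2 = lambda x: np.sum(h(x)**2)
print(minimize(h2, x0=[-0.5, 2.0], method='Nelder-Mead').x)

# Route 2: a Newton-type root finder that uses h directly.
print(root(h, x0=[-0.5, 2.0]).x)
```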

Options:
deterministic, non-gradient methods
- Brent's method in 1D
- downhill simplex in multi-d
deterministic, gradient-based methods
- steepest descent
stochastic, gradient-inspired methods
- MCMCs
- nested sampling
- simulated annealing
stochastic, non-gradient methods
- genetic algorithms
- differential evolution
many others...


Brent's method
Synopsis: Bracket the minimum with 3 points and use Brent's usual tricks
[figure: successive parabolic fits - a parabola through points 1 2 3, then through points 1 2 4, homing in on the minimum]
Tracks 6 individual points
- always 2 brackets and a third point lower than the brackets
quadratic interpolation + bisection
similar point-ID shuffling to the root-finding version
similar conditions for the interpolation step
can be improved with derivative information
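A minimal usage sketch (the test function and bracket values are invented for illustration): scipy's minimize_scalar implements Brent's parabolic-interpolation scheme with a golden-section fallback, and expects a bracket (a, b, c) with f(b) below both f(a) and f(c).

```python
import numpy as np
from scipy.optimize import minimize_scalar

# A 1D function with several local minima.
f = lambda x: (x - 2.0)**2 + np.sin(5.0 * x)

# Bracket (a, b, c): a < b < c with f(b) < f(a) and f(b) < f(c),
# so a minimum is guaranteed to lie inside (a, c).
res = minimize_scalar(f, bracket=(0.0, 2.0, 4.0), method='brent')
print(res.x, res.fun)
```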


Steepest descent
Synopsis: Follow the gradient until you hit a local (line) minimum; reassess.
Always need to hang a 90° left or right: each new direction is perpendicular to the last
Works (for local minima), but inefficient
Requires a 1D minimisation routine (e.g. Brent's)
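A compact sketch of this loop (the function, gradient, and tolerances are examples, not from the slides), using scipy's Brent minimiser for each line minimisation:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=500):
    """Repeated line minimisation along -grad(f); each new direction
    is perpendicular to the last, hence the 90-degree turns."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -grad(x)
        if np.linalg.norm(d) < tol:          # gradient ~ 0: local minimum
            break
        line = minimize_scalar(lambda t: f(x + t * d))   # 1D Brent step
        x = x + line.x * d
    return x

# Elongated quadratic bowl: the inefficient zig-zag shows up clearly here.
f = lambda x: x[0]**2 + 10.0 * x[1]**2
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
print(steepest_descent(f, grad, [3.0, 1.0]))   # -> approximately [0, 0]
```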

Variants on steepest descent
The general idea of line minimisation can be improved
Improvement comes from a better directional basis set: direction set methods
- many ways to choose the basis
- goal is to choose directions s.t. successive line minimisations don't interfere
Still uses 1D minimisation along a line, so requires Brent's or similar


The downhill simplex method
Synopsis: Ooze down the slope and around corners like a blob of goo (or an amoeba)
short, simple, fun, effective
any dimension
no brackets, derivatives or line minimisation required
still only good for local minima

1 Evaluate f(x) at the corners
2 Find the worst-fit corner
3 Replace the worst-fit corner with a new point reflected across the remaining points
4 If the new point is awesome*, extend the simplex as well (reflection and expansion)
5 If the new point is terrible, discard it and try a 1D contraction instead
6 If the 1D contraction is also terrible, do a multi-d contraction about the best corner
*awesome = better than the best fit; terrible = worse than the second-worst fit; OK = in between
[figure: simplex at the beginning of a step, with its possible moves - reflection, reflection and expansion, contraction, multiple contraction]
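For completeness, a hedged usage sketch (the test function and tolerances are illustrative): scipy's 'Nelder-Mead' method is an implementation of this downhill simplex scheme.

```python
from scipy.optimize import minimize

# Rosenbrock's curved-valley function: a classic simplex test problem.
rosen = lambda x: (1.0 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2

# No brackets, derivatives or line minimisations needed, as promised.
res = minimize(rosen, x0=[-1.2, 1.0], method='Nelder-Mead',
               options={'xatol': 1e-8, 'fatol': 1e-8})
print(res.x)   # -> approximately [1, 1]
```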


MCMCs
Synopsis: Jump around like a particle diffusing down a gradient
Biased random walk
Trotta's example: like an elephant on the savannah looking for water
- wanders randomly until it finds a few puddles
- moves generally and stochastically around the surrounding area until it sights a bigger puddle
- in doing so, moves on average in the direction of more puddles
- ... until it finds a stream...
- follows the stream to the jackpot

Definition 1 - Monte Carlo: direct simulation of some stochastic process by drawing repeated samples from a known distribution
Definition 2 - Markov Chain: a string of system states / samples where each state depends only on the previous one
Definition 3 - Markov Chain Monte Carlo (MCMC): a Monte Carlo sampling from a distribution where each new sample is drawn with some reference to the last

Metropolis-Hastings sampling
One particular sampling scheme for generating Markov Chains
Best known, has nice statistical properties (more later)
Randomly generate a new proposed point x_maybe
Test if f_haystack(x_maybe) < f_haystack(x_current)
- If so, x_maybe → x_new
- If not, choose x_maybe → x_new with probability f_haystack(x_current)/f_haystack(x_maybe), and x_current → x_new with probability 1 − f_haystack(x_current)/f_haystack(x_maybe)

Proposal functions
Q: How do you generate the proposed point?
A: You need a proposal function P(x)
- generally some local distribution centred on the current point
- e.g. a product of Gaussians in every direction, or a multi-d Gaussian pdf (not the same thing!!)
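To make the last distinction concrete, a small sketch (step sizes and covariance are placeholders): a product of independent Gaussians has a diagonal covariance, while a general multi-d Gaussian can propose correlated steps.

```python
import numpy as np

rng = np.random.default_rng(1)
x_current = np.zeros(2)

# Product of Gaussians in every direction (independent, diagonal covariance):
step_diag = rng.normal(0.0, [0.3, 1.0])

# General multi-d Gaussian pdf: off-diagonal covariance correlates the steps,
# which is why the two are not the same thing.
cov = np.array([[0.09, 0.05],
                [0.05, 1.00]])
step_full = rng.multivariate_normal(np.zeros(2), cov)

print(x_current + step_diag, x_current + step_full)
```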

Proposal functions & burn-in
Ideally, P = f_haystack in the vicinity of x_current - not usually practical
P should be chosen adaptively to get the best approximation to f_haystack
- e.g. by analysing previous points and adjusting σ for a Gaussian P
After a suitable number of steps, memory of the starting point is gone
- this is the burn-in period; all points during burn-in should be discarded
- the proposal function may be fixed after burn-in (more later)

MCMC step by step (for minimisation)
1 Initialise P
2 Choose a random starting point z
3 Take a Metropolis-Hastings step
  a. Choose a proposal point y from P
  b. If α ≡ f(z)/f(y) ≥ 1, accept y as the new z
  c. Otherwise (α < 1), generate a random uniform deviate β
  d. If β < α, accept y as the new z
  e. Otherwise, z remains the same
4 If burn-in is still going, adjust P (usually on the basis of previous points)
5 If burn-in is finished, test for convergence
6 Repeat from Step 3
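A minimal sketch of steps 2, 3 and 6 (the objective, proposal width and chain length are placeholders; the burn-in adjustment and convergence test of steps 4-5 are omitted). Note that f must be positive for α = f(z)/f(y) to act as a probability.

```python
import numpy as np

def mcmc_minimise(f, x0, sigma=0.5, n_steps=10000, seed=0):
    """Metropolis-Hastings walk following the steps above, with
    alpha = f(z)/f(y), so moves that decrease f are always accepted."""
    rng = np.random.default_rng(seed)
    z = np.asarray(x0, dtype=float)
    chain = [z.copy()]
    for _ in range(n_steps):
        y = z + rng.normal(0.0, sigma, size=z.shape)  # proposal from P
        alpha = f(z) / f(y)
        if alpha >= 1 or rng.uniform() < alpha:       # steps 3b-3d
            z = y
        chain.append(z.copy())
    return np.array(chain)

# A positive objective with its minimum at (1, -2):
f = lambda x: 1.0 + (x[0] - 1.0)**2 + (x[1] + 2.0)**2
chain = mcmc_minimise(f, x0=[0.0, 0.0])
print(chain[np.argmin([f(p) for p in chain])])        # best point visited
```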

Statistical features of MCMCs
Bayesians love MCMCs...
- the MCMC procedure ensures that the density of points in the chain is proportional to the value of f_haystack
- makes marginalising (integrating) over uninteresting parameters easy: just sum the number of points
- must fix the proposal function to have this property ⇒ extra-important to throw out burn-in points
MCMCs and similar algorithms can also be good for frequentists
- don't fix the proposal function; let it keep optimising itself on the go to find the global minimum
- use a very strict convergence criterion
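As a sketch of "just sum the number of points", reusing `chain` from the code above purely to show the mechanics (the bin count is arbitrary):

```python
import numpy as np

# Marginalise over the second parameter by histogramming the first:
# the density of chain points performs the integral automatically.
hist, edges = np.histogram(chain[:, 0], bins=50, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
print(centres[np.argmax(hist)])   # rough mode of the marginal distribution
```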

Convergence
Local minima are an issue
- easy to get stuck if a local mode is wider/deeper than the proposal function
- need to use multiple chains with different starting values, and combine results
Convergence criteria
- coarsest option is to test the variance σ²_running of the last few points in the chain: σ²_running < σ²_threshold ⇒ chain found a minimum (local/global)
- very rough, but OK(ish) if you know f_haystack is unimodal
- can be defined in terms of the fractional change in the Bayesian evidence (∫ f(x) dx)
- many other more sophisticated schemes
- some people use a constant length of chain - this can be risky
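A sketch of the coarsest criterion described above (the window length and threshold are arbitrary placeholders):

```python
import numpy as np

def converged(f_vals, window=200, threshold=1e-4):
    """Running-variance test: the chain's last few f values have
    stopped moving, so it has found a minimum (local or global!)."""
    return len(f_vals) >= window and np.var(f_vals[-window:]) < threshold
```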

A couple of other random points...
Temperature
- chains can be assigned different temperatures T, s.t. α ≡ (f(z)/f(y))^(1/T) = exp([ln f(z) − ln f(y)]/T) = exp(ln f(z)/T) / exp(ln f(y)/T)
- T > 1 ⇒ for α < 1, α goes up vs. a normal MCMC ⇒ steps are more easily accepted
- like giving the jumpy diffusive particle a higher T: allows skipping over local minima more easily
- combining chains with different T breaks statistical properties (I think)
Alternative sampling methods
- Gibbs sampling, slice sampling, others...
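A one-function sketch of the tempered acceptance ratio defined above (the default T is arbitrary):

```python
def alpha_tempered(f_z, f_y, T=2.0):
    """alpha = (f(z)/f(y))**(1/T): for T > 1, ratios below 1 are pulled
    towards 1, so worse points are accepted more often and the chain
    hops over local minima more easily."""
    return (f_z / f_y) ** (1.0 / T)
```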

A few MCMC examples in research
[figure: example MCMC scan in the m_1/2 (TeV) vs m_0 (TeV) plane; Putze et al (2010)]

When to use which method?
... as always, this is problem-specific... make sure to try a few
- for 1D where you can bracket, Brent's is best
- for multi-d with few modes, direction-set-type methods do OK
- for multi-d with many modes, and/or badly-behaved f, you need MCMC/MultiNest/GAs

Housekeeping
Next lecture: Monday Feb 4 - Numerical Integration I