CMA-ES: a Stochastic Second-Order Method for Function-Value-Free Numerical Optimization


CMA-ES: a Stochastic Second-Order Method for Function-Value-Free Numerical Optimization
Nikolaus Hansen
INRIA, Research Centre Saclay, Machine Learning and Optimization Team (TAO), Univ. Paris-Sud, LRI
MSRC, October-December 2011
http://www.inria.fr http://www.lri.fr/~hansen
INRIA: the French National Institute for Research in Computer Science and Control; 8 research centres, 210 research teams

Outline

...feel free to ask questions... Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius, and a lot of courage, to move in the opposite direction. Albert Einstein

Black-Box Optimization (Search)

Typical Applications

On-line registration of spine images: an intraoperative ultrasound image registered to a CT image. From [Winter et al 2008, Registration of CT and intraoperative 3-D ultrasound images of the spine using evolutionary and gradient-based methods]

Distribution of final misalignment from [Winter et al 2008]

Optimization of walking gaits [Dürr & Pfister 2004] http://www.icos.ethz.ch/cse/research/highlights/research_highlights_august_2004
- CMA-ES, Covariance Matrix Adaptation Evolution Strategy [Hansen et al 2003]
- IDEA, Iterated Density Estimation Evolutionary Algorithm [Bosman 2003]
- fminsearch, downhill simplex method [Nelder & Mead 1965]

RoboCup 3D Simulated Soccer League: a competition based on the Nao robot. At RoboCup 2011, Team UT Austin Villa (University of Texas at Austin) won among 22 teams, winning all 24 games, scoring 136 goals and conceding none (!). This was largely due to the agent's superior skills, which were optimized using CMA-ES.

2011 Final http://www.youtube.com/watch?v=dmyhumt2u0w

Difficulties in black-box optimization. In any case, the objective function must be sufficiently regular: without some regularity to exploit, no search method can do better than random sampling.

Rugged landscape (figure: function value over the search space)

The Methods

Taxonomy of search methods
Gradient-based methods (Taylor, smooth), local search:
- Conjugate gradient methods [Fletcher & Reeves 1964]
- Quasi-Newton methods (BFGS) [Broyden et al 1970]
Derivative-free optimization (DFO):
- Trust-region methods (NEWUOA) [Powell 2006]
- Simplex downhill [Nelder & Mead 1965]
- Pattern search [Hooke & Jeeves 1961] [Audet & Dennis 2006]
Stochastic (randomized) search methods:
- Evolutionary algorithms [Rechenberg 1965, Holland 1975]
- Simulated annealing (SA) [Kirkpatrick et al 1983]
- Simultaneous perturbation stochastic approximation (SPSA) [Spall 2000]

Taxonomy of Evolutionary Algorithms
- Genetic Algorithms: operate on bit-strings; operators (recombination, mutation) are typically problem-tailored
- Evolution Strategies, Evolutionary Programming, Differential Evolution, Particle Swarm Optimization: operate on continuous search spaces, with problem-tailored encoding of the fitness
- Genetic Programming: operates on trees, evolving computer programs

Metaphors

...a stochastic optimization method...

Stochastic optimization template
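This template can be written down in a few lines. Below is a minimal Python sketch (the function and parameter names are illustrative assumptions, not the slide's notation): candidates are sampled from a parametrized distribution, evaluated on the black-box objective, and the distribution parameters are updated from the evaluated samples.

```python
import numpy as np

def stochastic_search(f, theta, sample, update, iterations=100, lam=10):
    """Generic stochastic search template:
    repeat: sample lam candidates from P(. | theta), evaluate, update theta."""
    for _ in range(iterations):
        candidates = [sample(theta) for _ in range(lam)]  # x_i ~ P(. | theta)
        fitnesses = [f(x) for x in candidates]            # black-box evaluations
        theta = update(theta, candidates, fitnesses)      # adapt the distribution
    return theta

# Illustrative instantiation: isotropic Gaussian with fixed step-size 0.1;
# the mean jumps to the best sampled candidate.
f = lambda x: float(np.sum(x**2))                         # sphere function
sample = lambda m: m + 0.1 * np.random.randn(3)
update = lambda m, xs, fs: xs[int(np.argmin(fs))]
m_final = stochastic_search(f, np.ones(3), sample, update)
```

CMA-ES is one instantiation of this template; the slides below specify how the normal distribution's mean, covariance matrix, and overall scale are adapted.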

Instantiation of the optimization template [Wierstra et al 2008, Akimoto et al 2010, Glasmachers et al 2010, Malagò et al 2011, Arnold et al 2011]; CMA-ES (Covariance Matrix Adaptation Evolution Strategy) [Hansen & Ostermeier 1996, 2001, Hansen et al 2003, Hansen & Kern 2004, Jastrebski & Arnold 2006]

Normal (Gaussian) Distribution
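In CMA-ES, candidate solutions are drawn from a multivariate normal distribution N(m, sigma^2 C). A sketch of how such samples can be generated from an eigendecomposition of the covariance matrix (the function and variable names are assumptions for illustration):

```python
import numpy as np

def sample_population(m, sigma, C, lam, rng=None):
    """Draw lam samples x_i ~ N(m, sigma^2 C), using the eigendecomposition
    C = B diag(d^2) B^T, which gives C^(1/2) = B diag(d) B^T."""
    rng = np.random.default_rng(rng)
    eigvals, B = np.linalg.eigh(C)             # C is symmetric positive definite
    C_sqrt = B @ np.diag(np.sqrt(eigvals)) @ B.T
    z = rng.standard_normal((lam, len(m)))     # z_i ~ N(0, I)
    return m + sigma * z @ C_sqrt              # x_i = m + sigma * C^(1/2) z_i
```

The normal distribution is the maximum-entropy distribution for a given mean and covariance, which is one motivation for this choice.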

CMA-ES (Covariance Matrix Adaptation Evolution Strategy) [Hansen & Ostermeier 1996, 2001, Hansen et al 2003, Hansen & Kern 2004, Jastrebski & Arnold 2006]; Natural Evolution Strategies (theory-driven) [Wierstra et al 2008, Sun et al 2009, Glasmachers et al 2010]

Covariance Matrix Adaptation in a 2-D search space (animation over several slides) [Kjellstroem 1991, Hansen & Ostermeier 1996, Ljung 1999]

Interpretations/Observations

Step-size control: the concept. Search paths in 2-D:
- a short path (successive steps cancel each other) indicates a too large step-size
- an intermediate path length indicates a neutral, near-optimal step-size
- a long path (successive steps point in similar directions) indicates a too small step-size
If several updates go into the same/similar direction (if they have the same sign), the step-size is increased; see the sketch below.
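A simplified sketch of this mechanism as realized by cumulative step-size adaptation; the constant c_sigma below is a rough placeholder rather than the exact CMA-ES default, and the mu_eff weighting factor is omitted:

```python
import numpy as np

def update_step_size(sigma, p_sigma, mean_step_z, n, d_sigma=1.0):
    """Cumulative step-size adaptation, simplified sketch.
    mean_step_z: the realized mean step expressed in the isotropic coordinate
    system, i.e. C^(-1/2) (m_new - m_old) / sigma, assumed precomputed.
    p_sigma accumulates successive steps; its length is compared with the
    expected step length under random selection, E||N(0, I)||."""
    c_sigma = 3.0 / n  # backward time horizon of roughly n/3 iterations (simplified)
    # correlated steps make the path long, cancelling steps make it short
    p_sigma = (1 - c_sigma) * p_sigma + np.sqrt(c_sigma * (2 - c_sigma)) * mean_step_z
    expected_norm = np.sqrt(n) * (1 - 1/(4*n) + 1/(21*n**2))  # approx. E||N(0,I)||
    # increase sigma if the path is longer than expected, decrease it otherwise
    sigma *= np.exp((c_sigma / d_sigma) * (np.linalg.norm(p_sigma) / expected_norm - 1))
    return sigma, p_sigma
```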

CMA-ES in a nutshell:
1) Sample a maximum-entropy distribution: the multivariate normal distribution.
2) Rank solutions according to their fitness: invariance to order-preserving transformations.
3) Update mean and covariance matrix by increasing the likelihood of good points and steps (also: natural gradient ascent): PCA, variable metric, a new problem representation, invariant under changes of the coordinate system.
4) Update the step-size based on non-local information: exploit correlations in the history of steps.
A compact sketch combining all four steps follows below.
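The following Python sketch puts the four steps together in one loop, using common default parameter settings from the CMA-ES literature; the active (negative) covariance update and some minor corrections are omitted, so this is an illustration under stated assumptions rather than a reference implementation (for real use, see the source code linked at the end of the talk).

```python
import numpy as np

def cma_es(f, x0, sigma, max_evals=10000, f_target=1e-10, seed=None):
    """Compact (mu/mu_w, lambda)-CMA-ES sketch following the four steps above."""
    rng = np.random.default_rng(seed)
    n = len(x0)
    lam = 4 + int(3 * np.log(n))                 # population size
    mu = lam // 2                                # number of selected parents
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                                 # positive recombination weights
    mu_eff = 1.0 / np.sum(w**2)                  # variance-effective selection mass
    c_sigma = (mu_eff + 2) / (n + mu_eff + 5)    # learning rates / time horizons
    d_sigma = 1 + c_sigma
    c_c = (4 + mu_eff / n) / (n + 4 + 2 * mu_eff / n)
    c_1 = 2 / ((n + 1.3)**2 + mu_eff)
    c_mu = min(1 - c_1, 2 * (mu_eff - 2 + 1/mu_eff) / ((n + 2)**2 + mu_eff))
    m, C = np.asarray(x0, dtype=float), np.eye(n)
    p_sigma, p_c = np.zeros(n), np.zeros(n)
    E_norm = np.sqrt(n) * (1 - 1/(4*n) + 1/(21*n**2))  # approx. E||N(0,I)||
    evals = 0
    while evals < max_evals:
        # 1) sample lam candidates from the multivariate normal N(m, sigma^2 C)
        eigvals, B = np.linalg.eigh(C)
        d = np.sqrt(np.maximum(eigvals, 1e-20))
        y = rng.standard_normal((lam, n)) @ (B * d).T    # y_i ~ N(0, C)
        x = m + sigma * y
        # 2) rank solutions by fitness (order-preserving f-transforms change nothing)
        fx = np.array([f(xi) for xi in x])
        evals += lam
        idx = np.argsort(fx)
        if fx[idx[0]] <= f_target:
            return x[idx[0]], fx[idx[0]]
        # 3) move the mean and update C from the mu best points and steps
        y_sel = y[idx[:mu]]
        y_w = w @ y_sel                                  # weighted mean step
        m = m + sigma * y_w
        C_inv_sqrt = B @ np.diag(1.0 / d) @ B.T
        p_sigma = ((1 - c_sigma) * p_sigma
                   + np.sqrt(c_sigma * (2 - c_sigma) * mu_eff) * (C_inv_sqrt @ y_w))
        p_c = (1 - c_c) * p_c + np.sqrt(c_c * (2 - c_c) * mu_eff) * y_w
        rank_mu = (y_sel * w[:, None]).T @ y_sel         # sum_i w_i y_i y_i^T
        C = (1 - c_1 - c_mu) * C + c_1 * np.outer(p_c, p_c) + c_mu * rank_mu
        # 4) adapt the step-size from the cumulated path length
        sigma *= np.exp((c_sigma / d_sigma) * (np.linalg.norm(p_sigma) / E_norm - 1))
    return m, f(m)
```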

CMA-ES (Covariance Matrix Adaptation Evolution Strategy) = natural gradient ascent + cumulation + step-size control [Hansen & Ostermeier 2001, Hansen et al 2003, Hansen & Kern 2004]

Design principles applied in CMA-ES:
- Minimal prior assumptions: stochasticity helps; maximum-entropy distribution; improvement only by selection of solutions (harder to deceive)
- Exploit all available information, given the remaining design principles (e.g. invariance); e.g. cumulation exploits sign information
- Stationarity or unbiasedness: parameters remain unchanged under random ranking
- Almost parameter-less: meaningful parameters whose choice is f-independent, e.g. learning rates (time horizons)
- Retain and introduce invariance properties

Invariance principle: an example. Three functions belonging to the same equivalence class for a derivative- and function-value-free (i.e. comparison-based) method.
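The transcript does not reproduce the three functions from the slide, but the underlying point can be demonstrated: a comparison-based method behaves identically on f and g∘f for any strictly increasing g, because the induced rankings coincide. A small sketch (the function choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: float(np.sum(x**2))          # sphere, f(x) >= 0
g1 = lambda v: v ** (1/3)                  # strictly increasing on [0, inf)
g2 = lambda v: np.log(1 + v)               # strictly increasing on [0, inf)

xs = [rng.standard_normal(5) for _ in range(8)]
ranking = lambda h: np.argsort([h(x) for x in xs])
# a comparison-based method only ever sees these rankings, which coincide:
assert (ranking(f) == ranking(lambda x: g1(f(x)))).all()
assert (ranking(f) == ranking(lambda x: g2(f(x)))).all()
```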

...experimental validation...

CMA-ES is regarded as state-of-the-art

A simple unimodal test function
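The transcript does not reproduce the function shown on the slide. As an assumption for illustration, a standard simple unimodal (convex-quadratic) test function with tunable difficulty is the ellipsoid:

```python
import numpy as np

def felli(x, cond=1e6):
    """Ellipsoid test function: unimodal and convex-quadratic; the condition
    number of its Hessian equals cond (axis-aligned, hence separable)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    assert n >= 2
    weights = cond ** (np.arange(n) / (n - 1))  # 1 ... cond, log-uniformly spaced
    return float(np.sum(weights * x * x))
```

With cond=1 this is the sphere function; increasing cond makes the function ill-conditioned, the regime probed in the following experiment and in the runtime-versus-condition-number plots below.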

Experimentum crucis: without covariance matrix adaptation it takes 1000 times longer to reach f = 10^-10.

Runtime versus condition number (1): function evaluations versus condition number on separable quadratic functions [Auger et al 2009]

Runtime versus condition number (2): function evaluations versus condition number on non-separable quadratic functions [Auger et al 2009]

Runtime versus condition number (3): function evaluations versus condition number on non-separable, non-convex functions [Auger et al 2009]

COCO/BBOB test environment. COCO: COmparing Continuous Optimizers; BBOB: Black-Box Optimization Benchmarking. 24 test functions with a wide range of difficulties, plus 30 noisy functions; 30+ algorithms have been tested: http://coco.gforge.inria.fr

Results of BBOB-2009 (noise-free, 20-D): proportion of solved function+target pairs

Results of BBOB-2009 (noisy, f101-f130, 20-D): proportion of solved function+target pairs

Limitations of CMA-ES

Questions? CMA-ES source code: http://www.lri.fr/~hansen/cmaes_inmatlab.html