How information theory sheds new light on black-box optimization


Slide 1 — How information theory sheds new light on black-box optimization. Anne Auger. Colloque « Théorie de l'information : nouvelles frontières » (Information Theory: New Frontiers colloquium, held for the centenary of Claude Shannon), IHP, 26–28 October 2016.

Slide 2 — Based on: Y. Ollivier, L. Arnold, A. Auger, N. Hansen, Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles, JMLR (accepted).

Slide 3 — Overview: ➊ Black-Box Optimization: typical difficulties ➋ Information Geometric Optimization ➌ Invariance ➍ Recovering well-known algorithms: CMA-ES, PBIL, cGA.

Slides 4–5 — Black-Box Optimization. Optimize $f : \Omega \to \mathbb{R}$: discrete optimization $\Omega = \{0,1\}^n$, continuous optimization $\Omega = \mathbb{R}^n$. The black box maps $x \in \Omega$ to $f(x) \in \mathbb{R}$; gradients are not available or not useful.

Slide 6 — Which Typical Difficulties Need to Be Addressed? The black box maps $x \in \Omega$ to $f(x) \in \mathbb{R}$. The algorithm does not know in advance the features or difficulties of the optimization problem to be solved (difficulties related to real-world problems) and should be ready to handle any of them.

Slides 7–10 — What Can Be in the Black Box? What Makes a Problem Difficult to Solve? Non-linear, non-convex, non-quadratic functions; non-separable problems (dependencies between the variables); ill-conditioning; ruggedness (discontinuous, noisy, multi-modal landscapes).

Slides 11–12 — Algorithmic Key Components Within State-of-the-Art Black-Box Optimization Algorithms: randomized/stochastic (robustness with respect to noise and local optima); adaptive (certain parameters are learned on the fly); invariant (generalization of results: translation, scale, affine invariance, ...).

Slide 13 — Adaptive Stochastic Black-Box Algorithm. Let $\theta_t$ denote the state of the algorithm at iteration $t$.
Sample candidate solutions: $X_{t+1}^i = \mathrm{Sol}(\theta_t, U_{t+1}^i)$, $i = 1, \dots, \lambda$, with $\{U_{t+1}, t \in \mathbb{N}\}$ i.i.d.
Evaluate the solutions: $X_{t+1}^i \mapsto f(X_{t+1}^i)$.
Update the state: $\theta_{t+1} = F\big(\theta_t, (X_{t+1}^1, f(X_{t+1}^1)), \dots, (X_{t+1}^\lambda, f(X_{t+1}^\lambda))\big)$.
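To make the sample/evaluate/update template concrete, here is a minimal Python sketch; the state $(m, \sigma)$, the functions `sol` and `update_state`, and the toy objective are illustrative assumptions, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(42)

def sol(theta, u):
    """Map the state theta = (mean m, step-size sigma) and noise u to a candidate."""
    m, sigma = theta
    return m + sigma * u

def update_state(theta, evaluated):
    """Toy update rule: pull the mean toward the best candidate (illustrative only)."""
    m, sigma = theta
    best_x, _ = min(evaluated, key=lambda pair: pair[1])
    return (m + 0.5 * (best_x - m), sigma)

def f(x):
    """The black box: only f(x) is observable, no gradients."""
    return float(np.sum(x ** 2))

theta = (np.ones(5), 1.0)  # initial state (m, sigma)
lam = 10                   # population size

for t in range(100):
    U = rng.standard_normal((lam, 5))                    # i.i.d. noise U_{t+1}^i
    X = [sol(theta, u) for u in U]                       # sample candidates
    theta = update_state(theta, [(x, f(x)) for x in X])  # update the state
```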

Slide 14 — Algorithmic Key Components (continued): invariance illustrated. A translation-invariant algorithm treats $x \mapsto f(x)$ and $x \mapsto f_{x^\star}(x) = f(x - x^\star)$ in exactly the same way.

Slide 15 — Why Invariance? "The grand aim of all science is to cover the greatest number of empirical facts by logical deduction from the smallest number of hypotheses or axioms." (Albert Einstein) With invariance, results generalize from a single function to its associated invariance class.

Slide 16 — Comparison-Based Stochastic Algorithms: invariance to strictly increasing transformations.
Sample candidate solutions: $X_{t+1}^i = \mathrm{Sol}(\theta_t, U_{t+1}^i)$, $i = 1, \dots, \lambda$.
Evaluate and rank the solutions: $f(X_{t+1}^{S(1)}) \le \dots \le f(X_{t+1}^{S(\lambda)})$, where $S$ is the permutation giving the indices of the ordered solutions.
Update the state from the ranked noise only: $\theta_{t+1} = F\big(\theta_t, U_{t+1}^{S(1)}, \dots, U_{t+1}^{S(\lambda)}\big)$.

Slides 17–18 — Comparison-Based Property: Consequence. The ranking of candidate solutions is the same on $f$ and on $g \circ f$ for any strictly increasing $g : \mathrm{Im}(f) \to \mathbb{R}$:
$$f(X_{t+1}^{S(1)}) \le \dots \le f(X_{t+1}^{S(\lambda)}) \iff g \circ f(X_{t+1}^{S(1)}) \le \dots \le g \circ f(X_{t+1}^{S(\lambda)}),$$
hence invariance to strictly increasing transformations of $f$. The three functions shown on the slide are optimized in the same way by a comparison-based algorithm.
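A small Python check of this property (assuming, for illustration, $g(u) = e^u$ as the strictly increasing transformation):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sum(x**2, axis=-1)   # original objective
g = np.exp                            # any strictly increasing g

X = rng.standard_normal((10, 3))      # 10 candidate solutions in R^3
ranking_f = np.argsort(f(X))          # permutation S under f
ranking_gf = np.argsort(g(f(X)))      # permutation S under g o f

assert np.array_equal(ranking_f, ranking_gf)  # identical rankings
```

Since a comparison-based algorithm sees only the ranking, its entire trajectory is identical on $f$ and $g \circ f$.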

Slide 20 — Overview (recap): ➊ Black-Box Optimization: typical difficulties ➋ Information Geometric Optimization ➌ Invariance ➍ Recovering well-known algorithms: CMA-ES, PBIL, cGA.

Slides 21–22 — Information Geometric Optimization: Setting. A family of probability distributions $(P_\theta)_{\theta \in \Theta}$ on $\Omega$, with $\theta \in \Theta$ a continuous multicomponent parameter; $\Theta$ is a statistical manifold. Example: $\Omega = \mathbb{R}^n$ and $P_\theta$ a multivariate normal distribution with $\theta = (m, C)$.

Slides 23–24 — Changing Viewpoint I. Transform the original optimization problem on $\Omega$, $\min_{x \in \Omega} f(x)$, into an optimization problem on $\Theta$: minimize $F(\theta) = \int f(x)\, P_\theta(dx)$. Minimizing $F$ amounts to finding a Dirac-delta distribution concentrated on $\operatorname{argmin}_x f(x)$ [Wierstra et al. 2014]. But $F$ is not invariant to strictly increasing transformations of $f$.

Slide 25 — Changing Viewpoint II: invariant under strictly increasing transformations of $f$. Transform the original problem $\min_{x \in \Omega} f(x)$ into an optimization problem on $\Theta$: maximize
$$J_{\theta_t}(\theta) = \int W_{\theta_t}^f(x)\, P_\theta(dx), \qquad W_{\theta_t}^f(x) = w\big(P_{\theta_t}[y : f(y) \le f(x)]\big),$$
with $w : [0,1] \to \mathbb{R}$ a decreasing weight function. Rationale: $f$ small $\leftrightarrow$ $W_{\theta_t}^f(x)$ large. [Ollivier et al.]
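One standard illustrative choice (an assumption here, not the slide's prescription) is the truncation weight $w(q) = \frac{1}{q_0}\,\mathbf{1}_{\{q \le q_0\}}$ for some $q_0 \in (0,1)$, which gives
$$W_{\theta_t}^f(x) = \begin{cases} 1/q_0 & \text{if } x \text{ lies in the best } q_0\text{-fraction of } P_{\theta_t} \text{ as measured by } f,\\ 0 & \text{otherwise,} \end{cases}$$
so that maximizing $J_{\theta_t}$ pushes probability mass toward the current best $q_0$-quantile of the search distribution.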

Slides 26–27 — Maximizing $J_{\theta_t}(\theta)$: Information Geometric Optimization. Perform a natural gradient step on $\Theta$:
$$\theta_{t+\delta t} = \theta_t + \delta t\, \widetilde{\nabla}_\theta \int W_{\theta_t}^f(x)\, P_\theta(dx).$$

Slide 28 — Natural Gradient, Fisher Information Metric. $\widetilde{\nabla}$ is the gradient with respect to the Fisher metric, defined via the Fisher information matrix
$$I_{ij}(\theta) = \int \frac{\partial \ln P_\theta(x)}{\partial \theta_i} \frac{\partial \ln P_\theta(x)}{\partial \theta_j}\, P_\theta(dx),$$
which appears in the expansion $\mathrm{KL}(P_{\theta + \delta\theta} \,\|\, P_\theta) = \frac{1}{2} \sum_{i,j} I_{ij}(\theta)\, \delta\theta_i\, \delta\theta_j + O(\delta\theta^3)$.
It is intrinsic: independent of the chosen parametrization of $P_\theta$, and the Fisher metric is essentially the only way to obtain this property [Amari, Nagaoka 1993].
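As a concrete instance (a standard computation, included for illustration), take the univariate normal family $P_\theta = \mathcal{N}(m, \sigma^2)$ with $\theta = (m, \sigma)$. From $\ln P_\theta(x) = -\frac{(x-m)^2}{2\sigma^2} - \ln \sigma - \frac{1}{2}\ln(2\pi)$ one obtains the diagonal Fisher matrix
$$I(\theta) = \begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 2/\sigma^2 \end{pmatrix},$$
and the natural gradient rescales the vanilla gradient by $I(\theta)^{-1}$, making the update independent of how the Gaussian is parametrized.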

Slides 29–32 — Maximizing $J_{\theta_t}(\theta)$: Information Geometric Optimization. The natural gradient step
$$\theta_{t+\delta t} = \theta_t + \delta t\, \widetilde{\nabla}_\theta \int W_{\theta_t}^f(x)\, P_\theta(dx)$$
rewrites as
$$\theta_{t+\delta t} = \theta_t + \delta t \int W_{\theta_t}^f(x)\, \widetilde{\nabla}_\theta \ln P_\theta(x)\Big|_{\theta=\theta_t} P_{\theta_t}(dx) = \theta_t + \delta t \int w\big(P_{\theta_t}[y : f(y) \le f(x)]\big)\, \widetilde{\nabla}_\theta \ln P_\theta(x)\Big|_{\theta=\theta_t} P_{\theta_t}(dx),$$
which does not depend on $\nabla f$. Two objects follow: ➊ the IGO flow, obtained in the limit $\delta t \to 0$; ➋ IGO algorithms, obtained by discretization of the integrals.
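The exchange of gradient and integral above uses the log-derivative identity $\nabla_\theta P_\theta = P_\theta\, \nabla_\theta \ln P_\theta$ (spelled out here for completeness):
$$\nabla_\theta \int W_{\theta_t}^f(x)\, P_\theta(dx)\Big|_{\theta=\theta_t} = \int W_{\theta_t}^f(x)\, \nabla_\theta \ln P_\theta(x)\Big|_{\theta=\theta_t}\, P_{\theta_t}(dx),$$
and the same holds for the natural gradient $\widetilde{\nabla}$ after multiplying by the inverse Fisher matrix. This is why, with the weight $W_{\theta_t}^f$ held fixed, the gradient of $J_{\theta_t}$ becomes an expectation under $P_{\theta_t}$ that Monte Carlo sampling can estimate.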

Slide 33 — IGO Gradient Flow. The IGO flow is the set of continuous-time trajectories in $\Theta$-space defined by the ODE
$$\frac{d\theta_t}{dt} = \int W_{\theta_t}^f(x)\, \widetilde{\nabla}_\theta \ln P_\theta(x)\Big|_{\theta=\theta_t} P_{\theta_t}(dx). \quad \text{[Ollivier et al.]}$$

Slides 34–35 — IGO Algorithm [Ollivier et al.]: Monte Carlo Approximation of the Integrals. Sample $X_i \sim P_{\theta_t}$, $i = 1, \dots, N$, and approximate the quantile weight by rank-based weights:
$$w\big(P_{\theta_t}[y : f(y) \le f(x)]\big) \approx w\Big(\frac{\mathrm{rk}(X_i) + 1/2}{N}\Big).$$
The IGO algorithm then reads
$$\theta_{t+\delta t} = \theta_t + \delta t\, \frac{1}{N} \sum_{i=1}^N w\Big(\frac{\mathrm{rk}(X_i) + 1/2}{N}\Big)\, \widetilde{\nabla}_\theta \ln P_\theta(X_i)\Big|_{\theta=\theta_t} = \theta_t + \delta t \sum_{i=1}^N \hat{w}_i\, \widetilde{\nabla}_\theta \ln P_\theta(X_i)\Big|_{\theta=\theta_t},$$
a consistent estimator of the integral.
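A minimal Python sketch of one IGO step for a generic family; the helpers `sample(theta, n)` and `nat_grad_log_density(theta, x)` (returning $\widetilde{\nabla}_\theta \ln P_\theta(x)$) are assumed to be supplied by the caller, and the truncation weight `w` is an illustrative choice:

```python
import numpy as np

def w(q, q0=0.5):
    """Illustrative decreasing weight function on [0, 1] (truncation weights)."""
    return (q <= q0) / q0

def igo_step(theta, f, sample, nat_grad_log_density, n=20, dt=0.1):
    """One IGO step: sample, rank, weight, natural-gradient update."""
    X = sample(theta, n)                                # X_i ~ P_theta
    ranks = np.argsort(np.argsort([f(x) for x in X]))   # rk(X_i): 0 = best
    w_hat = w((ranks + 0.5) / n) / n                    # rank-based weights w_hat_i
    step = sum(wi * nat_grad_log_density(theta, x) for wi, x in zip(w_hat, X))
    return theta + dt * step
```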

Slide 36 — Invariances [Ollivier et al.]. The IGO flow and the IGO algorithm are invariant to strictly increasing transformations of $f$. The IGO flow is invariant under $\theta$-reparametrization (the main idea behind information geometry [Amari, Nagaoka 1993]) and under changes of coordinates in $\Omega$, provided the change of coordinates preserves the family of distributions. The IGO algorithm is affine-invariant for families of distributions invariant under affine transformations.

Slide 37 — Instantiation of IGO: Multivariate Normal Distributions [Akimoto et al. 2010]. With $P_\theta$ a multivariate normal distribution and $\theta = (m, C)$, the IGO algorithm reads
$$m_{t+\delta t} = m_t + \delta t \sum_{i=1}^N \hat{w}_i (X_i - m_t), \qquad C_{t+\delta t} = C_t + \delta t \sum_{i=1}^N \hat{w}_i \big((X_i - m_t)(X_i - m_t)^T - C_t\big).$$
This recovers the CMA-ES with rank-$\mu$ update [Hansen et al. 2003], and it is affine-invariant.
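A compact Python sketch of these two updates (the weights follow the IGO algorithm above; `f`, the step size, and the initialization are illustrative assumptions):

```python
import numpy as np

def gaussian_igo_step(m, C, f, n=20, dt=0.1, q0=0.5, rng=np.random.default_rng()):
    """One IGO step for the Gaussian family N(m, C): rank-mu-style update."""
    X = rng.multivariate_normal(m, C, size=n)           # X_i ~ N(m, C)
    ranks = np.argsort(np.argsort([f(x) for x in X]))   # 0 = best
    w_hat = ((ranks + 0.5) / n <= q0) / (q0 * n)        # truncation weights w_hat_i
    Y = X - m
    m_new = m + dt * (w_hat @ Y)                        # mean update
    C_new = C + dt * sum(wi * (np.outer(y, y) - C) for wi, y in zip(w_hat, Y))
    return m_new, C_new
```

Iterating `gaussian_igo_step` with a small `dt` follows the IGO flow for the Gaussian family.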

Slide 38 — Instantiation of IGO: Bernoulli Measures. With $\Omega = \{0,1\}^d$ and $P_\theta(x) = p_{\theta_1}(x_1) \cdots p_{\theta_d}(x_d)$ the family of Bernoulli measures, the IGO algorithm recovers PBIL (Population-Based Incremental Learning) [Baluja, Caruana 1995] and the cGA (compact Genetic Algorithm) [Harik et al. 1999].
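For the Bernoulli family, the natural gradient of $\ln P_\theta(x)$ with respect to $p_j$ reduces to $x_j - p_j$ (the Fisher information $1/(p_j(1-p_j))$ cancels the denominator of the vanilla gradient $(x_j - p_j)/(p_j(1-p_j))$), giving a PBIL-like probability-vector update. The sketch below makes illustrative choices for the weights and step size:

```python
import numpy as np

def bernoulli_igo_step(p, f, n=20, dt=0.1, q0=0.5, rng=np.random.default_rng()):
    """One IGO step on the Bernoulli family over {0,1}^d with parameter vector p."""
    X = (rng.random((n, p.size)) < p).astype(float)     # X_i ~ P_theta
    ranks = np.argsort(np.argsort([f(x) for x in X]))   # 0 = best
    w_hat = ((ranks + 0.5) / n <= q0) / (q0 * n)        # truncation weights w_hat_i
    return p + dt * (w_hat @ (X - p))                   # natural gradient: x - p
```

For example, starting from `p = np.full(10, 0.5)` and minimizing `lambda x: -x.sum()` (OneMax), repeated steps drive the probabilities toward 1, the PBIL-style behaviour the slide refers to.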

Slide 39 — Conclusions. The Information Geometric Optimization framework provides a unified picture of discrete and continuous optimization and theoretical foundations for existing algorithms; CMA-ES is state-of-the-art in continuous black-box optimization. Some parts of the CMA-ES algorithm are not explained by the IGO framework (step-size adaptation, cumulation). New algorithms: large-scale variants of CMA-ES based on IGO, ...

Slide 40 — References
[Ollivier et al.] Y. Ollivier, L. Arnold, A. Auger, N. Hansen. Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles. JMLR (conditionally accepted).
[Akimoto et al. 2010] Y. Akimoto, Y. Nagata, I. Ono, S. Kobayashi. Bidirectional relation between CMA evolution strategies and natural evolution strategies. PPSN 2010.
[Hansen et al. 2003] N. Hansen, S.D. Müller, P. Koumoutsakos. Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary Computation, 2003.
[Amari, Nagaoka 1993] S. Amari, H. Nagaoka. Methods of Information Geometry. 1993.
[Baluja, Caruana 1995] S. Baluja, R. Caruana. Removing the genetics from the standard genetic algorithm. ICML 1995.
[Harik et al. 1999] G.R. Harik, F.G. Lobo, D.E. Goldberg. The compact genetic algorithm. IEEE Transactions on Evolutionary Computation, 1999.
[Wierstra et al. 2014] D. Wierstra, T. Schaul, T. Glasmachers, Y. Sun, J. Peters, J. Schmidhuber. Natural evolution strategies. JMLR 2014.
