How information theory sheds new light on black-box optimization
1 How information theory sheds new light on black-box optimization. Anne Auger. Colloque Théorie de l'information : nouvelles frontières, dans le cadre du centenaire de Claude Shannon. IHP, 26-28 October 2016.
2 Y. Ollivier, L. Arnold, A. Auger, N. Hansen. Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles. JMLR (accepted).
3 Overview. ➊ Black-Box Optimization: typical difficulties. ➋ Information Geometric Optimization. ➌ Invariance. ➍ Recovering well-known algorithms: CMA-ES; PBIL, cGA.
4-5 Black-Box Optimization. Optimize $f : \Omega \to \mathbb{R}$: discrete optimization, $\Omega = \{0,1\}^n$; continuous optimization, $\Omega \subseteq \mathbb{R}^n$. The black box maps $x \in \Omega$ to $f(x) \in \mathbb{R}$; gradients are not available or not useful.
6 Which Typical Difficulties Need to Be Addressed? $x \in \Omega \;\longrightarrow\; \boxed{?} \;\longrightarrow\; f(x) \in \mathbb{R}$. The algorithm does not know in advance the features and difficulties of the optimization problem to be solved (difficulties related to real-world problems); it should be ready to solve any of them.
7-10 What Can Be in the Black Box? What Makes a Problem Difficult to Solve? Non-linear, non-convex, non-quadratic; non-separable (dependencies between the variables); ill-conditioned; rugged (discontinuous, noisy, multi-modal).
11-12 Algorithmic Key Components Within State-of-the-Art Black-Box Optimization Algorithms. Randomized/stochastic: robustness w.r.t. noise and local optima. Adaptive: learn certain parameters on the fly. Invariance: generalization of results (translation, scale, affine invariance, ...).
13 Adaptive Stochastic Black-Box Algorithm. $\theta_t$: state of the algorithm. Sample candidate solutions: $X^i_{t+1} = \mathrm{Sol}(\theta_t, U^i_{t+1})$, $i = 1, \ldots, \lambda$, with $\{U_{t+1}, t \in \mathbb{N}\}$ i.i.d. Evaluate the solutions: $X^i_{t+1} \mapsto f(X^i_{t+1})$. Update the state of the algorithm: $\theta_{t+1} = F\big(\theta_t, (X^1_{t+1}, f(X^1_{t+1})), \ldots, (X^\lambda_{t+1}, f(X^\lambda_{t+1}))\big)$.
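To make the template concrete, here is a minimal Python sketch of the sample/evaluate/update loop. It is not from the talk: the Gaussian sampler, the deliberately crude update rule, the sphere objective, and all names (`sol`, `update`, `lam`) are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the sample/evaluate/update template. theta = (m, sigma):
# mean and step-size of an isotropic Gaussian sampler; the update rule below
# is a toy placeholder, not an actual state-of-the-art method.

def sol(theta, u):
    """X = Sol(theta, U): turn i.i.d. noise into a candidate solution."""
    m, sigma = theta
    return m + sigma * u

def update(theta, xs, fs):
    """theta_{t+1} = F(theta_t, ...): pull the mean toward the best sample."""
    m, sigma = theta
    best = xs[np.argmin(fs)]
    return (0.5 * m + 0.5 * best, 0.95 * sigma)

def f(x):
    """Black-box objective (example: sphere function)."""
    return float(np.sum(x ** 2))

rng = np.random.default_rng(42)
theta = (np.ones(5), 1.0)                          # initial state theta_0
lam = 10                                           # population size lambda
for t in range(100):
    us = rng.standard_normal((lam, 5))             # i.i.d. U^i_{t+1}
    xs = np.array([sol(theta, u) for u in us])     # sample
    fs = np.array([f(x) for x in xs])              # evaluate
    theta = update(theta, xs, fs)                  # update state
print(theta[0])                                    # mean drifts toward 0
```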
14 Algorithmic Key Components (continued). Invariance illustrated by translation: the algorithm behaves identically on $x \mapsto f(x)$ and on the shifted function $x \mapsto f_{x^\star}(x) = f(x - x^\star)$.
15 Why Invariance? "The grand aim of all science is to cover the greatest number of empirical facts by logical deduction from the smallest number of hypotheses or axioms." (Albert Einstein) With invariance: generalization of results from a single function to the associated invariance class.
16 Comparison-Based Stochastic Algorithms: Invariance to Strictly Increasing Transformations. Sample candidate solutions: $X^i_{t+1} = \mathrm{Sol}(\theta_t, U^i_{t+1})$, $i = 1, \ldots, \lambda$. Evaluate and rank the solutions: $f(X^{S(1)}_{t+1}) \le \cdots \le f(X^{S(\lambda)}_{t+1})$, where $S$ is the permutation giving the indices of the ordered solutions. Update the state of the algorithm: $\theta_{t+1} = F\big(\theta_t, U^{S(1)}_{t+1}, \ldots, U^{S(\lambda)}_{t+1}\big)$.
17-18 Comparison-Based Property: Consequence. Same ranking of candidate solutions on $f$ or $g \circ f$, for $g : \mathrm{Im}(f) \to \mathbb{R}$ strictly increasing: $f(X^{S(1)}_{t+1}) \le \cdots \le f(X^{S(\lambda)}_{t+1}) \iff g \circ f(X^{S(1)}_{t+1}) \le \cdots \le g \circ f(X^{S(\lambda)}_{t+1})$. Hence invariance to strictly increasing transformations of $f$: the three functions shown on the slide are optimized in exactly the same way by a comparison-based algorithm.
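The ranking invariance is easy to check numerically. A minimal sketch (not from the talk; the functions `f` and `g` are arbitrary examples):

```python
import numpy as np

# Check that a strictly increasing g preserves the ranking of candidates,
# hence any comparison-based update: only the permutation S enters F.

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))          # 8 candidate solutions in R^3

f = lambda x: np.sum(x ** 2)             # objective
g = lambda y: np.exp(y) + 3.0 * y        # strictly increasing on Im(f)

fx = np.array([f(x) for x in X])
S_f = np.argsort(fx)                     # permutation S on f-values
S_gf = np.argsort(g(fx))                 # permutation S on (g o f)-values
assert np.array_equal(S_f, S_gf)         # identical: same algorithm behavior
```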
21-22 Information Geometric Optimization: Setting. Family of probability distributions $(P_\theta)_{\theta \in \Theta}$ on $\Omega$, with $\theta \in \Theta$ a continuous multicomponent parameter; $\Theta$: statistical manifold. Example: $\Omega = \mathbb{R}^n$, $P_\theta$ multivariate normal distribution, $\theta = (m, C)$.
23 Changing Viewpoint I. Transform the original optimization problem on $\Omega$, $\min_{x \in \Omega} f(x)$, into an optimization problem on $\Theta$: minimize $F(\theta) = \int f(x)\, P_\theta(dx)$. Minimizing $F$ $\Leftrightarrow$ finding Dirac-delta distributions concentrated on $\operatorname{argmin}_x f(x)$. [Wierstra et al. 2014]
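A worked instance (not on the slide) of why minimizing $F$ drives $P_\theta$ toward a Dirac mass, taking the sphere function under an isotropic Gaussian:

```latex
% f(x) = \|x\|^2 with P_\theta = \mathcal{N}(m, \sigma^2 I_n), theta = (m, sigma):
\[
  F(\theta) = \int \|x\|^2 \, P_\theta(dx)
            = \mathbb{E}\big[\|m + \sigma Z\|^2\big]
            = \|m\|^2 + n\,\sigma^2,
  \qquad Z \sim \mathcal{N}(0, I_n).
\]
% F decreases by moving m toward argmin_x f(x) = 0 and shrinking sigma to 0,
% i.e. by letting P_theta concentrate on the minimizer: a Dirac-delta limit.
```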
24 Changing Viewpoint I (continued). But $F(\theta) = \int f(x)\, P_\theta(dx)$ is not invariant under strictly increasing transformations of $f$.
25 Changing Viewpoint II: invariant under strictly increasing transformations of $f$. Transform the original problem $\min_{x \in \Omega} f(x)$ into an optimization problem on $\Theta$: maximize $J_{\theta_t}(\theta) = \int W^f_{\theta_t}(x)\, P_\theta(dx)$, where $W^f_{\theta_t}(x) = w\big(P_{\theta_t}[y : f(y) \le f(x)]\big)$ and $w : [0,1] \to \mathbb{R}$ is a decreasing weight function. Rationale: $f(x)$ small $\leftrightarrow$ $W^f_{\theta_t}(x)$ large. [Ollivier et al.]
26-27 Maximizing $J_{\theta_t}(\theta)$: Information Geometric Optimization. Perform a natural gradient step on $\Theta$: $\theta_{t+\delta t} = \theta_t + \delta t\, \widetilde{\nabla}_\theta \int W^f_{\theta_t}(x)\, P_\theta(dx)$.
28 Natural Gradient and the Fisher Information Metric. $\widetilde{\nabla}$: natural gradient, i.e. the gradient w.r.t. the Fisher metric, defined via the Fisher matrix $I_{ij}(\theta) = \int \frac{\partial \ln P_\theta(x)}{\partial \theta_i}\, \frac{\partial \ln P_\theta(x)}{\partial \theta_j}\, P_\theta(dx)$, which gives $\mathrm{KL}(P_{\theta+\delta\theta} \,\|\, P_\theta) = \frac{1}{2} \sum_{ij} I_{ij}(\theta)\, \delta\theta_i\, \delta\theta_j + O(\delta\theta^3)$; concretely, $\widetilde{\nabla}_\theta = I(\theta)^{-1} \nabla_\theta$. The natural gradient is intrinsic: independent of the chosen parametrization of $P_\theta$. The Fisher metric is essentially the only way to obtain this property [Amari, Nagaoka, 2001].
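As a minimal illustration (my own, not from the talk) of what the Fisher metric buys, consider a single Bernoulli parameter $p$: the Fisher information is $I(p) = 1/(p(1-p))$, so the natural gradient of $\ln P_p(x)$ is simply $x - p$, well behaved even near the boundary where the vanilla gradient blows up.

```python
import numpy as np

# Natural vs. vanilla gradient for one Bernoulli parameter p.
#   ln P_p(x) = x ln p + (1 - x) ln(1 - p)
#   d/dp ln P_p(x) = (x - p) / (p (1 - p))        (vanilla gradient / score)
#   I(p) = E[score^2] = 1 / (p (1 - p))           (Fisher information)
#   natural gradient = I(p)^{-1} * score = x - p  (parametrization-free scale)

def score(p, x):
    return (x - p) / (p * (1.0 - p))

def fisher_info(p):
    return 1.0 / (p * (1.0 - p))

p = 0.99
print(score(p, 0.0))                    # vanilla: -100.0, explodes near p = 1
print(score(p, 0.0) / fisher_info(p))   # natural: -0.99, stays bounded

# Monte Carlo check that I(p) is indeed E[score^2]:
rng = np.random.default_rng(1)
xs = (rng.random(100_000) < p).astype(float)
print(np.mean(score(p, xs) ** 2), fisher_info(p))   # both close to ~101
```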
29-32 Maximizing $J_{\theta_t}(\theta)$: Information Geometric Optimization. Perform a natural gradient step on $\Theta$:
$\theta_{t+\delta t} = \theta_t + \delta t\, \widetilde{\nabla}_\theta \int W^f_{\theta_t}(x)\, P_\theta(dx) = \theta_t + \delta t \int W^f_{\theta_t}(x)\, \widetilde{\nabla}_\theta \ln P_\theta(x)\big|_{\theta=\theta_t}\, P_{\theta_t}(dx) = \theta_t + \delta t \int w\big(P_{\theta_t}[y : f(y) \le f(x)]\big)\, \widetilde{\nabla}_\theta \ln P_\theta(x)\big|_{\theta=\theta_t}\, P_{\theta_t}(dx)$.
The update does not depend on $\nabla f$. Two objects follow: ➊ the IGO flow ($\delta t \to 0$); ➋ IGO algorithms (discretization of the integrals).
33 IGO Gradient Flow: Information Geometric Optimization. The set of continuous-time trajectories in $\theta$-space defined by the ODE $\frac{d\theta_t}{dt} = \int W^f_{\theta_t}(x)\, \widetilde{\nabla}_\theta \ln P_\theta(x)\big|_{\theta=\theta_t}\, P_{\theta_t}(dx)$. [Ollivier et al.]
34-35 IGO Algorithm [Ollivier et al.]. Monte Carlo approximation of the integrals: sample $X_i \sim P_{\theta_t}$, $i = 1, \ldots, N$, and estimate $w\big(P_{\theta_t}[y : f(y) \le f(x)]\big) \approx w\Big(\frac{\mathrm{rk}(X_i) + 1/2}{N}\Big)$. IGO algorithm:
$\theta_{t+\delta t} = \theta_t + \delta t\, \frac{1}{N} \sum_{i=1}^N w\Big(\frac{\mathrm{rk}(X_i) + 1/2}{N}\Big)\, \widetilde{\nabla}_\theta \ln P_\theta(X_i)\big|_{\theta=\theta_t} = \theta_t + \delta t \sum_{i=1}^N \hat{w}_i\, \widetilde{\nabla}_\theta \ln P_\theta(X_i)\big|_{\theta=\theta_t}$,
a consistent estimator of the integral.
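A minimal sketch of one IGO step (my own illustration, not code from the paper), for the simplest family: a 1-D Gaussian with known $\sigma$ and $\theta = m$. There $\widetilde{\nabla}_m \ln P_m(x) = \sigma^2 \cdot (x - m)/\sigma^2 = x - m$, and the rank-based weights $\hat w_i$ are computed exactly as on the slide; the weight function $w$ is an arbitrary decreasing choice.

```python
import numpy as np

# One IGO step for theta = m (1-D Gaussian, sigma fixed). rk(X_i) counts how
# many samples are better, so rank 0 is the best sample.

def igo_step(m, sigma, f, N=20, dt=0.5, rng=np.random.default_rng(3)):
    X = m + sigma * rng.standard_normal(N)                # X_i ~ P_{theta_t}
    rk = np.argsort(np.argsort([f(x) for x in X]))        # ranks, 0 = best
    w = lambda q: max(0.0, 1.0 - 2.0 * q)                 # decreasing weight
    w_hat = np.array([w((r + 0.5) / N) for r in rk]) / N  # hat w_i
    nat_grad = X - m                                      # natural grad of ln P
    return m + dt * np.sum(w_hat * nat_grad)

m = 5.0
for t in range(200):
    m = igo_step(m, sigma=1.0, f=lambda x: (x - 1.0) ** 2)
print(m)   # hovers around the optimum at x = 1 (sigma is not adapted here)
```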
36 Invariances [Ollivier et al.]. The IGO flow and the IGO algorithm are invariant under strictly increasing transformations of $f$. The IGO flow is invariant under $\theta$-reparametrization: the main idea behind information geometry [Amari, Nagaoka 1993]. The IGO flow is invariant under changes of coordinates in $\Omega$, provided the change of coordinates preserves the family of distributions. The IGO algorithm is affine-invariant for families of distributions invariant under affine transformations.
37 Instantiation of IGO: Multivariate Normal Distributions [Akimoto et al. 2010]. $P_\theta$ multivariate normal distribution, $\theta = (m, C)$. IGO algorithm:
$m_{t+\delta t} = m_t + \delta t \sum_{i=1}^N \hat{w}_i (X_i - m_t)$,
$C_{t+\delta t} = C_t + \delta t \sum_{i=1}^N \hat{w}_i \big[(X_i - m_t)(X_i - m_t)^T - C_t\big]$.
This recovers the rank-mu update of CMA-ES [Hansen et al. 2003]; the algorithm is affine-invariant.
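The two updates above translate almost line for line into code. A sketch under my own choices of $N$, $\delta t$, and weight function (the slide fixes none of these):

```python
import numpy as np

# IGO step for theta = (m, C), i.e. the rank-mu-type update of this slide:
#   m <- m + dt * sum_i w_hat_i (X_i - m)
#   C <- C + dt * sum_i w_hat_i ((X_i - m)(X_i - m)^T - C)

def igo_gaussian_step(m, C, f, N=50, dt=0.2, rng=np.random.default_rng(7)):
    X = rng.multivariate_normal(m, C, size=N)             # X_i ~ N(m_t, C_t)
    rk = np.argsort(np.argsort([f(x) for x in X]))        # ranks, 0 = best
    w = lambda q: max(0.0, 1.0 - 2.0 * q)                 # decreasing weight
    w_hat = np.array([w((r + 0.5) / N) for r in rk]) / N
    d = X - m                                             # deviations X_i - m_t
    m_new = m + dt * (w_hat @ d)
    C_new = C + dt * sum(wi * (np.outer(di, di) - C) for wi, di in zip(w_hat, d))
    return m_new, C_new

f = lambda x: x[0] ** 2 + 100.0 * x[1] ** 2               # ill-conditioned quadratic
m, C = np.array([3.0, 3.0]), np.eye(2)
for t in range(300):
    m, C = igo_gaussian_step(m, C, f)
print(m)   # moves toward the optimum; C learns the axis scaling
```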
38 Instantiation of IGO: Bernoulli Measures. $\Omega = \{0,1\}^d$, $P_\theta(x) = p_{\theta_1}(x_1) \cdots p_{\theta_d}(x_d)$, the family of Bernoulli measures. Recovers PBIL (Population-Based Incremental Learning) [Baluja, Caruana 1995] and cGA (compact Genetic Algorithm) [Harik et al. 1999].
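The Bernoulli instantiation gives a PBIL-style probability-vector update: since the Fisher matrix is diagonal with entries $1/(p_j(1-p_j))$, the natural gradient of $\ln P_\theta(x)$ is just $x - p$. A sketch (my own; the clipping away from the boundary is a practical assumption, not part of the slide):

```python
import numpy as np

# IGO on Bernoulli measures: p <- p + dt * sum_i w_hat_i (X_i - p),
# a PBIL-like update of the probability vector p in [0, 1]^d.

def igo_bernoulli_step(p, f, N=50, dt=0.3, rng=np.random.default_rng(11)):
    X = (rng.random((N, p.size)) < p).astype(float)       # X_i ~ P_{theta_t}
    rk = np.argsort(np.argsort([f(x) for x in X]))        # ranks, 0 = best
    w = lambda q: max(0.0, 1.0 - 2.0 * q)                 # decreasing weight
    w_hat = np.array([w((r + 0.5) / N) for r in rk]) / N
    p = p + dt * (w_hat @ (X - p))                        # natural gradient step
    return np.clip(p, 0.01, 0.99)                         # stay off the boundary

onemax = lambda x: -np.sum(x)      # minimizing -sum(x): optimum is all ones
p = 0.5 * np.ones(10)
for t in range(200):
    p = igo_bernoulli_step(p, onemax)
print(p)   # components drift toward the upper clip at 0.99
```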
39 Conclusions. The Information Geometric Optimization framework gives a unified picture of discrete and continuous optimization, and theoretical foundations for existing algorithms. CMA-ES is state-of-the-art in continuous black-box optimization; some parts of the CMA-ES algorithm (step-size adaptation, cumulation) are not explained by the IGO framework. New algorithms: large-scale variants of CMA-ES based on IGO, ...
40 References
[Ollivier et al.] Y. Ollivier, L. Arnold, A. Auger, N. Hansen. Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles. JMLR (accepted).
[Akimoto et al. 2010] Y. Akimoto, Y. Nagata, I. Ono, S. Kobayashi. Bidirectional relation between CMA evolution strategies and natural evolution strategies. PPSN 2010.
[Hansen et al. 2003] N. Hansen, S.D. Müller, P. Koumoutsakos. Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). ECJ 2003.
[Amari, Nagaoka 1993] S. Amari, H. Nagaoka. Methods of Information Geometry. 1993.
[Baluja, Caruana 1995] S. Baluja, R. Caruana. Removing the genetics from the standard genetic algorithm. ICML 1995.
[Harik et al. 1999] G.R. Harik, F.G. Lobo, D.E. Goldberg. The compact genetic algorithm. IEEE Trans. EC, 1999.
[Wierstra et al. 2014] D. Wierstra, T. Schaul, T. Glasmachers, Y. Sun, J. Peters, J. Schmidhuber. Natural evolution strategies. JMLR 2014.