Data-Driven Global Optimization

Data-Driven Global Optimization. Fani Boukouvala & Chris Kieslich. Christodoulos A. Floudas Memorial Symposium, Princeton University, May 6th 2017

A story of global optimization & more. September 17th 2014, Princeton, NJ

Global optimization using data. Data-driven optimization = optimization without derivatives or without equations. Example: f(x) = 100 + 0.5x² − 11x, where f(x) is your attention during this talk and x is time in minutes. Then df/dx = x − 11, and setting df/dx = 0 gives x = 11: minimum attention at 11 minutes.

Why would we not have f(x)? There are two reasons:
1. We are trying to optimize a system/process that is not well understood or cannot be described by an equation, so we perform experiments.
2. We are trying to optimize a very complex system/phenomenon/process that requires a simulation: designing an airplane, crash-testing a car, designing a protein, designing a chemical plant.

How do we optimize without equations? Inputs → Black-Box or Grey-Box model (simulation or experiment) → Outputs.
Caballero & Grossmann, AIChE J., 54(10), 2008. Henao & Maravelias, AIChE J., 57(5), 2011. Boukouvala, Hasan & Floudas, JOGO, 2015. Boukouvala & Floudas, OPTL, 2016. Conn, Scheinberg & Vicente, SIAM, 2009. Rios & Sahinidis, JOGO, 56(3), 2013. Davis & Ierapetritou, IECR, 47(16), 2008. Jones et al., JOGO, 13(4), 1998.

An analogy for black-box problems: how would you find the deepest spot in a lake? How would you find the volume of water in the lake?

Analogy for black-box problem: collecting data. (Figure: depth measured with a hand lead-line at sampled (x, y) locations.)

Analogy for black-box problem: making a map. (Figure: depth map built from the hand lead-line measurements at the sampled (x, y) locations.)

Analogy for black-box problem: challenges for making the best map. Where should we collect data? How many data points do we need? How complex (time, cost) is collecting each data point? If we have a function to represent the map, then for any (x, y) we can predict the depth: depth = f(x, y). This is a surrogate function.

What makes a good surrogate function? Surrogate functions are models that are fitted/tuned to best predict the collected data. Good characteristics:
- Accurate representation of the black-box system: ‖f(x) − f_sur(x)‖ → 0
- A simple functional form with fewer parameters
- A tractable number of data points required to be accurate
Many different types have been used:
Surrogate type | Functional form
Quadratic | f_sur(x) = b0 + b1 x1 + b2 x2 + b11 x1² + b12 x1 x2 + b22 x2²
Kriging | f_sur(x) = μ + Σ_{i=1..N} c_i exp(−Σ_{j=1..2} θ_j (x_j^i − x_j)²)
Radial basis function | f_sur(x) = μ + b1 x1 + b2 x2 + Σ_{i=1..N} c_i ‖x^i − x‖² ln‖x^i − x‖
Boukouvala, F., R. Misener, and C.A. Floudas, European Journal of Operational Research, 2016. 252(3): p. 701-727.
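To make the quadratic row concrete, here is a minimal sketch (assuming NumPy; an illustration, not code from the talk) of fitting such a surrogate to sampled black-box data by ordinary least squares:

```python
import numpy as np

def fit_quadratic_surrogate(X, y):
    """Fit f_sur(x) = b0 + b1 x1 + b2 x2 + b11 x1^2 + b12 x1 x2 + b22 x2^2."""
    x1, x2 = X[:, 0], X[:, 1]
    # Design matrix: one column per basis term.
    A = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return b  # (b0, b1, b2, b11, b12, b22)

# Usage: sample a "black-box" function (here a toy expression) and fit.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] ** 2 + 0.1 * rng.standard_normal(30)
print(np.round(fit_quadratic_surrogate(X, y), 2))
```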

Adding more equidistant points is not necessarily better. Intuition: more points → better approximations. Reality: the location of the points, coupled with the type of surrogate function fitted, matters more than the quantity of points! Runge proved this in 1901 with a simple example (the "Runge phenomenon"): f(x) = 1/(1 + 25x²), −1 ≤ x ≤ 1. (Figure: polynomial interpolants on equidistant grids of 10, 15, 20 and 46 points oscillate ever more wildly near the endpoints.)
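A small sketch of the phenomenon (assuming NumPy; my illustration, not the talk's), comparing interpolation on equidistant versus Chebyshev nodes:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)   # Runge's function
n = 21                                     # number of interpolation nodes
x_fine = np.linspace(-1.0, 1.0, 1001)

for name, nodes in [
    ("equidistant", np.linspace(-1.0, 1.0, n)),
    ("Chebyshev", np.cos((2 * np.arange(n) + 1) * np.pi / (2 * n))),
]:
    # Degree n-1 interpolant, represented in the well-conditioned Chebyshev basis.
    coef = C.chebfit(nodes, f(nodes), n - 1)
    err = np.max(np.abs(C.chebval(x_fine, coef) - f(x_fine)))
    print(f"{name:12s} max |f - interpolant| = {err:.2e}")
# Equidistant nodes give errors in the tens (endpoint oscillation);
# Chebyshev nodes give errors around 1e-2.
```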

What happens in high-dimensional spaces? One of the biggest challenges in this work is the curse of dimensionality:
n (dimension) | Points in a full tensor-product grid with 4 levels (4^n)
2 | 16
5 | 1,024
10 | 1,048,576
20 | 1,099,511,627,776
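The table is just 4^n; a one-line sketch (my illustration) reproduces it:

```python
# Number of points in a full tensor-product grid with 4 points per dimension.
for n in (2, 5, 10, 20):
    print(f"n = {n:2d}: {4**n:,} points")
```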

Smolyak's theory for sparse grids: In 1963 Smolyak proposed a method for sparse multidimensional grids, built as an optimized tensor product of unidimensional grid points. The grid points are roots or extrema of various orthogonal polynomials; the most commonly used are the extrema of Chebyshev polynomials. Chebyshev-based grid points are nested & non-equidistant. Sparse grids are used predominantly for calculating integrals [1] (remember: what is the volume of water in the lake?). Adding points to a sparse grid improves the surrogate approximation accuracy: ‖f(x) − f_sur(x)‖ → 0.
1. Smolyak, S.A., Quadrature and interpolation formulas for tensor products of certain classes of functions. Dokl. Akad. Nauk SSSR, 1963. 2. Barthelmann, V., E. Novak, and K. Ritter, Advances in Computational Mathematics, 2000. 12(4): p. 273-288. 3. Davis, P.J. and P. Rabinowitz, 2007: Courier Corporation. 4. Judd, K.L., et al., Journal of Economic Dynamics and Control, 2014. 44: p. 92-123.
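A sketch of the nesting property (assuming the standard Clenshaw-Curtis point counts m_l = 2^l + 1, one common construction consistent with the slide's description; not code from the talk):

```python
import numpy as np

def chebyshev_extrema(level):
    """Chebyshev-extrema grid on [-1, 1] at a given level."""
    if level == 0:
        return np.array([0.0])
    m = 2**level + 1
    return np.cos(np.arange(m) * np.pi / (m - 1))

for level in range(4):
    print(f"level {level}: {np.round(np.sort(chebyshev_extrema(level)), 3)}")
# Each level's points contain all of the previous level's points (nested),
# and they cluster toward the endpoints (non-equidistant).
```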

Using Smolyak points to fit polynomial surrogate functions.
Step 1: Select the level of approximation μ (linked to polynomial exactness).
Step 2 (Smolyak rule): If n is the dimensionality, the sparse grid includes all points whose multi-index k satisfies n ≤ |k| ≤ n + μ, where |k| = Σ_{i=1..n} k_i.
Example: if n = 2 and μ = 1, then 2 ≤ k1 + k2 ≤ 3, i.e. (k1, k2) ∈ {(1, 1), (1, 2), (2, 1)}, which yields the points x¹ = (0, 0), x² = (0, 1), x³ = (0, −1), x⁴ = (1, 0), x⁵ = (−1, 0).
Each point corresponds to a Chebyshev basis function: (0, 0) → 1; (1, 0) → x1; (−1, 0) → 2x1² − 1; (0, −1) → x2; (0, 1) → 2x2² − 1.
The surrogate is the Smolyak combination of the basis polynomials p_k:
f_sur^μ(x1, …, xn) = Σ_{max(n, μ+1) ≤ |k| ≤ n+μ} (−1)^(n+μ−|k|) · C(n−1, n+μ−|k|) · p_k(x1, …, xn)
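A minimal sketch (my illustration) of Step 2's index rule, reproducing the n = 2, μ = 1 example:

```python
from itertools import product

def smolyak_indices(n, mu):
    """All multi-indices k with n <= |k| <= n + mu (the Smolyak rule)."""
    return [k for k in product(range(1, mu + 2), repeat=n)
            if n <= sum(k) <= n + mu]

print(smolyak_indices(2, 1))  # [(1, 1), (1, 2), (2, 1)]
```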

Motivating example: the Branin function using sparse grids.
Real function: f(x1, x2) = (x2 − (5.1/(4π²)) x1² + (5/π) x1 − 6)² + 10 (1 − 1/(8π)) cos(x1) + 10
(Figure: Smolyak grids at levels 2, 3 and 4.)
Surrogate function: f_sur(x) = b0 + b1 x1 + b2 (2x1² − 1) + b3 x2 + b4 (2x2² − 1) + b5 (4x1³ − 3x1) + b6 (8x1⁴ − 8x1² + 1) + b7 x1 x2 + b8 (2x1² − 1) x2
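A sketch of this pair (assuming NumPy and inputs rescaled to [−1, 1]²; the talk fits on Smolyak points, while this illustration uses random samples for brevity):

```python
import numpy as np

def branin(x1, x2):
    """The real (black-box) function from the slide."""
    return ((x2 - 5.1 / (4 * np.pi**2) * x1**2 + 5.0 / np.pi * x1 - 6.0) ** 2
            + 10.0 * (1.0 - 1.0 / (8.0 * np.pi)) * np.cos(x1) + 10.0)

def surrogate_basis(x1, x2):
    """Columns = the Chebyshev terms of f_sur on the slide (b0..b8)."""
    return np.column_stack([
        np.ones_like(x1), x1, 2 * x1**2 - 1, x2, 2 * x2**2 - 1,
        4 * x1**3 - 3 * x1, 8 * x1**4 - 8 * x1**2 + 1,
        x1 * x2, (2 * x1**2 - 1) * x2,
    ])

rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-1.0, 1.0, size=(2, 60))   # sample points, rescaled domain
b, *_ = np.linalg.lstsq(surrogate_basis(x1, x2), branin(x1, x2), rcond=None)
print(np.round(b, 2))                           # fitted coefficients b0..b8
```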

Algorithmic requirements:
- The surrogate function (map) should approach the actual function as more samples are added.
- Sampling is restricted to points that fall on the Smolyak grid.
- Samples should be added in a hierarchical fashion: points associated with lower-order polynomial terms must be added before higher-order terms.
- Given the current set of collected samples, there exists a finite set of candidate samples.
- Optimization at every iteration is not necessary.
Note: for a given approximation level, the only way to add points that do not fall on the current Smolyak grid is to change the bounds.

Algorithm outline:
1. Generate the initial grid and evaluate the black-box function.
2. Fit the surrogate function.
3. Optimize the surrogate function.
4. Rank the candidate points and select the top points.
5. Evaluate the black-box function at the selected points.
6. If the function improves, return to step 2.
7. Otherwise update (contract) the bounds; STOP if the minimum region size has been reached, else return to step 2.
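A runnable toy sketch of this loop (my simplification, not the authors' SGO implementation: a quadratic surrogate stands in for the Chebyshev/Smolyak surrogate, and random candidates stand in for the grid points):

```python
import numpy as np

def toy_sgo(black_box, lo, hi, min_size=1e-2, top_k=5, seed=0):
    """Toy version of the outlined loop over a box [lo, hi]."""
    rng = np.random.default_rng(seed)
    design = lambda X: np.column_stack([np.ones(len(X)), X, X**2])
    X = rng.uniform(lo, hi, size=(10, len(lo)))            # initial samples
    y = np.apply_along_axis(black_box, 1, X)
    while np.max(hi - lo) > min_size:                      # min region size?
        b, *_ = np.linalg.lstsq(design(X), y, rcond=None)  # fit surrogate
        cand = rng.uniform(lo, hi, size=(100, len(lo)))    # candidate points
        top = cand[np.argsort(design(cand) @ b)[:top_k]]   # rank, select top
        y_new = np.apply_along_axis(black_box, 1, top)     # evaluate black box
        improved = y_new.min() < y.min()
        X, y = np.vstack([X, top]), np.concatenate([y, y_new])
        if not improved:                                   # no improvement:
            c, w = X[np.argmin(y)], (hi - lo) / 4.0        # shrink the bounds
            lo, hi = np.maximum(lo, c - w), np.minimum(hi, c + w)
    return X[np.argmin(y)], y.min()

x_best, f_best = toy_sgo(lambda x: float(np.sum((x - 0.3) ** 2)),
                         np.array([-1.0, -1.0]), np.array([1.0, 1.0]))
print(x_best, f_best)   # approaches x* = (0.3, 0.3), f* = 0
```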

Adaptive surrogate construction (i=1):
f(x1, x2) = b1 + b2 x1 + b3 (−1 + 2x1²) + b4 x2 + b5 (−1 + 2x2²)

Adaptive surrogate construction (i=2):
f(x1, x2) = b1 + b2 x1 + b3 (−1 + 2x1²) + b4 x2 + b5 (−1 + 2x2²) + b6 (−1 + 2x1²) x2 + b7 (−3x2 + 4x2³) + b8 (1 − 8x1² + 8x1⁴)

Adaptive surrogate construction (i=3):
f(x1, x2) = b1 + b2 x1 + b3 (−1 + 2x1²) + b4 x2 + b5 (−1 + 2x2²) + b6 (−1 + 2x1²) x2 + b7 (−3x2 + 4x2³) + b8 (1 − 8x1² + 8x1⁴) + b9 (−3x1 + 4x1³) + b10 x1 x2 + b11 (−7x1 + 56x1³ − 112x1⁵ + 64x1⁷)

Adaptive surrogate construction (i=4):
f(x1, x2) = b1 + b2 x1 + b3 (−1 + 2x1²) + b4 x2 + b5 (−1 + 2x2²) + b6 (−1 + 2x1²) x2 + b7 (−3x2 + 4x2³) + b8 (1 − 8x1² + 8x1⁴) + b9 (−3x1 + 4x1³) + b10 x1 x2 + b11 (−7x1 + 56x1³ − 112x1⁵ + 64x1⁷) + b12 (−1 + 18x1² − 48x1⁴ + 32x1⁶) + b13 x1 (−1 + 2x2²)

Adaptive surrogate construction (i=5):
f(x1, x2) = b1 + b2 x1 + b3 (−1 + 2x1²) + b4 x2 + b5 (−1 + 2x2²) + b6 (−1 + 2x1²) x2 + b7 (−3x2 + 4x2³) + b8 (1 − 8x1² + 8x1⁴) + b9 (−3x1 + 4x1³) + b10 x1 x2 + b11 (−7x1 + 56x1³ − 112x1⁵ + 64x1⁷) + b12 (−1 + 18x1² − 48x1⁴ + 32x1⁶) + b13 x1 (−1 + 2x2²) + …
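Across these iterations the expansion only ever grows. A sketch (assuming NumPy; a hypothetical helper, not the authors' code) of building the design matrix so that each iteration just appends columns and refits the coefficients b by least squares:

```python
import numpy as np
from numpy.polynomial.chebyshev import chebvander

def design_matrix(x1, x2, terms):
    """One column T_i(x1) * T_j(x2) per pair (i, j) in `terms`."""
    V1 = chebvander(x1, 7)   # T_0..T_7 of x1 (degree 7 covers the b11 term)
    V2 = chebvander(x2, 3)   # T_0..T_3 of x2
    return np.column_stack([V1[:, i] * V2[:, j] for (i, j) in terms])

# Term lists for the first two iterations, read off the slides above:
terms_i1 = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)]   # b1..b5
terms_i2 = terms_i1 + [(2, 1), (0, 3), (4, 0)]        # appends b6..b8

x = np.linspace(-1.0, 1.0, 40)
A = design_matrix(x, x, terms_i2)                     # refit b at each iteration
print(A.shape)                                        # (40, 8)
```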

Accuracy of the surrogate after the first iteration. (Figure: Smolyak approximation and absolute error; max error ≈ 2.9.)

Benchmarking: 180 benchmark problems with 2-20 variables are used to test algorithmic performance, because their solutions are known. We evaluate performance by:
- the number of problems solved (relative or absolute error to the optimal solution within 1%);
- the resources required to solve them (computational time, number of samples).
We compare results with our previously developed method ARGONAUT [1, 2] (Algorithms for Global Optimization of Grey-Box Computational Systems), which was developed for solving simulation-based optimization problems with many constraints.
1. Boukouvala, Hasan & Floudas, JOGO, 2015. 2. Boukouvala & Floudas, OPTL, 2016.

Problems of 2 or 3 variables (N = 95): the new approach solves ~5% more problems, requiring more samples but less computational time. (Figure: performance profiles for SGO (this work) and ARGONAUT.)

Performance can be tuned: all 95 problems of 2 or 3 variables are solved by at least one run below N initial samples. (Figure: performance profiles for three SGO settings, SGO 1 < SGO 2 < SGO 3, and ARGONAUT; dashed lines use slower and tighter bound reduction.)

Good solutions are found before convergence. (Figure: profiles for SGO 1 and SGO 2; dashed lines show the number of samples when the algorithm converges, solid lines the number of samples when the problem is first solved.)

How about problems of higher dimensions? Problems of 4-9 variables (N = 73). (Figure: performance profiles for SGO (this work) and ARGONAUT.) 12 problems of 10-20 variables were also tested, and all 12 were solved when starting with an approximation level of 1 or 2.

Continuing the legacy.
Push the boundaries: very high-dimensional problems (500+ variables); extremely expensive simulations or experiments; guarantee optimality for black-box problems; use known theoretical bounds on the approximation error to develop rigorous global optimization algorithms.
Apply, apply, apply: utilize the new algorithm for many potential applications, ranging from process systems engineering to computational biology.
Make it available to the community: develop software.

Working with Chris A. Floudas: Nothing seems impossible anymore. Met, worked with, and became friends with an extraordinary group of people. (Photos: 2013, Princeton, NJ; 2015, AIChE, Atlanta, GA; 2016, The Woodlands, TX.) Led by example. Think BIG. Loyalty to his people beyond anything else.

Nights in the office & the Princeton Library. (Photos: Princeton Library, Sept 17th 2014; Chris's desk, Oct 2nd 2014; the original Smolyak paper in Russian.)