Data-Driven Global Optimization
Fani Boukouvala & Chris Kieslich
Christodoulos A. Floudas Memorial Symposium, Princeton University, May 6th, 2017
A story of global optimization & more
September 17th, 2014, Princeton, NJ
Global optimization using data
Data-driven optimization = optimization without derivatives or without equations.
Example: your attention during this talk, f(x) = 100 + 0.5x^2 - 11x.
Setting df/dx = x - 11 = 0 gives x = 11: minimum attention at 11 minutes!
Why would we not have f(x)? There are two reasons:
1. We are trying to optimize a system/process that is not well understood or cannot be described by an equation, so we perform experiments.
2. We are trying to optimize a very complex system/phenomenon/process that requires a simulation.
Examples: designing an airplane, testing a car during a crash, designing a protein, designing a chemical plant.
How do we optimize without equations? Inputs -> Black-Box or Grey-Box (simulation or experiment) -> Outputs
References: Caballero & Grossmann, AIChE J., 54(10), 2008; Henao & Maravelias, AIChE J., 57(5), 2011; Boukouvala, Hasan & Floudas, JOGO, 2015; Boukouvala & Floudas, OPTL, 2016; Conn, Scheinberg & Vicente, SIAM, 2009; Rios & Sahinidis, JOGO, 56(3), 2013; Davis & Ierapetritou, IECR, 47(16), 2008; Jones et al., JOGO, 13(4), 1998.
An analogy for black-box problems:
How would you find the deepest spot in the lake?
How would you find the volume of water in the lake?
Analogy for black-box problem: Collecting data
Measure the depth at chosen (x, y) locations with a hand lead-line.
Analogy for black-box problem: Making a map
The depths measured at the sampled (x, y) locations are combined into a map of the lake bottom.
Analogy for black-box problem: Challenges for making the best map
- Where should we collect data?
- How many data points do we need?
- Complexity of collecting each data point (time, cost)
If we have a function to represent the map, then for any (x, y) we can predict the depth: depth = f(x, y), a surrogate function.
What makes a good surrogate function?
Surrogate functions are models that are fitted/tuned to best predict the collected data.
Good characteristics:
- Accurate representation of the black-box system: f(x) - f_sur(x) -> 0
- Simple function with few parameters
- Requires a tractable number of data points to be accurate
Many different types have been used:
- Quadratic: f_sur(x) = b0 + b1*x1 + b2*x2 + b11*x1^2 + b12*x1*x2 + b22*x2^2
- Kriging: f_sur(x) = μ + sum_{i=1..N} c_i * exp(-sum_{j=1..n} θ_j * (x_j^i - x_j)^2)
- Radial basis function: f_sur(x) = μ + b1*x1 + b2*x2 + sum_{i=1..N} c_i * ||x^i - x||^2 * ln||x^i - x||
Boukouvala, F., R. Misener, and C.A. Floudas, European Journal of Operational Research, 2016, 252(3): 701-727.
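As a concrete illustration of fitting a surrogate, a quadratic model like the one above can be fitted to sampled data by ordinary least squares. This is a minimal sketch, not code from the talk: the toy `black_box` function and the 3x3 sampling grid are illustrative choices.

```python
import numpy as np

def black_box(x1, x2):                 # toy stand-in for a sampled system
    return (x1 - 1.0)**2 + 2.0 * (x2 + 0.5)**2 + 3.0

# Collect 9 samples on a 3x3 grid.
X1, X2 = np.meshgrid(np.linspace(-2, 2, 3), np.linspace(-2, 2, 3))
x1, x2 = X1.ravel(), X2.ravel()
y = black_box(x1, x2)

# Quadratic surrogate f_sur = b0 + b1*x1 + b2*x2 + b11*x1^2 + b12*x1*x2 + b22*x2^2,
# fitted by ordinary least squares on the sampled data.
A = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ b                           # surrogate predictions at the samples
```

Because the toy function is itself quadratic, the nine samples determine the six coefficients exactly; for a genuine black box, the least-squares residual indicates how well the chosen surrogate type matches the data.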
Adding more equidistant points is not necessarily better
Intuition: more points -> better approximations.
Reality: the location of the points, coupled with the type of surrogate function fitted, matters more than the quantity of points!
Runge showed this in 1901 with a simple example (the Runge phenomenon): f(x) = 1/(1 + 25x^2), -1 <= x <= 1.
[Figure: polynomial fits with 10, 15, 20, and 46 equidistant points]
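The Runge phenomenon is easy to reproduce: interpolating f(x) = 1/(1 + 25x^2) with one high-degree polynomial at equidistant points diverges near the interval ends, while the same degree at Chebyshev points behaves well. A small sketch (the degree and grid sizes are illustrative choices):

```python
import numpy as np

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)      # Runge's function
xx = np.linspace(-1, 1, 2001)                # dense grid for measuring error
n = 20                                       # polynomial degree

# Interpolation at n+1 equidistant points: error blows up near the endpoints.
xe = np.linspace(-1, 1, n + 1)
err_equi = np.max(np.abs(np.polyval(np.polyfit(xe, f(xe), n), xx) - f(xx)))

# Same degree at n+1 Chebyshev points: error stays small everywhere.
xc = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))
err_cheb = np.max(np.abs(np.polyval(np.polyfit(xc, f(xc), n), xx) - f(xx)))
```

The equidistant error is orders of magnitude larger than the Chebyshev one, which is exactly why the grids used in this work are non-equidistant.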
What happens in high-dimensional spaces? One of the biggest challenges in this work: the curse of dimensionality.

n (dimension) | Points in a full tensor-product grid with 4 levels (4^n)
2             | 16
5             | 1,024
10            | 1,048,576
20            | 1,099,511,627,776
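The growth in the table is simply 4^n; a quick check also shows why exhaustive sampling is hopeless even for a fast simulation (the one-second-per-evaluation figure is an assumption for illustration only):

```python
# Full tensor-product grid with 4 levels per dimension has 4**n points.
# At an assumed one second per black-box evaluation, sampling such a grid
# in 20 dimensions would take tens of thousands of years.
sizes = {n: 4**n for n in (2, 5, 10, 20)}
years_for_n20 = sizes[20] / (3600 * 24 * 365)
```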
Smolyak's theory for sparse grids
In 1963 Smolyak proposed a method for sparse multidimensional grids, built as an optimized tensor product of one-dimensional grid points.
Grid points are roots or extrema of various orthogonal polynomials; most commonly used are the extrema of Chebyshev polynomials.
Chebyshev-based grid points are nested & non-equidistant.
Sparse grids have been used predominantly for calculating integrals [1] (remember: what is the volume of water in the lake?).
Adding points to the sparse grid improves surrogate approximation accuracy: f(x) - f_sur(x) -> 0.
1. Smolyak, S.A., Quadrature and interpolation formulas for tensor products of certain classes of functions, Dokl. Akad. Nauk SSSR, 1963.
2. Barthelmann, V., E. Novak, and K. Ritter, Advances in Computational Mathematics, 2000, 12(4): 273-288.
3. Davis, P.J. and P. Rabinowitz, Methods of Numerical Integration, Courier Corporation, 2007.
4. Judd, K.L., et al., Journal of Economic Dynamics and Control, 2014, 44: 92-123.
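A minimal sketch of the nested Chebyshev extrema that Smolyak grids are built from (the `cheb_extrema` helper name is my own; taking the single point 0 at level 1 follows the usual sparse-grid convention):

```python
import numpy as np

def cheb_extrema(level):
    """Chebyshev extrema used in Smolyak grids: the single point 0 at level 1,
    then m = 2**(level-1) + 1 points -cos(pi*j/(m-1)), j = 0..m-1, on [-1, 1]."""
    if level == 1:
        return np.array([0.0])
    m = 2**(level - 1) + 1
    return -np.cos(np.pi * np.arange(m) / (m - 1))

# Nested: every level-2 point reappears at level 3; spacing is non-equidistant.
l2, l3 = cheb_extrema(2), cheb_extrema(3)
```

Nesting is what makes these points attractive for expensive black boxes: when the level is increased, all previously collected samples are reused.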
Using Smolyak points to fit polynomial surrogate functions
Step 1: Select the level of approximation μ (linked to polynomial exactness).
Step 2 (Smolyak rule): If n is the dimensionality, the sparse grid includes all points whose level vectors k = (k1, ..., kn) satisfy n <= |k| <= n + μ, where |k| = sum_{i=1..n} k_i.
Example: if n = 2, μ = 1, then 2 <= k1 + k2 <= 3, i.e. (k1, k2) in {(1,1), (1,2), (2,1)}, or in terms of points:
x^1 = (0, 0), x^2 = (0, 1), x^3 = (0, -1), x^4 = (1, 0), x^5 = (-1, 0)
Each point corresponds to a Chebyshev basis function:
(0, 0) -> 1; (1, 0) -> x1; (-1, 0) -> 2x1^2 - 1; (0, -1) -> x2; (0, 1) -> 2x2^2 - 1
The surrogate combines the tensor-product polynomials p_k (Smolyak combination formula):
f_sur^μ(x1, ..., xn) = sum over max(n, μ+1) <= |k| <= n+μ of (-1)^(n+μ-|k|) * C(n-1, n+μ-|k|) * p_k(x1, ..., xn)
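The Smolyak rule above can be enumerated directly. This sketch (helper names and the rounding tolerance are my own) builds the union of tensor grids over all admissible level vectors and reproduces the five points of the n = 2, μ = 1 example:

```python
import itertools
import numpy as np

def cheb_extrema(level):
    # Nested Chebyshev extrema: 1 point at level 1, 2**(level-1) + 1 afterwards.
    if level == 1:
        return [0.0]
    m = 2**(level - 1) + 1
    return [-np.cos(np.pi * j / (m - 1)) for j in range(m)]

def smolyak_grid(n, mu):
    """Union of tensor grids over all level vectors k with n <= sum(k) <= n + mu."""
    pts = set()
    for k in itertools.product(range(1, mu + 2), repeat=n):
        if n <= sum(k) <= n + mu:
            for p in itertools.product(*(cheb_extrema(ki) for ki in k)):
                pts.add(tuple(round(c, 12) for c in p))   # dedupe shared points
    return sorted(pts)

grid = smolyak_grid(2, 1)    # the five points (0,0), (0,±1), (±1,0)
```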
Motivating example: Branin function using sparse grids
Real function: f(x1, x2) = (x2 - 5.1/(4π^2) x1^2 + 5/π x1 - 6)^2 + 10(1 - 1/(8π)) cos(x1) + 10
[Figure: Smolyak grids at levels 2, 3, and 4]
Surrogate function: f_sur(x) = b0 + b1 x1 + b2 (2x1^2 - 1) + b3 x2 + b4 (2x2^2 - 1) + b5 (4x1^3 - 3x1) + b6 (8x1^4 - 8x1^2 + 1) + b7 x1 x2 + b8 (2x1^2 - 1) x2
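For reference, the Branin function can be coded directly; one of its three known global minima lies at (π, 2.275), where the squared term vanishes and cos(x1) = -1:

```python
import numpy as np

def branin(x1, x2):
    return ((x2 - 5.1 / (4 * np.pi**2) * x1**2 + 5 / np.pi * x1 - 6)**2
            + 10 * (1 - 1 / (8 * np.pi)) * np.cos(x1) + 10)

# At (pi, 2.275): x2 - 5.1/4 + 5 - 6 = 0 and cos(pi) = -1, so the value
# reduces to 10/(8*pi), approximately 0.3979.
f_star = branin(np.pi, 2.275)
```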
Algorithmic requirements
- The surrogate function (map) should approach the actual function as more samples are added.
- Sampling is restricted to points that fall on the Smolyak grid.
- Samples should be added in a hierarchical fashion: points associated with lower-order polynomial terms need to be added before higher-order terms.
- Given the current set of collected samples, there exists a finite set of candidate samples, so optimization at every iteration isn't necessary.
Note: For a given approximation level, the only way to add points that do not fall on the current Smolyak grid is to change the bounds.
Algorithm outline (flowchart):
Generate initial grid -> evaluate black-box function -> fit surrogate function -> optimize surrogate function.
Does the function improve? If yes: rank candidate points, select the top points, evaluate the black-box function at them, and refit.
If no: has the minimum region size been reached? If yes, STOP; otherwise update the bounds and continue.
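A drastically simplified 1-D caricature of this loop (not the actual SGO implementation: equidistant samples stand in for Smolyak points, a dense grid search stands in for the surrogate optimization, and the shrink factor and stopping tolerance are arbitrary choices) shows the fit / optimize / shrink-bounds cycle:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def black_box(x):                      # stand-in for an expensive simulation
    return (x - 0.7)**2 + 0.05 * np.sin(3 * x)

lb, ub = -2.0, 2.0                     # initial bounds
for it in range(25):
    scale = lambda z: (2 * z - lb - ub) / (ub - lb)   # map [lb, ub] -> [-1, 1]
    xs = np.linspace(lb, ub, 9)                       # sample the black box
    coef = C.chebfit(scale(xs), black_box(xs), 6)     # fit Chebyshev surrogate
    grid = np.linspace(lb, ub, 2001)                  # "optimize" the surrogate
    x_star = grid[np.argmin(C.chebval(scale(grid), coef))]
    half = 0.25 * (ub - lb)                           # shrink bounds around incumbent
    lb, ub = max(-2.0, x_star - half), min(2.0, x_star + half)
    if ub - lb < 1e-4:                                # minimum region size -> stop
        break
```

Each pass fits a surrogate on the current box, takes the surrogate minimizer as the incumbent, and halves the box around it, mirroring the "update bounds" branch of the flowchart.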
Adaptive surrogate construction (i = 1)
f(x1, x2) = b1 + b2 x1 + b3 (-1 + 2x1^2) + b4 x2 + b5 (-1 + 2x2^2)
Adaptive surrogate construction (i = 2)
f(x1, x2) = b1 + b2 x1 + b3 (-1 + 2x1^2) + b4 x2 + b5 (-1 + 2x2^2) + b6 (-1 + 2x1^2) x2 + b7 (-3x2 + 4x2^3) + b8 (1 - 8x1^2 + 8x1^4)
Adaptive surrogate construction (i = 3)
f(x1, x2) = b1 + b2 x1 + b3 (-1 + 2x1^2) + b4 x2 + b5 (-1 + 2x2^2) + b6 (-1 + 2x1^2) x2 + b7 (-3x2 + 4x2^3) + b8 (1 - 8x1^2 + 8x1^4) + b9 (-3x1 + 4x1^3) + b10 x1 x2 + b11 (-7x1 + 56x1^3 - 112x1^5 + 64x1^7)
Adaptive surrogate construction (i = 4)
f(x1, x2) = b1 + b2 x1 + b3 (-1 + 2x1^2) + b4 x2 + b5 (-1 + 2x2^2) + b6 (-1 + 2x1^2) x2 + b7 (-3x2 + 4x2^3) + b8 (1 - 8x1^2 + 8x1^4) + b9 (-3x1 + 4x1^3) + b10 x1 x2 + b11 (-7x1 + 56x1^3 - 112x1^5 + 64x1^7) + b12 (-1 + 18x1^2 - 48x1^4 + 32x1^6) + b13 x1 (-1 + 2x2^2)
Adaptive surrogate construction (i = 5)
f(x1, x2) = b1 + b2 x1 + b3 (-1 + 2x1^2) + b4 x2 + b5 (-1 + 2x2^2) + b6 (-1 + 2x1^2) x2 + b7 (-3x2 + 4x2^3) + b8 (1 - 8x1^2 + 8x1^4) + b9 (-3x1 + 4x1^3) + b10 x1 x2 + b11 (-7x1 + 56x1^3 - 112x1^5 + 64x1^7) + b12 (-1 + 18x1^2 - 48x1^4 + 32x1^6) + ...
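The expanded polynomials appearing in these surrogates are Chebyshev polynomials of the first kind, which can be checked with numpy's conversion utilities:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# cheb2poly converts Chebyshev-series coefficients to monomial coefficients
# (lowest degree first). T4 and T7 match the b8 and b11 terms above.
t4 = C.cheb2poly([0, 0, 0, 0, 1])              # T4 = 1 - 8x^2 + 8x^4
t7 = C.cheb2poly([0, 0, 0, 0, 0, 0, 0, 1])     # T7 = -7x + 56x^3 - 112x^5 + 64x^7
```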
Accuracy of surrogate after first iteration
[Figure: Smolyak approximation and absolute error; max error ~2.9]
Benchmark
180 benchmark problems with 2-20 variables are used to test algorithmic performance, because their optimal solutions are known.
We evaluate performance by:
- Number of problems solved (relative or absolute error to the optimal solution within 1%)
- Resources required to solve (computational time, number of samples)
We compare results with our previously developed method, ARGONAUT [1, 2] (Algorithms for Global Optimization of Grey-Box Computational Systems), which was developed for solving simulation-based optimization problems with many constraints.
1. Boukouvala, Hasan & Floudas, JOGO, 2015.
2. Boukouvala & Floudas, OPTL, 2016.
Problems of 2 or 3 variables (N = 95)
The new approach (SGO, this work) solves ~5% more problems than ARGONAUT, requiring more samples but less computational time.
Performance can be tuned
All 95 problems of 2 or 3 variables are solved by at least one run below N initial samples (initial sample counts: SGO 1 < SGO 2 < SGO 3, compared against ARGONAUT). Dashed lines utilize slower and tighter bound reduction.
Good solutions found before convergence
For SGO 1 and SGO 2: dashed lines show the number of samples when the algorithm converges; solid lines show the number of samples when the problem is first solved.
How about problems of higher dimensions?
Problems of 4-9 variables (N = 73): SGO (this work) vs. ARGONAUT.
12 problems of 10-20 variables were also tested, and all 12 were solved when starting with an approximation level of 1 or 2.
Continuing the legacy
- Push the boundaries: very high-dimensional problems (500+ variables); extremely expensive simulations or experiments.
- Guarantee optimality for black-box problems: use known theoretical bounds on the approximation error to develop rigorous global optimization algorithms.
- Apply, apply, apply: utilize the new algorithm for many potential applications, ranging from process systems engineering to computational biology.
- Make it available to the community: develop software.
Working with Chris A. Floudas
- Nothing seems impossible anymore.
- Met, worked with & became friends with an extraordinary group of people (2013, Princeton, NJ; 2015, AIChE, Atlanta, GA; 2016, The Woodlands, TX).
- Led by example. Think BIG. Loyalty to his people beyond anything else.
Nights in the office & the Princeton Library
[Photos: Princeton Library, Sept 17th, 2014; Chris' desk, Oct 2nd, 2014; the original Smolyak paper in Russian]