Some Theory Behind Algorithms for Stochastic Optimization
Zelda Zabinsky, University of Washington, Industrial and Systems Engineering
May 24, 2010, NSF Workshop on Simulation Optimization
Overview
- Problem formulation
- Theoretical performance of stochastic adaptive search methods
- Algorithms based on Hit-and-Run to approximate theoretical performance
- Incorporating random sampling and noisy objective functions
What is Stochastic Optimization?
Randomness in the algorithm AND/OR in the function evaluation.
Related terms:
- Simulation optimization
- Optimization via simulation
- Random search methods
- Stochastic approximation
- Stochastic programming
- Design of experiments
- Response surface optimization
Problem Formulation
Minimize f(x) subject to x in S
- x: n variables, continuous and/or discrete
- f(x): objective function; may be black-box, ill-structured, noisy
- S: feasible set; nonlinear constraints, or a membership oracle
Assume an optimum x* exists, with y* = f(x*).
Example Problem Formulations
- Maximize expected value subject to standard deviation < b
- Minimize standard deviation subject to expected value > t
- Minimize CVaR (conditional value at risk)
- Minimize the sum of least squares from data
- Maximize the probability of satisfying noisy constraints
Approximate or Estimate f(x)?
Approximate a complicated function with:
- Taylor series expansion
- Finite element analysis
- Computational fluid dynamics
Estimate a noisy function with:
- Replications
- The length of a discrete-event simulation run
Noisy Objective Function
f(x) = E_Ω[ g(x, ω) ]
[Figure: noisy objective function f(x) over the feasible set S]
Scenario-based Recourse Function
f(x) = c^T x + E_Ω[ min_y g(y, x, ω) ]
[Figure: recourse function over S under Scenario I, Scenario II, and Scenario III]
Local versus Global Optima
Local optima are relative to the neighborhood and the algorithm.
[Figure: f(x) over S with local and global optima]
Research Question: What Do We Really Want?
- Do we really just want the optimum? What about sensitivity?
- Do we want to approximate the entire surface?
- Multi-criteria?
- What are the roles of the objective function and the constraints?
- Where does randomness appear?
How can we solve it? An IDEAL algorithm:
- Optimizes any function quickly and accurately
- Provides information on how good the solution is
- Handles black-box and/or noisy functions, with continuous and/or discrete variables
- Is easy to implement and use
Theoretical Performance of Stochastic Adaptive Search
What kind of performance can we hope for?
- Global optimization problems are NP-hard
- There is a tradeoff between accuracy and computation
- Sacrifice the guarantee of optimality for speed in finding a good solution
Three theoretical constructs:
- Pure adaptive search (PAS)
- Hesitant adaptive search (HAS)
- Annealing adaptive search (AAS)
Performance of Two Simple Methods
Grid Search: the number of grid points is O((L/ε)^n), where L is the Lipschitz constant, n is the dimension, and ε is the distance to the optimum.
Pure Random Search: the expected number of points is O(1/p(y*+ε)), where p(y*+ε) is the probability of sampling within ε of the optimum y*.
The complexity of both is exponential in the dimension.
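The exponential behavior of Pure Random Search can be seen in a few lines. A minimal sketch, not from the slides: the sum-of-squares test function, box domain, and thresholds are assumptions for illustration.

```python
import random

def pure_random_search(f, sample, threshold, max_iter=1_000_000):
    """Count uniform samples until a point with f(x) <= threshold is found."""
    for n in range(1, max_iter + 1):
        if f(sample()) <= threshold:
            return n
    return max_iter

# Assumed toy setup: f(x) = sum of squares on [-1, 1]^n, so the probability p
# of landing in the threshold level set shrinks with dimension, and the
# expected count E[N] ~ 1/p grows exponentially in n.
random.seed(0)
for n_dim in (1, 2, 4):
    sample = lambda: [random.uniform(-1.0, 1.0) for _ in range(n_dim)]
    runs = [pure_random_search(lambda x: sum(xi * xi for xi in x),
                               sample, threshold=0.1) for _ in range(200)]
    print(n_dim, sum(runs) / len(runs))
```

The printed averages grow rapidly with the dimension, matching the O(1/p(y*+ε)) bound above.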
Pure Adaptive Search (PAS)
PAS chooses points uniformly distributed in improving level sets.
[Figure: PAS iterates on nested improving level sets]
Bounds on the Expected Number of Iterations
PAS (continuous): E[N(y*+ε)] ≤ 1 + ln(1/p(y*+ε)), where p(y*+ε) is the probability of PRS sampling within ε of the global optimum y*. [Zabinsky and Smith, 1992]
PAS (finite): E[N(y*)] ≤ 1 + ln(1/p_1), where p_1 is the probability of PRS sampling the global optimum. [Zabinsky, Wood, Steel and Baritompa, 1995]
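The logarithmic bound can be illustrated by noting that the record (strictly improving) values of pure random search follow the PAS process: each new record is uniform on the current improving level set. A minimal sketch under the same assumed toy setup (sum of squares on a box):

```python
import random

def pas_iterations(f, sample, threshold):
    """Count record improvements of pure random search; each record is
    uniform on the improving level set, i.e., one PAS iteration."""
    best, records = float("inf"), 0
    while best > threshold:
        y = f(sample())
        if y < best:
            best, records = y, records + 1
    return records

random.seed(0)
for n_dim in (1, 2, 4):
    sample = lambda: [random.uniform(-1.0, 1.0) for _ in range(n_dim)]
    runs = [pas_iterations(lambda x: sum(xi * xi for xi in x),
                           sample, threshold=0.1) for _ in range(200)]
    print(n_dim, sum(runs) / len(runs))  # grows slowly, like 1 + ln(1/p)
```

While the total PRS sample count explodes with dimension, the PAS iteration count grows only logarithmically in 1/p, which is the source of the linearity results that follow.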
Pure Adaptive Search
Theoretically, PAS is LINEAR in dimension.
Theorem: For any global optimization problem in n dimensions, with Lipschitz constant at most L and a convex feasible region with diameter at most D, the expected number of PAS points needed to get within ε of the global optimum is E[N(y*+ε)] ≤ 1 + n ln(LD/ε). [Zabinsky and Smith, 1992]
Finite PAS
An analogous LINEARITY result holds.
Theorem: For an n-dimensional lattice {1,…,k}^n with distinct objective function values, the expected number of points for PAS, sampling uniformly, to first reach the global optimum is E[N(y*)] ≤ 2 + n ln(k). [Zabinsky, Wood, Steel and Baritompa, 1995]
Hesitant Adaptive Search (HAS)
What if we sample improving level sets with bettering probability b(y) and hesitate with probability 1 − b(y)?
E[N(y*+ε)] = ∫_{y*+ε} dρ(t) / ( b(t) p(t) ),
where ρ(t) is the underlying sampling distribution and p(t) is the probability of sampling t or better. [Bulger and Wood, 1998]
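In the special case of a constant bettering probability b, HAS reduces to the PAS count inflated by 1/b, since each improvement waits behind a geometric number of hesitations. A small sketch of that special case (the constant b and the iteration counts are illustrative assumptions):

```python
import random

def has_iterations(pas_count, b):
    """Total HAS iterations when each of pas_count improvements must
    wait for a geometric(b) 'bettering' success."""
    total = 0
    for _ in range(pas_count):
        while True:
            total += 1
            if random.random() < b:  # improve with probability b, else hesitate
                break
    return total

random.seed(0)
avg = sum(has_iterations(10, 0.25) for _ in range(2000)) / 2000
print(avg)  # close to 10 / 0.25 = 40
```

The general HAS formula above handles a bettering probability b(y) that varies with the objective level, but the constant-b case already shows how hesitation dilates the PAS iteration count.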
General HAS
For a mixed discrete and continuous global optimization problem, the expected value of N(y*+ε), the variance, and the complete distribution can be expressed using the sampling distribution ρ(t) and the bettering probabilities b(y). [Wood, Zabinsky and Kristinsdottir, 2001]
Annealing Adaptive Search (AAS)
What if we sample from the original feasible region each iteration, but change distributions?
- Generate points over the whole domain using a Boltzmann distribution parameterized by temperature T
- The Boltzmann distribution becomes more concentrated around the global optima as the temperature decreases
- The temperature is determined by a cooling schedule
The record values of AAS are dominated by PAS and are thus LINEAR in dimension. [Romeijn and Smith, 1994]
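The concentration effect can be checked numerically on a finite grid of objective values. A minimal sketch (the grid of toy objective values is an assumption for illustration):

```python
import math

def boltzmann_weights(fvals, T):
    """Normalized Boltzmann probabilities exp(-f/T) over a finite grid."""
    w = [math.exp(-f / T) for f in fvals]
    s = sum(w)
    return [wi / s for wi in w]

# Toy objective values on a coarse grid; the minimum sits at index 2.
fvals = [3.0, 1.5, 0.2, 1.0, 2.5]
for T in (10.0, 1.0, 0.5):
    probs = boltzmann_weights(fvals, T)
    print(T, round(probs[2], 3))  # mass at the minimizer grows as T drops
```

At high temperature the distribution is nearly uniform; as T decreases, the probability mass piles up on the minimizer, which is what makes the cooling schedule the critical design choice.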
Performance of Annealing Adaptive Search
The expected number of sample points of AAS is bounded by HAS with a specific b(y).
Select the next temperature so that the probability of generating an improvement under that Boltzmann distribution is at least 1 − α, i.e.,
P( Y^AAS_{R(k)+1} < y | Y^AAS_{R(k)} = y ) ≥ 1 − α.
Then the expected number of AAS sample points is LINEAR in dimension. [Shen, Kiatsupaibul, Zabinsky and Smith, 2007]
AAS with Adaptive Cooling Schedule
[Figure: f(x) with iterates x_0 through x_4 approaching x*, and Boltzmann densities at T = 10, T = 1, and T = 0.5 concentrating around the global optimum as the temperature decreases]
Research Areas
Develop theoretical analyses of PAS, HAS, and AAS for noisy or approximate functions:
- Model approximation or estimation error
- Characterize the impact of error on performance
Use the theory to develop algorithms:
- Approximate sampling from improving sets (as in PAS) or Boltzmann distributions (as in AAS)
- Use HAS, with ρ(t) and b(y), to quantify and balance accuracy and efficiency
Random Search Algorithms
Instance-based methods:
- Sequential random search
- Multi-start and population-based algorithms
Model-based methods:
- Importance sampling
- Cross-entropy [Rubinstein and Kroese, 2004]
- Model reference adaptive search [Hu, Fu and Marcus, 2007] [Zlochin, Birattari, Meuleau and Dorigo, 2004]
Sequential Random Search
- Stochastic approximation [Robbins and Monro, 1951]
- Step-size algorithms [Rastrigin, 1960] [Solis and Wets, 1981]
- Simulated annealing [Romeijn and Smith, 1994] [Alrefaei and Andradottir, 1999]
- Tabu search [Glover and Kochenberger, 2003]
- Nested partitions [Shi and Olafsson, 2000]
- COMPASS [Hong and Nelson, 2006]
View these algorithms as Markov chains with candidate point generators and update procedures.
Use Hit-and-Run to Approximate AAS
The difficulty in implementing AAS is generating points directly from a family of Boltzmann distributions.
Hit-and-Run is a Markov chain Monte Carlo (MCMC) sampler that:
- converges to a uniform distribution [Smith, 1984]
- does so in polynomial time, O(n^3) [Lovász, 1999]
- can approximate an arbitrary distribution by using a filter
Hit-and-Run
Hit-and-Run generates a random direction (uniformly distributed on a hypersphere) and a random point (uniformly distributed on the line through the current point in that direction).
[Figure: Hit-and-Run steps X_1, X_2, X_3]
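One Hit-and-Run step on a box-shaped feasible region can be sketched as follows. The box domain is an assumption for illustration; a general convex set would need its own line-set intersection routine.

```python
import math
import random

def hit_and_run_step(x, lower, upper):
    """One Hit-and-Run step inside the box [lower, upper]^n: pick a
    uniform direction on the hypersphere, intersect the line with the
    box, then pick a uniform point on the resulting chord."""
    d = [random.gauss(0.0, 1.0) for _ in x]      # normalized Gaussian =
    norm = math.sqrt(sum(di * di for di in d))   # uniform direction
    d = [di / norm for di in d]
    # The line x + t*d stays inside the box for t in [t_min, t_max].
    t_min, t_max = -float("inf"), float("inf")
    for xi, di in zip(x, d):
        if abs(di) > 1e-12:
            a, b = (lower - xi) / di, (upper - xi) / di
            t_min, t_max = max(t_min, min(a, b)), min(t_max, max(a, b))
    t = random.uniform(t_min, t_max)
    return [xi + t * di for xi, di in zip(x, d)]
```

Iterating this step from any interior point produces a Markov chain whose stationary distribution is uniform over the box.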
Improving Hit-and-Run
IHR: choose a random direction and a random point, and accept only improving points.
[Figure: IHR iterates on nested improving level sets]
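A minimal self-contained sketch of IHR on a box; the test function, box bounds, starting point, and iteration budget are all assumptions for illustration.

```python
import math
import random

def improving_hit_and_run(f, x, lower, upper, n_iters=500):
    """Improving Hit-and-Run on a box [lower, upper]^n: generate a
    Hit-and-Run candidate and accept it only if it improves f."""
    best = f(x)
    for _ in range(n_iters):
        d = [random.gauss(0.0, 1.0) for _ in x]          # random direction
        norm = math.sqrt(sum(di * di for di in d))
        d = [di / norm for di in d]
        t_min, t_max = -float("inf"), float("inf")       # chord through the box
        for xi, di in zip(x, d):
            if abs(di) > 1e-12:
                a, b = (lower - xi) / di, (upper - xi) / di
                t_min, t_max = max(t_min, min(a, b)), min(t_max, max(a, b))
        t = random.uniform(t_min, t_max)
        y = [xi + t * di for xi, di in zip(x, d)]
        fy = f(y)
        if fy < best:                                    # accept only improvements
            x, best = y, fy
    return x, best

random.seed(1)
x, fx = improving_hit_and_run(lambda v: sum(vi * vi for vi in v),
                              [0.9, -0.8, 0.7], -1.0, 1.0)
```

Because candidates are only ever accepted when they improve the record value, the incumbent objective is nonincreasing by construction.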
Is IHR Efficient in Dimension?
Theorem: For any elliptical program in n dimensions, the expected number of function evaluations for IHR is O(n^{5/2}). [Zabinsky, Smith, McDonald, Romeijn and Kaufman, 1993]
[Figure: contours of an elliptical program f(x_1, x_2)]
Use Hit-and-Run to Approximate Annealing Adaptive Search
Hide-and-Seek: add a probabilistic Metropolis acceptance-rejection criterion to Hit-and-Run to approximate the Boltzmann distribution. [Romeijn and Smith, 1994]
- Converges in probability with almost any cooling schedule driving the temperature to zero
AAS Adaptive Cooling Schedule:
- Set temperature values according to AAS to maintain a 1 − α probability of improvement
- Update the temperature when record values are obtained [Shen, Kiatsupaibul, Zabinsky and Smith, 2007]
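The Metropolis filter that Hide-and-Seek adds to Hit-and-Run can be sketched on its own; the function name below is illustrative, not from the papers.

```python
import math
import random

def metropolis_accept(f_current, f_proposal, T):
    """Metropolis criterion: always accept improvements; accept a
    worsening move with probability exp(-(f_proposal - f_current)/T)."""
    if f_proposal <= f_current:
        return True
    return random.random() < math.exp(-(f_proposal - f_current) / T)
```

Applying this rule to each Hit-and-Run candidate at temperature T makes the chain target the Boltzmann distribution at that temperature; as T → 0 the rule accepts only improvements, recovering IHR.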
Research Possibilities
- How long should we execute Hit-and-Run at a fixed temperature?
- What is the benefit of sequential temperatures (warm starts) for the convergence rate?
- Hit-and-Run converges quickly on well-rounded sets; how can we modify the transition kernel in general?
- Incorporate new Hit-and-Run variants on mixed integer/continuous sets:
  - Discrete Hit-and-Run [Baumert, Ghate, Kiatsupaibul, Shen, Smith and Zabinsky, 2009]
  - Pattern Hit-and-Run [Mete, Shen, Zabinsky, Kiatsupaibul and Smith, 2010]
Simulated Annealing with Multi-start: When to Stop or Restart a Run?
Use HAS to model the progress of a heuristic random search algorithm and estimate the associated parameters.
Dynamic Multi-start Sequential Search:
- If the current run appears stuck according to the HAS analysis, stop and restart
- Estimate the probability of achieving y*+ε based on observed values and estimated parameters
- If that probability is high enough, terminate [Zabinsky, Bulger and Khompatraporn, 2010]
Meta-control of the Interacting-Particle Algorithm
Interacting-Particle Algorithm:
- Combines simulated annealing and population-based algorithms
- Uses statistical physics and Feynman-Kac formulas to develop selection probabilities [Del Moral, Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications, 2004]
Meta-control approach to dynamically heat and cool the temperature [Kohn, Zabinsky and Brayman, 2006] [Molvalioglu, Zabinsky and Kohn, 2009]
Meta-control Approach
[Diagram: f(x), S, and initial parameters feed the interacting-particle algorithm (N-particle exploration and N-particle selection); current information feeds a control system (state estimation and system dynamics, performance criterion), which solves an optimal control problem and returns the optimal parameter]
Research Possibilities
Combine theoretical analyses with MCMC and meta-control to:
- Control the exploration transition probabilities
- Obtain stopping criteria and the quality of the solution
- Relate interacting particles to cloning/splitting
Combine theoretical analyses and meta-control with model-based approaches.
Another Research Area: Quantum Global Optimization
Grover's Adaptive Search can implement PAS on a quantum computer [Baritompa, Bulger and Wood, 2005]
Apply research on quantum control theory to global optimization [Gardiner, Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences, 2004] [Del Moral, Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications, 2004]
Optimization of Noisy Functions
- Use random sampling to explore the feasible region, and estimate the objective function with replications
- Recognize two sources of noise: randomness in the sampling distribution, and randomness in the objective function
- Adaptively adjust the number of samples and the number of replications
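Estimating f(x) = E_Ω[g(x, ω)] with replications, and reporting a standard error so the number of replications can be adjusted, can be sketched as follows (the quadratic test function and Gaussian noise level are assumptions):

```python
import math
import random

def estimate_objective(g, x, n_reps):
    """Estimate f(x) = E[g(x, omega)] by averaging n_reps noisy
    replications; return the sample mean and its standard error."""
    obs = [g(x) for _ in range(n_reps)]
    mean = sum(obs) / n_reps
    var = sum((o - mean) ** 2 for o in obs) / (n_reps - 1)
    return mean, math.sqrt(var / n_reps)

# Assumed toy noisy objective: true f(x) = x^2 plus Gaussian noise.
random.seed(0)
g = lambda x: x * x + random.gauss(0.0, 0.5)
mean, se = estimate_objective(g, 2.0, n_reps=400)
```

The standard error shrinks like 1/sqrt(n_reps), so an adaptive scheme can keep adding replications at a point until the error is small relative to the differences it needs to resolve.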
Noisy Objective Function
[Figure: noisy evaluations of f(x) over S]
Noisy Objective Function
[Figure: noisy f(x) over S, showing the quantile y(δ,S) and the level set L(δ,S)]
Probabilistic Branch-and-Bound (PBnB)
[Figure: the feasible set S partitioned into subregions σ_1 and σ_2]
Sample N* Uniform Random Points with R* Replications
[Figure: sampled points with replications in σ_1 and σ_2]
Use Order Statistics to Assess the Range Distribution
[Figure: order statistics of the sampled values in σ_1 and σ_2]
Prune, if Statistically Confident
[Figure: a subregion is pruned once the order statistics give sufficient confidence]
Subdivide and Sample Additional Points
[Figure: σ_1 subdivided into σ_11 and σ_12, with additional points sampled]
Reassess the Range Distribution
[Figure: updated range distributions over σ_11, σ_12, and σ_2]
If No Pruning, Then Continue
[Figure: σ_11, σ_12, and σ_2 retained for the next iteration]
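The loop above (sample, assess, prune, subdivide) can be caricatured in a few lines. This is a deliberately simplified stand-in, not the actual PBnB procedure: the empirical quantile below replaces the order-statistics confidence test, and the 1-D intervals, sampling budgets, and keep fraction are all assumptions.

```python
import random

def simplified_pbnb(f, boxes, n_samples=100, rounds=4, keep_fraction=0.5):
    """Highly simplified PBnB-style loop on 1-D intervals: sample
    uniformly in each subregion, use the empirical quantile of all
    sampled values as a stand-in for the statistical pruning test,
    prune subregions whose best sample misses that quantile, and
    subdivide the survivors."""
    for _ in range(rounds):
        scored, all_vals = [], []
        for (lo, hi) in boxes:
            vals = [f(random.uniform(lo, hi)) for _ in range(n_samples)]
            scored.append(((lo, hi), min(vals)))
            all_vals.extend(vals)
        cutoff = sorted(all_vals)[int(keep_fraction * len(all_vals))]
        survivors = [box for box, best in scored if best <= cutoff]
        boxes = []
        for (lo, hi) in survivors:       # subdivide surviving subregions
            mid = (lo + hi) / 2.0
            boxes += [(lo, mid), (mid, hi)]
    return boxes

random.seed(0)
result = simplified_pbnb(lambda x: (x - 0.3) ** 2, [(0.0, 1.0)])
```

Over the rounds, subregions far from the minimizer fail the quantile test and are pruned, while the surviving subregions shrink around the good region, which is the qualitative behavior the PBnB figures illustrate.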
PBnB: Numerical Example
Research Areas
- Develop theory to trade off accuracy against computational effort
- Use theory to develop algorithms that give insight into the original problem: global optima and sensitivity; the shape of the function or its range distribution
- Use interdisciplinary approaches to incorporate feedback
Summary
- Theoretical analysis of PAS, HAS, and AAS motivates random search algorithms
- Hit-and-Run is an MCMC method for approximating the theoretical performance
- Meta-control theory allows adaptation based on observations
- Probabilistic Branch-and-Bound incorporates noisy function evaluations and sampling noise into the analysis
Additional Slides for Details on the Interacting-Particle Algorithm
Interacting-Particle Algorithm
Simulated Annealing: a Markov chain Monte Carlo method for approximating a sequence of Boltzmann distributions
η_t(dx) = e^{−f(x)/T_t} dx / ∫_S e^{−f(y)/T_t} dy
Population-based Algorithms: simulate a distribution (e.g., the Feynman-Kac annealing model) such that E_{η_t}(f) → y* as t → ∞
Interacting-Particle Algorithm
Initialization: sample the initial location y_0^i ~ η_0 for each particle i = 1,…,N.
For t = 0, 1, …:
- N-particle exploration: move particle i = 1,…,N to location ŷ_t^i with probability distribution E(y_t^i, dŷ)
- Temperature parameter update: T_t = (1 + ε_t) T_{t−1}
- N-particle selection: set the particle locations y_{t+1}^k = ŷ_t^i with probability e^{−f(ŷ_t^i)/T_t} / Σ_{j=1}^N e^{−f(ŷ_t^j)/T_t}
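The N-particle selection step can be sketched directly from the selection probabilities; this is an illustrative sketch under those weights, not the authors' implementation.

```python
import math
import random

def select_particles(positions, f, T):
    """N-particle selection: resample particle locations with
    probability proportional to exp(-f(y)/T), so locations with
    lower objective values are exponentially more likely to be kept."""
    weights = [math.exp(-f(y) / T) for y in positions]
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(positions, weights=probs, k=len(positions))
```

Lowering T sharpens the selection toward the best particles, while raising T spreads the population out again, which is the lever the meta-control approach manipulates.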
Illustration of the Interacting-Particle Algorithm
Initialization: sample N = 10 points uniformly on S.
N-particle exploration: move the particles from y_0 to ŷ_0 using the Markov kernel E.
[Figure: particles y_0^1,…,y_0^10 and their exploration moves ŷ_0^1,…,ŷ_0^10 on S]
Illustration of the Interacting-Particle Algorithm
Evaluate f(·) at each explored location ŷ_0^i.
[Figure: explored particles shaded from lower f(·) to higher f(·)]
Illustration of the Interacting-Particle Algorithm
N-particle selection: move the particles from ŷ_0 to y_1 according to their objective function values.
[Figure: selection favors particles with lower f(·)]
Illustration of the Interacting-Particle Algorithm
[Figure: after selection, several particles share the locations with lower objective values, while locations with higher values are abandoned]
Multi-start or Population-based Algorithms
- Multi-start and clustering algorithms [Rinnooy Kan and Timmer, 1987] [Locatelli and Schoen, 1999]
- Genetic algorithms [Davis, 1991]
- Evolutionary programming [Bäck, Fogel and Michalewicz, 1997]
- Particle swarm optimization [Kennedy, Eberhart and Shi, 2001]
- Interacting particle algorithm [Del Moral, 2004]