Minimization by Random Search Techniques by Solis and Wets, and an Intro to Sampling Methods


1 Minimization by Random Search Techniques by Solis and Wets, and an Intro to Sampling Methods

Presenter: Michele Aghassi
October 27, 2003

This presentation is based on: Solis, F. J., and R. J.-B. Wets. "Minimization by Random Search Techniques." Mathematics of Operations Research, 6 (1981).

2 Recap of Past Sessions

LP
- Kalai (1992, 1997): use randomized pivot rules
- Motwani and Raghavan (1995), Clarkson (1988, 1995): solve on a random subset of constraints, recursively
- Dunagan and Vempala (2003): LP feasibility ($Ax \geq 0$, $x \neq 0$)
  - Generate random vectors and test for feasibility
  - If not feasible, try moving in a deterministic (w.r.t. the random vector already selected) direction to achieve feasibility

NLP
- Storn and Price (1997): unconstrained NLP heuristic
  - Select random subsets of solution population vectors
  - Perform addition, subtraction, and component swapping, and test for objective function improvement

3 Motivation

- What about provably convergent algorithms for constrained NLPs?
- Random search techniques were first proposed in the 1950s
- Pre-1981 proofs of convergence were highly specific and involved
- Solis and Wets, 1981: can we give more general sufficient conditions for convergence, unifying the past results in the literature?
- The Solis and Wets paper is interesting more from a unifying theoretical standpoint
- The computational results of the paper are relatively unimpressive

4 Outline

Part I: Solis and Wets paper
- Motivation for using random search
- Appropriate goals of random search algorithms
- Conceptual algorithm encompassing several concrete examples
- Sufficient conditions for global search convergence, and theorem
- Local search methods, sufficient conditions for convergence, and theorem
- Defining stopping criteria
- Some computational results

Part II: Intro to Sampling Methods
- Traditional methods
- Hit-and-run algorithm

5 Why Use Random Search Techniques?

Let $f : \mathbb{R}^n \to \mathbb{R}$ and $S \subseteq \mathbb{R}^n$.

(P)  $\min f(x)$  s.t.  $x \in S$

- Function characteristics difficult to compute (e.g. gradients, etc.)
- Function is bumpy
- Need the global minimum, but there are lots of local minima
- Limited computer memory

6 What is an Appropriate Goal?

Problems
- Global min may not exist
- Finding the min may require exhaustive examination (e.g. the min occurs at a point at which $f$ is singularly discontinuous)

Response

Definition 1. $\alpha$ is the essential infimum of $f$ on $S$ iff
$$\alpha = \inf \{ t : v(\{x \in S : f(x) < t\}) > 0 \},$$
where $v$ denotes $n$-dimensional volume, i.e. Lebesgue measure.

The optimality region for (P), for a given big $M > 0$, is
$$R_{\epsilon,M} = \begin{cases} \{x \in S : f(x) < \alpha + \epsilon\}, & \alpha \text{ finite} \\ \{x \in S : f(x) < -M\}, & \alpha = -\infty. \end{cases}$$

7 What is Random Search?

Conceptual Algorithm:
1. Initialize: find $x_0 \in S$. Set $k := 0$.
2. Generate $\xi_k \in \mathbb{R}^n$ at random from distribution $\mu_k$. Set $x_{k+1} = D(x_k, \xi_k)$. Choose $\mu_{k+1}$. Set $k := k + 1$. Go to step 2.

Here $\mu_k(A) = P(\xi_k \in A \mid x_0, x_1, \ldots, x_k)$, so the sampling distribution may depend on the history.

This captures both
- Local search: $\mathrm{supp}(\mu_k)$ is bounded and $v(S \cap \mathrm{supp}(\mu_k)) < v(S)$
- Global search: $\mathrm{supp}(\mu_k)$ is such that $v(S \cap \mathrm{supp}(\mu_k)) = v(S)$
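To make the conceptual algorithm concrete, here is a minimal Python sketch. All names (`f`, `in_S`, `sample_xi`, `D`, `random_search`) are placeholders of my own rather than anything from the paper, and the loop simply runs for a fixed number of iterations, deferring the stopping-criterion question to slide 17.

```python
import numpy as np

def random_search(f, in_S, sample_xi, D, x0, n_iters=1000):
    """Conceptual random search: x_{k+1} = D(x_k, xi_k), with xi_k ~ mu_k.

    f         : objective, R^n -> R
    in_S      : feasibility test for S
    sample_xi : callable (k, x_k) -> xi_k, a draw from mu_k (may use history)
    D         : map combining the current iterate with the random sample
    """
    x = np.asarray(x0, dtype=float)
    assert in_S(x), "x0 must lie in S"
    for k in range(n_iters):
        xi = sample_xi(k, x)   # draw xi_k from mu_k
        x = D(x, xi)           # form x_{k+1}; (H1) asks that f never increase
    return x
```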

8 Sufficient Conditions for Convergence

(H1) The map $D$ is such that $\{f(x_k)\}_{k=0}^{\infty}$ is nonincreasing:
$$f(D(x,\xi)) \leq f(x), \qquad \xi \in S \implies f(D(x,\xi)) \leq \min\{f(x), f(\xi)\}.$$

(H2) Zero probability of repeatedly missing any positive-volume subset of $S$:
$$\forall A \subseteq S \text{ with } v(A) > 0, \qquad \prod_{k=0}^{\infty} \big(1 - \mu_k(A)\big) = 0,$$
i.e., the sampling strategy given by the $\mu_k$ cannot consistently ignore a part of $S$ with positive volume. (Global search methods satisfy (H2).)
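(H1) can always be arranged in practice by post-processing any candidate map: keep the old iterate if the new point is worse, and, when the raw sample is itself feasible, take the best of the three. A small sketch along those lines, with hypothetical names:

```python
def enforce_H1(f, in_S, D_raw):
    """Wrap a candidate map D_raw so that (H1) holds by construction:
    never move to a worse point, and when the sample xi is feasible,
    do at least as well as min(f(x), f(xi))."""
    def D(x, xi):
        candidates = [x]
        y = D_raw(x, xi)
        if in_S(y):
            candidates.append(y)    # only consider feasible outputs of D_raw
        if in_S(xi):
            candidates.append(xi)   # guarantees f(D(x, xi)) <= f(xi) when xi in S
        return min(candidates, key=f)
    return D
```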

9 Example Satisfying (H1) and (H2), I

Due to Gaviano [2].
$$D(x_k, \xi_k) = (1 - \lambda_k) x_k + \lambda_k \xi_k, \quad \text{where} \quad \lambda_k = \arg\min_{\lambda \in [0,1]} \big\{ f\big((1-\lambda)x_k + \lambda \xi_k\big) : (1-\lambda)x_k + \lambda \xi_k \in S \big\}.$$
$\mu_k$ is uniform on the $n$-dimensional sphere with center $x_k$ and radius $r \geq 2\,\mathrm{diam}(S)$.

Why?
- (H1) satisfied since $\{f(x_k)\}_{k=0}^{\infty}$ is nonincreasing by construction
- (H2) satisfied because the sphere contains $S$
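A sketch of Gaviano's map and sampler, under two simplifications of mine: the exact minimization over $\lambda \in [0,1]$ is replaced by a grid search, and the sampling radius r is passed in as a parameter rather than computed as 2 diam(S).

```python
import numpy as np

def D_gaviano(f, in_S, x, xi, n_grid=201):
    """Gaviano's map: move to the best feasible point on the segment between
    x and xi. The exact arg-min over lambda in [0, 1] is approximated here by
    a grid search (an implementation shortcut, not part of the construction)."""
    best_x, best_val = x, f(x)                 # lambda = 0 is always available
    for lam in np.linspace(0.0, 1.0, n_grid):
        y = (1.0 - lam) * x + lam * xi
        if in_S(y) and f(y) < best_val:
            best_x, best_val = y, f(y)
    return best_x

def sample_ball(x, r, rng):
    """Uniform draw from the n-dimensional ball of radius r centered at x
    (r should be at least 2 * diam(S) so that the support covers S)."""
    n = len(x)
    u = rng.normal(size=n)
    u /= np.linalg.norm(u)                     # uniform direction on the sphere
    return x + r * rng.uniform() ** (1.0 / n) * u
```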

10 Example Satisfying (H1) and (H2), II

Due to Baba et al. [1].
$$D(x_k, \xi_k) = \begin{cases} \xi_k, & \xi_k \in S \text{ and } f(\xi_k) < f(x_k) \\ x_k, & \text{otherwise} \end{cases}$$
$\mu_k = N(x_k, I)$.

Why?
- (H1) satisfied since $\{f(x_k)\}_{k=0}^{\infty}$ is nonincreasing by construction
- (H2) satisfied because $S$ is contained in the support of $N(x_k, I)$
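The Baba et al. ingredients are even shorter; again the names are illustrative, and either pair can be plugged into the `random_search` sketch given after slide 7.

```python
import numpy as np

def D_baba(f, in_S, x, xi):
    """Baba et al.: accept xi only if it is feasible and strictly improves f."""
    return xi if (in_S(xi) and f(xi) < f(x)) else x

def sample_gaussian(x, rng):
    """mu_k = N(x_k, I): an isotropic Gaussian step around the current iterate."""
    return x + rng.normal(size=len(x))
```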

11 Global Search Convergence Theorem

Theorem 1. Suppose $f$ is measurable, $S \subseteq \mathbb{R}^n$ is measurable, (H1) and (H2) hold, and $\{x_k\}_{k=0}^{\infty}$ is generated by the algorithm. Then
$$\lim_{k \to \infty} P(x_k \in R_{\epsilon,M}) = 1.$$

Proof. By (H1), $x_l \in R_{\epsilon,M} \implies x_k \in R_{\epsilon,M}$ for all $k > l$. Hence
$$P(x_k \in S \setminus R_{\epsilon,M}) \leq \prod_{l=0}^{k-1} \big(1 - \mu_l(R_{\epsilon,M})\big),$$
so
$$P(x_k \in R_{\epsilon,M}) = 1 - P(x_k \in S \setminus R_{\epsilon,M}) \geq 1 - \prod_{l=0}^{k-1} \big(1 - \mu_l(R_{\epsilon,M})\big).$$
Therefore
$$\lim_{k \to \infty} P(x_k \in R_{\epsilon,M}) \geq 1 - \lim_{k \to \infty} \prod_{l=0}^{k-1} \big(1 - \mu_l(R_{\epsilon,M})\big) = 1,$$
where the last equality follows from (H2).

12 Local Search Methods

- Easy to find examples for which the algorithm gets trapped at a local minimum
- Drastic sufficient conditions ensure convergence to the optimality region, but are very difficult to verify
- For instance,

(H3) For any $x_0 \in S$, $L_0 = \{x \in S : f(x) \leq f(x_0)\}$ is compact, and there exist $\gamma > 0$ and $\eta \in (0, 1]$ (possibly depending on $x_0$) such that, for all $k$ and all $x \in L_0$,
$$\mu_k\big(\{\xi : D(x,\xi) \in R_{\epsilon,M} \text{ or } \mathrm{dist}(D(x,\xi), R_{\epsilon,M}) < \mathrm{dist}(x, R_{\epsilon,M}) - \gamma\}\big) \geq \eta.$$

- If $f$ and $S$ are nice, local search methods demonstrate better convergence behavior.

13 Example Satisfying (H3), I

- $\mathrm{int}(S) \neq \emptyset$; for all $\alpha \in \mathbb{R}$, $S \cap \{x : f(x) \leq \alpha\}$ is convex and compact
  - Happens whenever $f$ is quasi-convex and either $S$ is compact or $f$ has bounded level sets
- $\xi_k$ chosen via a uniform distribution on the hypersphere with center $x_k$ and radius $\rho_k$
  - $\rho_k$ is a function of $x_0, x_1, \ldots, x_{k-1}$ and $\xi_1, \ldots, \xi_{k-1}$ such that $\rho = \inf_k \rho_k > 0$
- $D(x_k, \xi_k) = \xi_k$ if $\xi_k \in S$, and $x_k$ otherwise

Proof. $L_0$ is compact and convex since the level sets are. $R_{\epsilon,M}$ has nonempty interior since $S$ does, so we can draw a ball contained in the interior of $R_{\epsilon,M}$. Now take
$$\gamma = \frac{\rho}{2} \quad \text{and} \quad \eta = \frac{v(\text{region I})}{v(\text{hypersphere with radius } \rho)} > 0.$$

14 Example Satisfying (H3)

[Figure: the level set $L_0$ and the optimality region $R_{\epsilon,M}$; the sampling hypersphere of radius $\rho$ about $x$, with points $y$, $z$, the distance $\rho/2$, and the regions labeled I and II.]

$$\frac{v(\text{region II})}{v(\text{hypersphere with radius } \rho_k)} \;>\; \frac{v(\text{region I})}{v(\text{hypersphere with radius } \rho)} \;=\; \eta.$$

15 Local Search Convergence Theorem, I

Theorem 2. Suppose $f$ is a measurable function, $S \subseteq \mathbb{R}^n$ is measurable, and (H1) and (H3) are satisfied. Let $\{x_k\}_{k=0}^{\infty}$ be a sequence generated by the algorithm. Then
$$\lim_{k \to \infty} P(x_k \in R_{\epsilon,M}) = 1.$$

Proof. Let $x_0$ be the initial iterate used by the algorithm. By (H1), all future iterates lie in $L_0 \cup R_{\epsilon,M}$. $L_0$ is compact, so there exists $p \in \mathbb{Z}$ such that $\gamma p > \mathrm{diam}(L_0)$. Then
$$P(x_{l+p} \in R_{\epsilon,M} \mid x_l \notin R_{\epsilon,M}) = \frac{P(x_{l+p} \in R_{\epsilon,M},\ x_l \notin R_{\epsilon,M})}{P(x_l \notin R_{\epsilon,M})}$$
$$\geq P\big(x_k \in R_{\epsilon,M} \text{ or } \mathrm{dist}(x_k, R_{\epsilon,M}) \leq \gamma\,(p - (k - l)),\ k = l, \ldots, l + p \,\big|\, x_l \notin R_{\epsilon,M}\big) \geq \eta^p,$$
by repeated application of Bayes' rule and (H3).

16 Local Search Convergence Theorem, II

Claim: $P(x_{kp} \notin R_{\epsilon,M}) \leq (1 - \eta^p)^k$ for $k \in \{1, 2, \ldots\}$. By induction:

($k = 1$) Since $x_0 \in R_{\epsilon,M}$ would force $x_p \in R_{\epsilon,M}$ by (H1),
$$P(x_p \notin R_{\epsilon,M}) = P(x_p \notin R_{\epsilon,M} \mid x_0 \notin R_{\epsilon,M})\, P(x_0 \notin R_{\epsilon,M}) \leq 1 - \eta^p.$$

(General $k$)
$$P(x_{kp} \notin R_{\epsilon,M}) = P(x_{kp} \notin R_{\epsilon,M} \mid x_{(k-1)p} \notin R_{\epsilon,M})\, P(x_{(k-1)p} \notin R_{\epsilon,M}) \leq (1 - \eta^p)(1 - \eta^p)^{k-1} = (1 - \eta^p)^k.$$

Finally, by (H1),
$$P(x_{kp+l} \in R_{\epsilon,M}) \geq P(x_{kp} \in R_{\epsilon,M}) \geq 1 - (1 - \eta^p)^k, \qquad l = 0, 1, \ldots, p - 1,$$
and the right-hand side tends to 1 as $k \to \infty$.

17 Stopping Criteria

- So far, we gave a conceptual method for generating $\{x_k\}_{k=0}^{\infty}$ such that $f(x_k)$ approaches the essential infimum plus a buffer
- In practice, we need a stopping criterion
- Easy to give a stopping criterion if we have a lower bound on $\mu_k(R_{\epsilon,M})$ (unrealistic)
- How to do this without knowing a priori the essential infimum or $R_{\epsilon,M}$?
- It has been shown that even if $S$ is compact and convex and $f \in C^2$, each step of the algorithm leaves an unsampled square region of nonzero measure, over which $f$ can be redefined so that the global min lies in the unsampled region
  - The search for a good stopping criterion seems doomed to fail

18 Rates of Convergence

- Measured by distributional characteristics of the number of iterations or function evaluations required to reach the essential infimum (e.g. the mean)
- Solis and Wets tested 3 versions of the conceptual algorithm (1 local search, 2 global search) on various problems (constrained and unconstrained)
- They report results only for $\min_{x \in \mathbb{R}^n} x^T x$ with stopping criterion $\|x_k\| \leq 10^{-3}$ (an illustrative experiment in this spirit is sketched below)
- Found that the mean number of function evaluations required grows roughly in proportion to $n$.
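The slide does not specify the three tested variants, so the following is only a stand-in experiment of my own: a Gaviano-style step (with the segment minimizer computed in closed form, which is possible for this quadratic) applied to min over R^n of x'x, counting objective evaluations until ||x_k|| <= 1e-3.

```python
import numpy as np

def evals_to_tolerance(n, r=10.0, tol=1e-3, seed=0, max_evals=100_000):
    """Stand-in experiment: Gaviano-style search on f(x) = x'x over R^n.
    For this quadratic the minimizer over the segment x + lam*(xi - x),
    lam in [0, 1], has the closed form lam = clip(-x.d / ||d||^2, 0, 1)."""
    rng = np.random.default_rng(seed)
    x = np.ones(n)
    evals = 1                                   # f(x0)
    while np.linalg.norm(x) > tol and evals < max_evals:
        # xi drawn uniformly from the ball of radius r around x
        u = rng.normal(size=n)
        u /= np.linalg.norm(u)
        xi = x + r * rng.uniform() ** (1.0 / n) * u
        d = xi - x
        if d @ d == 0.0:
            continue                            # degenerate draw; try again
        lam = np.clip(-(x @ d) / (d @ d), 0.0, 1.0)
        y = x + lam * d
        evals += 1                              # one evaluation at the trial point
        if y @ y < x @ x:
            x = y
    return evals

for n in (1, 2, 5, 10):
    print(n, evals_to_tolerance(n))
```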

19 Conclusion and Summary of Part I

- Why use random search techniques?
- How to handle pathological cases? (essential infimum, optimality region)
- Conceptual algorithm unifies past examples in the literature
- Global and local search methods
- Sufficient conditions for convergence, and theorems
- Issue of stopping criteria
- Computational results

20 Part II: Traditional Sampling Methods

- Transformation method
  - Easier to generate $Y$ than $X$, but there is a well-behaved transformation between the two
- Acceptance-rejection method
  - Generate a RV and subject it to a test (based on a second RV) in order to determine acceptance (a sketch follows after this list)
- Markov-regression method
  - Generate the random vector component-wise, using marginal distributions w.r.t. the components generated already
- Impractical because complexity increases rapidly with dimension.
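A sketch of the acceptance-rejection idea in one dimension; the function names and the Beta(2,2)-with-uniform-proposal example are my own choices, not from the slides.

```python
import numpy as np

def acceptance_rejection(target_pdf, proposal_sample, proposal_pdf, c, rng, size=1000):
    """Acceptance-rejection: requires target_pdf(x) <= c * proposal_pdf(x) for all x.
    Draw a candidate from the proposal (first RV), then accept it with
    probability target_pdf(x) / (c * proposal_pdf(x)) (second RV)."""
    out = []
    while len(out) < size:
        x = proposal_sample(rng)                 # candidate
        u = rng.uniform()                        # acceptance test
        if u <= target_pdf(x) / (c * proposal_pdf(x)):
            out.append(x)
    return np.array(out)

# Example: sample Beta(2, 2) on [0, 1] using a uniform proposal; its density
# 6x(1-x) is bounded by c = 1.5 times the uniform density.
rng = np.random.default_rng(0)
beta_pdf = lambda x: 6.0 * x * (1.0 - x)
samples = acceptance_rejection(beta_pdf, lambda r: r.uniform(), lambda x: 1.0, c=1.5, rng=rng)
```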

21 Part II: Approximate Sampling Methods

- Perform better computationally (efficient)
- Generate a sequence of points whose limiting distribution is equal to the target distribution

Hit-and-Run: generate a random point in $S$, a bounded open subset of $\mathbb{R}^d$, according to some target distribution $\pi$.
1. Initialize: select a starting point $x^0 \in S$. Set $n := 0$.
2. Randomly generate a direction $\theta^n$ in $\mathbb{R}^d$ according to distribution $\nu$ (corresponds to randomly generating a point on the unit sphere).
3. Randomly select a step size $\lambda_n$ from $\{\lambda : x^n + \lambda\theta^n \in S\}$ according to distribution $L(x^n, \theta^n)$.
4. Set $x^{n+1} := x^n + \lambda_n \theta^n$. Set $n := n + 1$. Repeat from step 2.

E.g., to generate a point according to the uniform distribution on $S$: use all uniform distributions. (A minimal sketch appears below.)
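A minimal hit-and-run sketch for the uniform target, under an extra assumption of mine that S is convex, so the feasible step sizes along a direction form an interval that bisection can locate; all identifiers are illustrative.

```python
import numpy as np

def hit_and_run(in_S, x0, n_steps, R, rng):
    """Hit-and-run targeting the uniform distribution on a bounded, convex,
    open set S (membership oracle in_S, contained in a ball of radius R about x0).
    nu = uniform direction on the unit sphere; L(x, theta) = uniform on the
    feasible chord, whose endpoints are located by bisection."""
    def boundary_t(x, d):
        # largest t in [0, 2R] with x + t*d in S (bisection; valid since S is convex)
        lo, hi = 0.0, 2.0 * R
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if in_S(x + mid * d):
                lo = mid
            else:
                hi = mid
        return lo

    x = np.asarray(x0, dtype=float)
    chain = [x.copy()]
    for _ in range(n_steps):
        theta = rng.normal(size=len(x))
        theta /= np.linalg.norm(theta)           # uniform direction on the sphere
        t_plus = boundary_t(x, theta)            # how far we can go forward
        t_minus = boundary_t(x, -theta)          # ... and backward
        lam = rng.uniform(-t_minus, t_plus)      # uniform over the feasible chord
        x = x + lam * theta
        chain.append(x.copy())
    return np.array(chain)

# Example: approximately uniform points from the open unit ball in R^3.
rng = np.random.default_rng(0)
pts = hit_and_run(lambda y: np.linalg.norm(y) < 1.0, np.zeros(3), 5000, R=1.0, rng=rng)
```

With the all-uniform choices described on the slide, the chain's limiting distribution is the uniform distribution on S, matching the "limiting distribution equals target distribution" property claimed above.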

22 Further Reading

References
[1] Baba, N., T. Shoman, and Y. Sawaragi. "A Modified Convergence Theorem for a Random Optimization Algorithm." Information Sciences, 13 (1977).
[2] Gaviano, M. "Some General Results on the Convergence of Random Search Algorithms in Minimization Problems." In Towards Global Optimization, eds. L. Dixon and G. Szegö. Amsterdam.
[3] Solis, F. J., and R. J.-B. Wets. "Minimization by Random Search Techniques." Mathematics of Operations Research, 6 (1981).
[4] Romeijn, H. E. Global Optimization by Random Walk Sampling Methods. Thesis Publishers, Amsterdam.
