ON THE EFFICIENCY OF A RANDOM SEARCH METHOD

Size: px

Start display at page:

Download "ON THE EFFICIENCY OF A RANDOM SEARCH METHOD"

Arthur Bennett
6 years ago
Views:

1 ON THE EFFICIENCY OF A RANDOM SEARCH METHOD G. Pérez-Lechuga 1, J.C.S. Tuoh-Mora 1, E. Morales-Sánchez 1, M. M. Álvarez Suárez1, 1 Centro de Investigación Avanzada en Ingeniería Industrial Universidad Autónoma del Estado de Hidalgo ICBI, CIAII, Carretera Pachuca-Tulancingo Km.4.5, Pachuca Hgo, México. Abstract:- This paper studies the efficiency of the random search reported by Rubinstein [1] and widely studied by Pérez-Lechuga [2]. We proof that the efficiency of the selected random search algorithm is a linear function both of the step size and the direction of the descent movement. We report the theoretical results. Key Words:- Stochastic optimization, random search, stochastic approximation, gradient approximation, quasigradient method. 1. Introduction Stochastic approximation algorithms can be used in system optimization problems for which only noisy measurements of the system are available without knowing the gradient of the objective function. This type of problem can be found in adaptative control, neural network training,experimental design, stochastic optimization and many other areas. The main idea of the stochastic quasigradient methods is to solve a wide class of optimization problems with a complex nature of objective functions and constraints. These methods are stochastic algorithmic procedures for solving general constrained problems with nondifferentiable, nonconvex functions. For stochastic programming problems, these techniques generalize the well known stochastic approximation method for unconstrained optimization of the expectation of a random function to problems involving general contraints. Consider the general stochastic programming problem Minimize F (x) = E [f (x, ω)], (1) Subject to x S R n, where S = {x f i (x, ω), i = 1,..., m}, (2) E is the operation of mathematical expectation with respect of some probability 1

2 space (Ω, A, P), and ω Ω. The more trouble on solving the problem (1) to (2) is that, it is only feasible to calculate the exact values of the functions F i (x) = E [f i (x, ω)] = f i (x, ω)p(dω), i =,..., m in exceptional cases for special types of probability measures P(ω). If the functions F i (x) have uniformly bounded second derivatives at x {x s } s= then for the random vectors ξ i (s) defined as (see [3]) n j=1 f i (x s + s e j, ω sj ) f i (x s, ω s ) s e j, (3) we would have E[ξ i (s) x s ] = F i (x s ) + b i (s), b i (s) s. Where e j is the unit vector on the jth axis and s is a positive constant. The random sequence x s+1 = x s ρ s ξ s, s =, 1,... converges with probability 1 to the solution of (1) if the following conditions are satisfied with probability 1. For the step size: ρ s, s ρ s =, s E [ρ s s +ρ 2 s] <. For the quasigradient ξ s, E [ξ s x s ] = F (x s ) + o( s ). 2. The random search algorithms In this section we introduce the random search algorithms for optimization problems, in which the computing cost of the random search at a point increases as the point tends to satisfy appropriate optimality conditions. The algorithms progress faster beginning from initial conditions far away from an optimal or suboptimal point, and they gain precision with expense of efficiency as such a point is approached. The algorithms start at some x S, and they generate a sequence x, x 1,..., S. These are descent algorithms, in the sense that the sequences F (x ), F (x 1 )..., are monotone decreasing. They generate x i+1 from x i by random search techniques, using the sufficient descent principle. This principle enables that the algorithms do not adopt the first point y R n found by random search satisfying F (y) < F (x i ) as x i, but they rather wait to find a point y for which F (x i ) F (y) is large by some criteria [4]. The amount of descent F (x i+1 ) F (x i ), like the amount of time the algorithm spent for the random search at x i, depends on the desired extent for x i ; the less desirable x i, the larger the descent will tend to be. Random search techniques have been an object of research for quite some time. The concept has been initially introduced by Anderson [5] and then developed by Rastrigin [6]. The idea is to determine a descent direction at random, by using a distribution on the unit sphere around x i, and then, to find a suitable step size. The step size is used by minimizing F along the descent direction. This is determined adaptively, based on the ratio of successful to failed attempts (by random searches) to reduce F. The basic property of this algorithms is that x i reaches a solution of (1) with probability 1 using a prescribed tolerance as i. Thus in descent algorithms, one random search is conducted at x i to generate x i+1. For the convergence states, the probability of x i satisfying f(x i ) inf {f(x) x S} + ɛ (for a given ɛ > ) approaches 1 as i. 2

3 2.1. Stochastic Quasigradient Methods Stochastic quasigradient methods are a set of techniques useful to solve optimization problems with objective functions and constrains of such a complex nature which make impossible to calculate the precise values of these functions (let alone of their derivatives). The basic idea is to use statistical estimates for the values of the functions rather than precise values. For stochastic programming problems these methods generalize the well-known stochastic approximation method for unconstrained optimization of the expectation of a random function. This stochastic problem can be defined as follows. min{e ω f(x, ω) : x S}, (4) where x represents the variable to be chosen optimally, S is a set of constraints, and ω is a random variable belonging to some probabilistic space (Ω, B, P). Here B is a Borel field and P is a probabilistic measure. In this problem we assume that S is defined in such a way that the projection operation x Π S (x) is comparatively inexpensive from a computational point of view, where Π S (x) = argmin Z S x Z In this case it is possible to implement a stochastic quasigradient algorithm of the following type x i+1 = Π S (x i ρ i ϕ i ), (5) Here x i is the current approximation of the optimal solution, ρ i is the steep size, and ϕ i is a random step direction. This step direction may, for instance, be a statistical estimate of the gradient (or subgradient in the nondifferentiable case) of f(x), then ϕ i ξ i, such that E(ξ i x 1,... x i ) = F i (x i ) + a i, (6) i =,..., m where a i decreases for an increasing number of iterations, the vector ξ i is called a stochastic quasigradient of functions F i (x), and F (x) is the subgradient of F (x) in each point x i. Usually ρ i as i and therefore x i+1 x i. Algorithm (5) can also resolve with problems with more general constraints formulated in terms of mathematical expectations E ω [f i (x, ω) ], i = 1,..., m, by making use of penalty functions or Lagrangians. 3. Problem definition The idea of efficiency was introduced by Rubinstein et al. [1] as follows. Let x i+1 be the point reached after one single iteration, and F i = F i+1 F i the increment of the value of F. The efficiency of the random search algorithms is defined as C = E( F i) E(N i ), (7) D = C [Var f i ] 1/2, (8) where N i is the number of observations (measurements) of the convex function F (x) required for the algorithm at the ith step. In this paper we are interested in evaluate the efficiency of the following algorithm [2]: 3

4 x i+1 = x i α i γ i ξ i, (9) where α i is the step size, γ i is a normalization factor proposed by [1], ξ i = Υ imb il, and Υ il = mín{υ i1,..., Υ il } denotes the difference Υ il = f(x i + B il, W il ) f(x i, W i ), (1) l = 1,..., H here, W il and W i are the realizations observed from the noise at points x i and x i +B il respectively. Bil denotes the vector in which the minimum increment is produced, and H is the number of points generated on the surface of the n-dimensional unit hipersphere. Note that B il are independent and uniformly distributed vectors on the surface of such sphere. We assume that f(x, W ) = f(x) + W, where W N(, σ 2 ). From the convexity of f we have f(x i + x i ) f(x i ) x i, f(x i ) or equivalently f(x i+1 ) = f(x i + x i ) = f(x i ) + x i, f(x i ) + δ( x i ) = f(x i ) + x i f(x i ) cos θ + δ( x i ), where cos θ is the angle between the unit vectors x i and f(x i ), and δ( x i ) as x i. We analyze two cases. In the first, we consider the noise W =, and in the second, W N(, σ 2 ). First case: W =. From (9), note that x i = x i+1 x i = αγ i Υ ilb il, (11) Substituting (1) in (9) we obtain f(x i+1 ) = f(x i ) + α i γ i Υ il Bil f(x i ) cos θ + δ( x i ) Taking into account that B ig = 1, and for x i sufficiently small, then therefore f(x i+1 ) = f(x i ) + α i γ i Υ il cos θ, Thus, by (7) f i = α i γ i Υ il cos θ. C i = E [f(x i) f(x i+1 )] = E [ f i] H H = 1 α H iγ i Υ il E [cos θ] (12) where the probabilty density function of the random angle is given by (see [7]) ζ n (θ) = sin n 2 (θ) π sinn 2 (θ)dθ = n sin n 2 (θ), π/2 θ π/2, (13) where = Γ(n/2) π Γ[(n 1)/2], and Γ denotes the gamma function. If h i = α i γ i Υ il, then substituting (13) in (12) we have that E [ f i ] = h i 2h i π/2 cos(θ) ζ i (θ)dθ = cos θ sin n 2 (θ)dθ = 2h i n 1, therefore (7) takes the form E( f i) E(N i ) = 2h i H(n 1), (14) 4

5 Second case: W N(, σ 2 ). Consider the event { xi h x i+1 = i Bil, if Υ = min{υ il} x i, in other case (15) note that for any i iteration, Υ il is such that Υ il = ψ(x i + B il, W il ) ψ(x i, W i ) = f(x i + B il ) + W il f(x i ) W i = f(x i + B il ) f(x i ) (16) where W il and W i are the realizations of the noise observed at points x i + B il and x i. The probability of this event is (see [2]) P = 1 2 ( 1 + φ ( f )). where φ(y) = y 2π 1/2 e t2 dt. Thus P = 1 ( ( )) hi cos θ 1 + φ. 2 As in [1], let Q be the random variable defined by Q = { cos θ, with probability (P) cos θ with probability (1 P) for π/2 θ π/2. Then, taking the mathematical expectation in Q we obtain E [Q] = π/2 n P [cos θ sin n 2 (θ)]dθ π/2 n (1 P) [cos θ sin n 2 (θ)]dθ = 2 P [cos θ sin n 2 (θ)]dθ 2 (1 P) [cos θ sin n 2 (θ)]dθ. After some algebraic manipulations, wehave E [Q] π/2 = cos θ sin n 2 (θ) φ 2 n ( fi ) dθ = ( ) cos θ sin n 2 hi cos θ (θ) φ dθ. (17) Then, C i takes the form 2 π/2 cos θ sin n 2 (θ)φ H ( hi cos θ ) dθ. (18) In the final analysis we let us estimate the variance of f i from E[Q 2 ] [E[Q]] 2. Note that 2 E[Q 2 ] = 2 2 P cos 2 θ sen n 2 (θ)dθ (1 P) cos 2 θ sen n 2 (θ)dθ = ( hi cos θ φ Since, for x small, ) cos 2 θ sen n 2 (θ)dθ φ(x) (x x3 +...), (19) 2π 2 3 then E[Q] can be written as [ 1 n 1 + h i σ 2π cos 2 θ sen n 2 (θ)dθ and E[Q 2 ] is defined by [ sen n+1 (θ)dθ + n h i σ 2π cos 3 θ sen n 1 (θ)dθ ], ] 5

6 thus (7) takes the form h i n Hσ 2π C n = n H(n 1) + cos 2 θ sen n 2 (θ)dθ. (2) Finally, and using (19), (7) can be defined as D n = H(n 1)Var[Q] + h i cos 2 θ sen n 2 (θ)dθ. (21) H(n 1)Var[Q] Where (2) and (21) can be written in the linear form C n = a n + b n h i, D n = a n + b nh i, (22) where a n = b n = b n = h i n Hσ 2π H(n 1), and a n = H(n 1)Var[Q], cos 2 θ sen n 2 (θ)dθ, and R π/2 cos 2 θ sen n 2 (θ)dθ H(n 1)Var[Q]. Table 1 shows some values of C n. Here, ϑ n = cos 2 θ sin n 2 (θ)dθ. n n ϑ n C n h i h i h i E-3 + (1E-3) h i E-2 + (3.6E-4)h i Table 1: Efficiency of C n of the search can be viewed as a linear function of the h i parameter, and of the evaluated points on the surface of the unit sphere. Where h i = α i γ i Υ im. 5. References 1. Rubinstein, Y. and Samorodnitsky, G., Efficieny of the random search method, Mathematics and computers in simulation, XXIV, pp , Pérez, L. G., An algorithm for the stochastic optimization of some dynamic models, Engineering Doctor Dissertation, DEPFI, UNAM, Yu. M. Ermoliev., Stochastic quasigradient methods and their application to system optimization, Stochastics, Vol. 9, pp. 1-36, Wardi, Y., Random search algorithms with sufficient descent for minimization of functions, Mathematics of Operations Research, Vol 14, No. 2, pp , Anderson, R. L. Recent advances in finding best operating conditions, J. Amer. Statist, Assoc., 48, pp , Rastrigin, L. A. Extremal control by methods of random scanning. Automat. Remote Control, 21, pp , Reuben, Y.R., Generating random vectors uniformly distributed inside and on the surface of different regions, Euro., J.O.R., Vol. 1, pp.25-29, Conclusions For the algorithm presented, the efficiency 6

Stochastic Subgradient Method

Stochastic Subgradient Method Lingjie Weng, Yutian Chen Bren School of Information and Computer Science UC Irvine Subgradient Recall basic inequality for convex differentiable f : f y f x + f x T (y x)