New Nonsmooth Trust Region Method for Unconstrained Locally Lipschitz Optimization Problems

Z. Akbari 1, R. Yousefpour 2, M. R. Peyghami 3

1 Department of Mathematics, K.N. Toosi University of Technology, Tehran, Iran; z akbari@dena.kntu.ac.ir
2 Department of Mathematical Sciences, University of Mazandaran, Babolsar, Iran; yousefpour@umz.ac.ir
3 Department of Mathematics, K.N. Toosi University of Technology, Tehran, Iran; peyghami@kntu.ac.ir

Abstract

In this paper, a local model is presented for locally Lipschitz functions. The model is constructed from an approximation of the steepest descent direction, which is the element of the ε-subdifferential with minimal norm. In effect, the gradient in the quadratic model is replaced by an approximation of the steepest descent direction, and the classical trust region method is applied to this model. We prove that the resulting algorithm is convergent, provided the positive definite matrices remain bounded. The positive definite matrix is updated in each iteration by the BFGS method. Finally, the presented algorithm is implemented in MATLAB.

Keywords: Trust region, Lipschitz functions, Local model, Steihaug method

Introduction

Nonsmooth unconstrained minimization problems are important in many real-world applications. For example, even for smooth problems, penalty and Lagrangian formulations lead to nonsmooth optimization problems, and such problems also arise in optimal control. Solving them has therefore attracted considerable attention. The trust region (TR) method is an iterative method in which the objective function is approximated by a local model. In each iteration, the model is minimized, instead of the objective function, over a suitable region. If
f : R^n → R is continuously differentiable, then the local model is defined as follows:

m(x_k, B_k)(p) = f(x_k) + ∇f(x_k)^T p + (1/2) p^T B_k p,   (1)

where B_k is suitably chosen. If f is twice continuously differentiable, then B_k can be the Hessian matrix; in some methods, B_k is updated by quasi-Newton formulas. To date, no local model has been presented that can be practically implemented for general locally Lipschitz functions.

In this paper, we use the steepest descent direction to construct the local model. For locally Lipschitz functions, the steepest descent direction is the element of the Goldstein subdifferential with minimal norm. Several bundle algorithms have been developed based on methods that approximate this direction [1-6]. The efficiency of these algorithms depends on the accuracy of the approximation: to improve it, a larger number of subgradients must be computed to approximate the Goldstein subdifferential well, and this is time consuming. For example, in [6], the steepest descent direction is approximated by sampling gradients. This approximation is appropriate, but computing it for large scale problems is very expensive. In [4], the steepest descent direction is approximated iteratively; this method computes a good approximation of the steepest descent direction with a smaller number of subgradients. The numerical results showed that this algorithm is more efficient than other bundle methods.

Using an approximation of the steepest descent direction, we propose a quadratic model for locally Lipschitz functions. We combine the Cauchy point and CG-Steihaug methods [7] to approximate the solution of the quadratic model. The numerical results show that the TR algorithm behaves better with this combination. We implement the algorithm in MATLAB and compare its efficiency with that of other methods.
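The minimal-norm element over the convex hull of a finite set of subgradients, which is central in what follows, amounts to a small quadratic program over the unit simplex: min ||W λ|| over weights λ ≥ 0 with Σλ_i = 1, where the columns of W are the collected subgradients. As an illustrative sketch only (a plain Frank-Wolfe scheme with exact line search in NumPy, not the algorithm of [4]):

```python
import numpy as np

def min_norm_in_hull(W, iters=500):
    """Approximate the minimal-norm element of conv{w_1, ..., w_m},
    where the w_i are the columns of W, by Frank-Wolfe with exact
    line search on min_{lambda in simplex} 0.5 * ||W lambda||^2.
    Returns v = W lambda (illustrative sketch, not the method of [4])."""
    n, m = W.shape
    lam = np.full(m, 1.0 / m)        # start at the barycenter
    for _ in range(iters):
        v = W @ lam                  # current point in the hull
        grad = W.T @ v               # gradient w.r.t. lambda
        i = int(np.argmin(grad))     # best simplex vertex (linear subproblem)
        d = -lam.copy()
        d[i] += 1.0                  # Frank-Wolfe direction e_i - lambda
        Wd = W @ d
        denom = Wd @ Wd
        if denom < 1e-16:
            break                    # no movement possible
        gamma = np.clip(-(v @ Wd) / denom, 0.0, 1.0)  # exact line search
        if gamma == 0.0:
            break                    # Frank-Wolfe gap is zero: optimal
        lam += gamma * d
    return W @ lam
```

For two subgradients (1, 0) and (0, 1), the minimal-norm point of their hull is (0.5, 0.5); normalizing its negation gives the approximate steepest descent direction used in the text.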
The nonsmooth trust region algorithm and its convergence

In [8], the local model for locally Lipschitz functions is given as follows:

m(x, p) = f(x) + φ(x, p) + (1/2) p^T B p.   (2)
Under some assumptions on φ(·,·), the global convergence of TR was proved. The authors proposed the function

φ(x, p) = max_{v ∈ ∂f(x)} ⟨v, p⟩,

but with this definition, minimizing the local model is impractical. In this paper, we give another local model for locally Lipschitz functions.

To construct the local model for locally Lipschitz functions, we substitute for the gradient in (1) a suitable element of ∂_ε f(x). Let ε > 0; the steepest descent direction is computed using ∂_ε f(x). Consider the problem

v_0 = argmin_{v ∈ ∂_ε f(x)} ||v||,   (3)

and let d_0 = −v_0/||v_0||. By Lebourg's mean value theorem, there exists ξ ∈ ∂_ε f(x) such that

f(x + ε d_0) − f(x) = ε ξ^T d_0 ≤ −ε v_0^T v_0/||v_0|| = −ε ||v_0||.

In fact, d_0 is the steepest descent direction. Since solving (3) is impractical, ∂_ε f(x) is approximated by a finite subset: if W ⊂ ∂_ε f(x), then conv W is considered an approximation of ∂_ε f(x). Consider the problem

v_w = argmin_{v ∈ conv W} ||v||,

and let d = −v_w/||v_w||. If f(x + ε d) − f(x) ≤ −c ε ||v_w|| for some c ∈ (0, 1), then d can serve as an approximation of the steepest descent direction; otherwise, the approximation of ∂_ε f(x) is improved by adding a new element of ∂_ε f(x) to W. The method for constructing such a subset is described in [4].

Suppose that W_k ⊂ ∂_ε f(x_k) and conv W_k is an approximation of ∂_ε f(x_k). We consider the problem

v_k = argmin_{v ∈ conv W_k} ||v||,

and suppose that f(x_k − ε v_k/||v_k||) − f(x_k) ≤ −c ε ||v_k||, where c ∈ (0, 1). In [4], an algorithm is presented for finding W_k and v_k. Based on this subgradient, v_k ∈ ∂_ε f(x_k), we define the following quadratic model:

m(x_k, p) = f(x_k) + v_k^T p + (1/2) p^T B_k p,
where B_k is a positive definite matrix. Based on this quadratic model, the trust region method is stated as follows.

Algorithm 1. (The nonsmooth trust region algorithm)

Step 0: Let Δ_0, Δ_1 > 0, θ, δ_1, θ_δ ∈ (0, 1), x_1 ∈ R^n, ξ_1 ∈ ∂f(x_1), c_1, c_2, c_3 ∈ (0, 1), c_4 > 1, B_1 = I, and k = 1.

Step 1: Apply Algorithm 2 in [4] at the point x_k with parameters ε = Δ_k, δ = δ_k and c = c_1. Suppose Algorithm 2 in [4] finds a proper approximation conv W_k of ∂_ε f(x_k) and an adequate subgradient v_k such that v_k = argmin_{v ∈ conv W_k} ||v||.

Step 2: If ||v_k|| = 0, then stop. Else, if ||v_k|| ≤ δ_k, then set Δ_{k+1} = θ Δ_k, δ_{k+1} = θ_δ δ_k, x_{k+1} = x_k, k = k + 1 and go to Step 1. Else set δ_{k+1} = δ_k and go to Step 3.

Step 3: Solve the following quadratic subproblem and let p_k be its solution:

min_{p ∈ R^n} m(x_k, p) = f(x_k) + v_k^T p + (1/2) p^T B_k p
s.t. ||p|| ≤ Δ_k.

Step 4: If f(x_k + p_k) − f(x_k) ≤ c_1 v_k^T p_k, then set x_{k+1} = x_k + p_k and go to Step 5; else set Δ_{k+1} = θ Δ_k, x_{k+1} = x_k, k = k + 1 and go to Step 1.

Step 5: Define the ratio

ρ_k = (f(x_k + p_k) − f(x_k)) / (m(x_k, p_k) − m(x_k, 0)).

If ρ_k ≥ c_3 and ||p_k|| = Δ_k, then set Δ_{k+1} = min{Δ_0, c_4 Δ_k}; if ρ_k ≤ c_2, then set Δ_{k+1} = θ Δ_k. Else set Δ_{k+1} = Δ_k.

Step 6: Select a subgradient ξ_{k+1} ∈ ∂f(x_{k+1}), then update B_k by the BFGS method. Set k = k + 1 and go to Step 1.

The following theorem proves the convergence of the algorithm.
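Steps 5 and 6 of Algorithm 1 combine a ratio-based radius update with a BFGS update of B_k. A minimal Python sketch of these two updates (the parameter values, the name delta_max standing in for Δ_0, and the curvature safeguard in the BFGS update are illustrative assumptions, not prescriptions from the paper):

```python
import numpy as np

def tr_radius_update(delta, rho, step_norm, delta_max,
                     c2=0.25, c3=0.75, c4=2.0, theta=0.5):
    """Radius update of Step 5; c2, c3, c4, theta as in Algorithm 1,
    with delta_max playing the role of Delta_0 (values illustrative)."""
    if rho >= c3 and np.isclose(step_norm, delta):
        return min(delta_max, c4 * delta)   # full, very successful step: expand
    if rho <= c2:
        return theta * delta                # poor model agreement: shrink
    return delta                            # otherwise keep the radius

def bfgs_update(B, s, y, tol=1e-10):
    """BFGS update of B_k in Step 6, with s = x_{k+1} - x_k and
    y = xi_{k+1} - xi_k; the curvature skip is a standard safeguard
    (not spelled out in the paper) that keeps B positive definite."""
    sy = s @ y
    if sy <= tol * np.linalg.norm(s) * np.linalg.norm(y):
        return B                            # skip the update
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy
```

The updated matrix satisfies the secant condition B_{k+1} s = y whenever the update is applied, which is what makes the quadratic model track curvature information along accepted steps.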
Theorem 1. Let f : R^n → R be a locally Lipschitz function. If the level set L := {x : f(x) ≤ f(x_1)} is bounded, then either Algorithm 1 terminates finitely at some k_0 with v_{k_0} = 0, or the sequence {x_k} generated by Algorithm 1 is convergent. If x* = lim_{k→∞} x_k, then 0 ∈ ∂f(x*).

REFERENCES

1. A. A. Goldstein. Optimization of Lipschitz continuous functions, Mathematical Programming, 13:14-22, (1977).
2. D. P. Bertsekas and S. K. Mitter. A descent numerical method for optimization problems with nondifferentiable cost functionals, SIAM Journal on Control, 11:637-652, (1973).
3. M. Gaudioso and M. F. Monaco. A bundle type approach to the unconstrained minimization of convex nonsmooth functions, Mathematical Programming, 23(2):216-226, (1982).
4. N. Mahdavi-Amiri and R. Yousefpour. An effective nonsmooth optimization algorithm for locally Lipschitz functions, to appear in Journal of Optimization Theory and Applications.
5. P. Wolfe. A method of conjugate subgradients for minimizing nondifferentiable functions, in Nondifferentiable Optimization, M. Balinski and P. Wolfe, eds., Mathematical Programming Study 3, North-Holland, Amsterdam, 145-173, (1975).
6. J. V. Burke, A. S. Lewis, and M. L. Overton. A robust gradient sampling algorithm for nonsmooth, nonconvex optimization, SIAM Journal on Optimization, 15:751-779, (2005).
7. J. Nocedal and S. J. Wright. Numerical Optimization, Springer, (1999).
8. L. Qi and J. Sun. A trust region algorithm for minimization of locally Lipschitzian functions, Mathematical Programming, 66:25-43, (1994).