ON THE EFFICIENCY OF A RANDOM SEARCH METHOD

Size: px
Start display at page:

Download "ON THE EFFICIENCY OF A RANDOM SEARCH METHOD"

Transcription

1 ON THE EFFICIENCY OF A RANDOM SEARCH METHOD G. Pérez-Lechuga 1, J.C.S. Tuoh-Mora 1, E. Morales-Sánchez 1, M. M. Álvarez Suárez1, 1 Centro de Investigación Avanzada en Ingeniería Industrial Universidad Autónoma del Estado de Hidalgo ICBI, CIAII, Carretera Pachuca-Tulancingo Km.4.5, Pachuca Hgo, México. Abstract:- This paper studies the efficiency of the random search reported by Rubinstein [1] and widely studied by Pérez-Lechuga [2]. We proof that the efficiency of the selected random search algorithm is a linear function both of the step size and the direction of the descent movement. We report the theoretical results. Key Words:- Stochastic optimization, random search, stochastic approximation, gradient approximation, quasigradient method. 1. Introduction Stochastic approximation algorithms can be used in system optimization problems for which only noisy measurements of the system are available without knowing the gradient of the objective function. This type of problem can be found in adaptative control, neural network training,experimental design, stochastic optimization and many other areas. The main idea of the stochastic quasigradient methods is to solve a wide class of optimization problems with a complex nature of objective functions and constraints. These methods are stochastic algorithmic procedures for solving general constrained problems with nondifferentiable, nonconvex functions. For stochastic programming problems, these techniques generalize the well known stochastic approximation method for unconstrained optimization of the expectation of a random function to problems involving general contraints. Consider the general stochastic programming problem Minimize F (x) = E [f (x, ω)], (1) Subject to x S R n, where S = {x f i (x, ω), i = 1,..., m}, (2) E is the operation of mathematical expectation with respect of some probability 1

2 space (Ω, A, P), and ω Ω. The more trouble on solving the problem (1) to (2) is that, it is only feasible to calculate the exact values of the functions F i (x) = E [f i (x, ω)] = f i (x, ω)p(dω), i =,..., m in exceptional cases for special types of probability measures P(ω). If the functions F i (x) have uniformly bounded second derivatives at x {x s } s= then for the random vectors ξ i (s) defined as (see [3]) n j=1 f i (x s + s e j, ω sj ) f i (x s, ω s ) s e j, (3) we would have E[ξ i (s) x s ] = F i (x s ) + b i (s), b i (s) s. Where e j is the unit vector on the jth axis and s is a positive constant. The random sequence x s+1 = x s ρ s ξ s, s =, 1,... converges with probability 1 to the solution of (1) if the following conditions are satisfied with probability 1. For the step size: ρ s, s ρ s =, s E [ρ s s +ρ 2 s] <. For the quasigradient ξ s, E [ξ s x s ] = F (x s ) + o( s ). 2. The random search algorithms In this section we introduce the random search algorithms for optimization problems, in which the computing cost of the random search at a point increases as the point tends to satisfy appropriate optimality conditions. The algorithms progress faster beginning from initial conditions far away from an optimal or suboptimal point, and they gain precision with expense of efficiency as such a point is approached. The algorithms start at some x S, and they generate a sequence x, x 1,..., S. These are descent algorithms, in the sense that the sequences F (x ), F (x 1 )..., are monotone decreasing. They generate x i+1 from x i by random search techniques, using the sufficient descent principle. This principle enables that the algorithms do not adopt the first point y R n found by random search satisfying F (y) < F (x i ) as x i, but they rather wait to find a point y for which F (x i ) F (y) is large by some criteria [4]. The amount of descent F (x i+1 ) F (x i ), like the amount of time the algorithm spent for the random search at x i, depends on the desired extent for x i ; the less desirable x i, the larger the descent will tend to be. Random search techniques have been an object of research for quite some time. The concept has been initially introduced by Anderson [5] and then developed by Rastrigin [6]. The idea is to determine a descent direction at random, by using a distribution on the unit sphere around x i, and then, to find a suitable step size. The step size is used by minimizing F along the descent direction. This is determined adaptively, based on the ratio of successful to failed attempts (by random searches) to reduce F. The basic property of this algorithms is that x i reaches a solution of (1) with probability 1 using a prescribed tolerance as i. Thus in descent algorithms, one random search is conducted at x i to generate x i+1. For the convergence states, the probability of x i satisfying f(x i ) inf {f(x) x S} + ɛ (for a given ɛ > ) approaches 1 as i. 2

3 2.1. Stochastic Quasigradient Methods Stochastic quasigradient methods are a set of techniques useful to solve optimization problems with objective functions and constrains of such a complex nature which make impossible to calculate the precise values of these functions (let alone of their derivatives). The basic idea is to use statistical estimates for the values of the functions rather than precise values. For stochastic programming problems these methods generalize the well-known stochastic approximation method for unconstrained optimization of the expectation of a random function. This stochastic problem can be defined as follows. min{e ω f(x, ω) : x S}, (4) where x represents the variable to be chosen optimally, S is a set of constraints, and ω is a random variable belonging to some probabilistic space (Ω, B, P). Here B is a Borel field and P is a probabilistic measure. In this problem we assume that S is defined in such a way that the projection operation x Π S (x) is comparatively inexpensive from a computational point of view, where Π S (x) = argmin Z S x Z In this case it is possible to implement a stochastic quasigradient algorithm of the following type x i+1 = Π S (x i ρ i ϕ i ), (5) Here x i is the current approximation of the optimal solution, ρ i is the steep size, and ϕ i is a random step direction. This step direction may, for instance, be a statistical estimate of the gradient (or subgradient in the nondifferentiable case) of f(x), then ϕ i ξ i, such that E(ξ i x 1,... x i ) = F i (x i ) + a i, (6) i =,..., m where a i decreases for an increasing number of iterations, the vector ξ i is called a stochastic quasigradient of functions F i (x), and F (x) is the subgradient of F (x) in each point x i. Usually ρ i as i and therefore x i+1 x i. Algorithm (5) can also resolve with problems with more general constraints formulated in terms of mathematical expectations E ω [f i (x, ω) ], i = 1,..., m, by making use of penalty functions or Lagrangians. 3. Problem definition The idea of efficiency was introduced by Rubinstein et al. [1] as follows. Let x i+1 be the point reached after one single iteration, and F i = F i+1 F i the increment of the value of F. The efficiency of the random search algorithms is defined as C = E( F i) E(N i ), (7) D = C [Var f i ] 1/2, (8) where N i is the number of observations (measurements) of the convex function F (x) required for the algorithm at the ith step. In this paper we are interested in evaluate the efficiency of the following algorithm [2]: 3

4 x i+1 = x i α i γ i ξ i, (9) where α i is the step size, γ i is a normalization factor proposed by [1], ξ i = Υ imb il, and Υ il = mín{υ i1,..., Υ il } denotes the difference Υ il = f(x i + B il, W il ) f(x i, W i ), (1) l = 1,..., H here, W il and W i are the realizations observed from the noise at points x i and x i +B il respectively. Bil denotes the vector in which the minimum increment is produced, and H is the number of points generated on the surface of the n-dimensional unit hipersphere. Note that B il are independent and uniformly distributed vectors on the surface of such sphere. We assume that f(x, W ) = f(x) + W, where W N(, σ 2 ). From the convexity of f we have f(x i + x i ) f(x i ) x i, f(x i ) or equivalently f(x i+1 ) = f(x i + x i ) = f(x i ) + x i, f(x i ) + δ( x i ) = f(x i ) + x i f(x i ) cos θ + δ( x i ), where cos θ is the angle between the unit vectors x i and f(x i ), and δ( x i ) as x i. We analyze two cases. In the first, we consider the noise W =, and in the second, W N(, σ 2 ). First case: W =. From (9), note that x i = x i+1 x i = αγ i Υ ilb il, (11) Substituting (1) in (9) we obtain f(x i+1 ) = f(x i ) + α i γ i Υ il Bil f(x i ) cos θ + δ( x i ) Taking into account that B ig = 1, and for x i sufficiently small, then therefore f(x i+1 ) = f(x i ) + α i γ i Υ il cos θ, Thus, by (7) f i = α i γ i Υ il cos θ. C i = E [f(x i) f(x i+1 )] = E [ f i] H H = 1 α H iγ i Υ il E [cos θ] (12) where the probabilty density function of the random angle is given by (see [7]) ζ n (θ) = sin n 2 (θ) π sinn 2 (θ)dθ = n sin n 2 (θ), π/2 θ π/2, (13) where = Γ(n/2) π Γ[(n 1)/2], and Γ denotes the gamma function. If h i = α i γ i Υ il, then substituting (13) in (12) we have that E [ f i ] = h i 2h i π/2 cos(θ) ζ i (θ)dθ = cos θ sin n 2 (θ)dθ = 2h i n 1, therefore (7) takes the form E( f i) E(N i ) = 2h i H(n 1), (14) 4

5 Second case: W N(, σ 2 ). Consider the event { xi h x i+1 = i Bil, if Υ = min{υ il} x i, in other case (15) note that for any i iteration, Υ il is such that Υ il = ψ(x i + B il, W il ) ψ(x i, W i ) = f(x i + B il ) + W il f(x i ) W i = f(x i + B il ) f(x i ) (16) where W il and W i are the realizations of the noise observed at points x i + B il and x i. The probability of this event is (see [2]) P = 1 2 ( 1 + φ ( f )). where φ(y) = y 2π 1/2 e t2 dt. Thus P = 1 ( ( )) hi cos θ 1 + φ. 2 As in [1], let Q be the random variable defined by Q = { cos θ, with probability (P) cos θ with probability (1 P) for π/2 θ π/2. Then, taking the mathematical expectation in Q we obtain E [Q] = π/2 n P [cos θ sin n 2 (θ)]dθ π/2 n (1 P) [cos θ sin n 2 (θ)]dθ = 2 P [cos θ sin n 2 (θ)]dθ 2 (1 P) [cos θ sin n 2 (θ)]dθ. After some algebraic manipulations, wehave E [Q] π/2 = cos θ sin n 2 (θ) φ 2 n ( fi ) dθ = ( ) cos θ sin n 2 hi cos θ (θ) φ dθ. (17) Then, C i takes the form 2 π/2 cos θ sin n 2 (θ)φ H ( hi cos θ ) dθ. (18) In the final analysis we let us estimate the variance of f i from E[Q 2 ] [E[Q]] 2. Note that 2 E[Q 2 ] = 2 2 P cos 2 θ sen n 2 (θ)dθ (1 P) cos 2 θ sen n 2 (θ)dθ = ( hi cos θ φ Since, for x small, ) cos 2 θ sen n 2 (θ)dθ φ(x) (x x3 +...), (19) 2π 2 3 then E[Q] can be written as [ 1 n 1 + h i σ 2π cos 2 θ sen n 2 (θ)dθ and E[Q 2 ] is defined by [ sen n+1 (θ)dθ + n h i σ 2π cos 3 θ sen n 1 (θ)dθ ], ] 5

6 thus (7) takes the form h i n Hσ 2π C n = n H(n 1) + cos 2 θ sen n 2 (θ)dθ. (2) Finally, and using (19), (7) can be defined as D n = H(n 1)Var[Q] + h i cos 2 θ sen n 2 (θ)dθ. (21) H(n 1)Var[Q] Where (2) and (21) can be written in the linear form C n = a n + b n h i, D n = a n + b nh i, (22) where a n = b n = b n = h i n Hσ 2π H(n 1), and a n = H(n 1)Var[Q], cos 2 θ sen n 2 (θ)dθ, and R π/2 cos 2 θ sen n 2 (θ)dθ H(n 1)Var[Q]. Table 1 shows some values of C n. Here, ϑ n = cos 2 θ sin n 2 (θ)dθ. n n ϑ n C n h i h i h i E-3 + (1E-3) h i E-2 + (3.6E-4)h i Table 1: Efficiency of C n of the search can be viewed as a linear function of the h i parameter, and of the evaluated points on the surface of the unit sphere. Where h i = α i γ i Υ im. 5. References 1. Rubinstein, Y. and Samorodnitsky, G., Efficieny of the random search method, Mathematics and computers in simulation, XXIV, pp , Pérez, L. G., An algorithm for the stochastic optimization of some dynamic models, Engineering Doctor Dissertation, DEPFI, UNAM, Yu. M. Ermoliev., Stochastic quasigradient methods and their application to system optimization, Stochastics, Vol. 9, pp. 1-36, Wardi, Y., Random search algorithms with sufficient descent for minimization of functions, Mathematics of Operations Research, Vol 14, No. 2, pp , Anderson, R. L. Recent advances in finding best operating conditions, J. Amer. Statist, Assoc., 48, pp , Rastrigin, L. A. Extremal control by methods of random scanning. Automat. Remote Control, 21, pp , Reuben, Y.R., Generating random vectors uniformly distributed inside and on the surface of different regions, Euro., J.O.R., Vol. 1, pp.25-29, Conclusions For the algorithm presented, the efficiency 6

Stochastic Subgradient Method

Stochastic Subgradient Method Stochastic Subgradient Method Lingjie Weng, Yutian Chen Bren School of Information and Computer Science UC Irvine Subgradient Recall basic inequality for convex differentiable f : f y f x + f x T (y x)

More information

Machine Learning. Support Vector Machines. Fabio Vandin November 20, 2017

Machine Learning. Support Vector Machines. Fabio Vandin November 20, 2017 Machine Learning Support Vector Machines Fabio Vandin November 20, 2017 1 Classification and Margin Consider a classification problem with two classes: instance set X = R d label set Y = { 1, 1}. Training

More information

Rule 110 explained as a block substitution system

Rule 110 explained as a block substitution system Rule 110 explained as a block substitution system Juan C. Seck-Tuoh-Mora 1, Norberto Hernández-Romero 1 Genaro J. Martínez 2 April 2009 1 Centro de Investigación Avanzada en Ingeniería Industrial, Universidad

More information

Selected Topics in Optimization. Some slides borrowed from

Selected Topics in Optimization. Some slides borrowed from Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model

More information

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence: A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition

More information

Subgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives

Subgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives Subgradients subgradients and quasigradients subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE392o, Stanford University Basic inequality recall basic

More information

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained

More information

Bayesian Machine Learning

Bayesian Machine Learning Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning

More information

5. Subgradient method

5. Subgradient method L. Vandenberghe EE236C (Spring 2016) 5. Subgradient method subgradient method convergence analysis optimal step size when f is known alternating projections optimality 5-1 Subgradient method to minimize

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

Recovery of Sparse Signals from Noisy Measurements Using an l p -Regularized Least-Squares Algorithm

Recovery of Sparse Signals from Noisy Measurements Using an l p -Regularized Least-Squares Algorithm Recovery of Sparse Signals from Noisy Measurements Using an l p -Regularized Least-Squares Algorithm J. K. Pant, W.-S. Lu, and A. Antoniou University of Victoria August 25, 2011 Compressive Sensing 1 University

More information

Penalty and Barrier Methods. So we again build on our unconstrained algorithms, but in a different way.

Penalty and Barrier Methods. So we again build on our unconstrained algorithms, but in a different way. AMSC 607 / CMSC 878o Advanced Numerical Optimization Fall 2008 UNIT 3: Constrained Optimization PART 3: Penalty and Barrier Methods Dianne P. O Leary c 2008 Reference: N&S Chapter 16 Penalty and Barrier

More information

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n

More information

Presentation in Convex Optimization

Presentation in Convex Optimization Dec 22, 2014 Introduction Sample size selection in optimization methods for machine learning Introduction Sample size selection in optimization methods for machine learning Main results: presents a methodology

More information

Lagrange duality. The Lagrangian. We consider an optimization program of the form

Lagrange duality. The Lagrangian. We consider an optimization program of the form Lagrange duality Another way to arrive at the KKT conditions, and one which gives us some insight on solving constrained optimization problems, is through the Lagrange dual. The dual is a maximization

More information

MATH 4211/6211 Optimization Basics of Optimization Problems

MATH 4211/6211 Optimization Basics of Optimization Problems MATH 4211/6211 Optimization Basics of Optimization Problems Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 A standard minimization

More information

THE EFFECT OF DETERMINISTIC NOISE 1 IN SUBGRADIENT METHODS

THE EFFECT OF DETERMINISTIC NOISE 1 IN SUBGRADIENT METHODS Submitted: 24 September 2007 Revised: 5 June 2008 THE EFFECT OF DETERMINISTIC NOISE 1 IN SUBGRADIENT METHODS by Angelia Nedić 2 and Dimitri P. Bertseas 3 Abstract In this paper, we study the influence

More information

Stochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values

Stochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values Stochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values Mengdi Wang Ethan X. Fang Han Liu Abstract Classical stochastic gradient methods are well suited

More information

Introduction. New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems

Introduction. New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems Z. Akbari 1, R. Yousefpour 2, M. R. Peyghami 3 1 Department of Mathematics, K.N. Toosi University of Technology,

More information

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order

More information

Lecture: Smoothing.

Lecture: Smoothing. Lecture: Smoothing http://bicmr.pku.edu.cn/~wenzw/opt-2018-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Smoothing 2/26 introduction smoothing via conjugate

More information

Optimal Transport: A Crash Course

Optimal Transport: A Crash Course Optimal Transport: A Crash Course Soheil Kolouri and Gustavo K. Rohde HRL Laboratories, University of Virginia Introduction What is Optimal Transport? The optimal transport problem seeks the most efficient

More information

Kernel Learning via Random Fourier Representations

Kernel Learning via Random Fourier Representations Kernel Learning via Random Fourier Representations L. Law, M. Mider, X. Miscouridou, S. Ip, A. Wang Module 5: Machine Learning L. Law, M. Mider, X. Miscouridou, S. Ip, A. Wang Kernel Learning via Random

More information

Subgradients. subgradients. strong and weak subgradient calculus. optimality conditions via subgradients. directional derivatives

Subgradients. subgradients. strong and weak subgradient calculus. optimality conditions via subgradients. directional derivatives Subgradients subgradients strong and weak subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE364b, Stanford University Basic inequality recall basic inequality

More information

Hybrid Steepest Descent Method for Variational Inequality Problem over Fixed Point Sets of Certain Quasi-Nonexpansive Mappings

Hybrid Steepest Descent Method for Variational Inequality Problem over Fixed Point Sets of Certain Quasi-Nonexpansive Mappings Hybrid Steepest Descent Method for Variational Inequality Problem over Fixed Point Sets of Certain Quasi-Nonexpansive Mappings Isao Yamada Tokyo Institute of Technology VIC2004 @ Wellington, Feb. 13, 2004

More information

Relaxed linearized algorithms for faster X-ray CT image reconstruction

Relaxed linearized algorithms for faster X-ray CT image reconstruction Relaxed linearized algorithms for faster X-ray CT image reconstruction Hung Nien and Jeffrey A. Fessler University of Michigan, Ann Arbor The 13th Fully 3D Meeting June 2, 2015 1/20 Statistical image reconstruction

More information

Non-convex optimization. Issam Laradji

Non-convex optimization. Issam Laradji Non-convex optimization Issam Laradji Strongly Convex Objective function f(x) x Strongly Convex Objective function Assumptions Gradient Lipschitz continuous f(x) Strongly convex x Strongly Convex Objective

More information

Introduction to gradient descent

Introduction to gradient descent 6-1: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction to gradient descent Derivation and intuitions Hessian 6-2: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction Our

More information

Massachusetts Institute of Technology Department of Physics. Final Examination December 17, 2004

Massachusetts Institute of Technology Department of Physics. Final Examination December 17, 2004 Massachusetts Institute of Technology Department of Physics Course: 8.09 Classical Mechanics Term: Fall 004 Final Examination December 17, 004 Instructions Do not start until you are told to do so. Solve

More information

Evaluation complexity for nonlinear constrained optimization using unscaled KKT conditions and high-order models by E. G. Birgin, J. L. Gardenghi, J. M. Martínez, S. A. Santos and Ph. L. Toint Report NAXYS-08-2015

More information

Lecture 1. Stochastic Optimization: Introduction. January 8, 2018

Lecture 1. Stochastic Optimization: Introduction. January 8, 2018 Lecture 1 Stochastic Optimization: Introduction January 8, 2018 Optimization Concerned with mininmization/maximization of mathematical functions Often subject to constraints Euler (1707-1783): Nothing

More information

SOCP Relaxation of Sensor Network Localization

SOCP Relaxation of Sensor Network Localization SOCP Relaxation of Sensor Network Localization Paul Tseng Mathematics, University of Washington Seattle IWCSN, Simon Fraser University August 19, 2006 Abstract This is a talk given at Int. Workshop on

More information

LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS

LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS Napsu Karmitsa 1 Marko M. Mäkelä 2 Department of Mathematics, University of Turku, FI-20014 Turku,

More information

Continuous Optimisation, Chpt 6: Solution methods for Constrained Optimisation

Continuous Optimisation, Chpt 6: Solution methods for Constrained Optimisation Continuous Optimisation, Chpt 6: Solution methods for Constrained Optimisation Peter J.C. Dickinson DMMP, University of Twente p.j.c.dickinson@utwente.nl http://dickinson.website/teaching/2017co.html version:

More information

Primal-dual Subgradient Method for Convex Problems with Functional Constraints

Primal-dual Subgradient Method for Convex Problems with Functional Constraints Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual

More information

Advanced computational methods X Selected Topics: SGD

Advanced computational methods X Selected Topics: SGD Advanced computational methods X071521-Selected Topics: SGD. In this lecture, we look at the stochastic gradient descent (SGD) method 1 An illustrating example The MNIST is a simple dataset of variety

More information

Limiting the Velocity in the Particle Swarm Optimization Algorithm

Limiting the Velocity in the Particle Swarm Optimization Algorithm Limiting the Velocity in the Particle Swarm Optimization Algorithm Julio Barrera 1, Osiris Álvarez-Bajo 2, Juan J. Flores 3, Carlos A. Coello Coello 4 1 Universidad Michoacana de San Nicolás de Hidalgo,

More information

Recent Developments in Compressed Sensing

Recent Developments in Compressed Sensing Recent Developments in Compressed Sensing M. Vidyasagar Distinguished Professor, IIT Hyderabad m.vidyasagar@iith.ac.in, www.iith.ac.in/ m vidyasagar/ ISL Seminar, Stanford University, 19 April 2018 Outline

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information

Introduction. A Modified Steepest Descent Method Based on BFGS Method for Locally Lipschitz Functions. R. Yousefpour 1

Introduction. A Modified Steepest Descent Method Based on BFGS Method for Locally Lipschitz Functions. R. Yousefpour 1 A Modified Steepest Descent Method Based on BFGS Method for Locally Lipschitz Functions R. Yousefpour 1 1 Department Mathematical Sciences, University of Mazandaran, Babolsar, Iran; yousefpour@umz.ac.ir

More information

Robust estimation, efficiency, and Lasso debiasing

Robust estimation, efficiency, and Lasso debiasing Robust estimation, efficiency, and Lasso debiasing Po-Ling Loh University of Wisconsin - Madison Departments of ECE & Statistics WHOA-PSI workshop Washington University in St. Louis Aug 12, 2017 Po-Ling

More information

Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization

Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization Noname manuscript No. (will be inserted by the editor) Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization Saeed Ghadimi Guanghui Lan Hongchao Zhang the date of

More information

Stochastic Analogues to Deterministic Optimizers

Stochastic Analogues to Deterministic Optimizers Stochastic Analogues to Deterministic Optimizers ISMP 2018 Bordeaux, France Vivak Patel Presented by: Mihai Anitescu July 6, 2018 1 Apology I apologize for not being here to give this talk myself. I injured

More information

Gradient-based Adaptive Stochastic Search

Gradient-based Adaptive Stochastic Search 1 / 41 Gradient-based Adaptive Stochastic Search Enlu Zhou H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology November 5, 2014 Outline 2 / 41 1 Introduction

More information

EE364b Homework 2. λ i f i ( x) = 0, i=1

EE364b Homework 2. λ i f i ( x) = 0, i=1 EE364b Prof. S. Boyd EE364b Homework 2 1. Subgradient optimality conditions for nondifferentiable inequality constrained optimization. Consider the problem minimize f 0 (x) subject to f i (x) 0, i = 1,...,m,

More information

Stochastic Gradient Descent with Variance Reduction

Stochastic Gradient Descent with Variance Reduction Stochastic Gradient Descent with Variance Reduction Rie Johnson, Tong Zhang Presenter: Jiawen Yao March 17, 2015 Rie Johnson, Tong Zhang Presenter: JiawenStochastic Yao Gradient Descent with Variance Reduction

More information

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013 Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for

More information

Stochastic Optimization Algorithms Beyond SG

Stochastic Optimization Algorithms Beyond SG Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods

More information

Efficient Methods for Stochastic Composite Optimization

Efficient Methods for Stochastic Composite Optimization Efficient Methods for Stochastic Composite Optimization Guanghui Lan School of Industrial and Systems Engineering Georgia Institute of Technology, Atlanta, GA 3033-005 Email: glan@isye.gatech.edu June

More information

A quasisecant method for minimizing nonsmooth functions

A quasisecant method for minimizing nonsmooth functions A quasisecant method for minimizing nonsmooth functions Adil M. Bagirov and Asef Nazari Ganjehlou Centre for Informatics and Applied Optimization, School of Information Technology and Mathematical Sciences,

More information

SOCP Relaxation of Sensor Network Localization

SOCP Relaxation of Sensor Network Localization SOCP Relaxation of Sensor Network Localization Paul Tseng Mathematics, University of Washington Seattle University of Vienna/Wien June 19, 2006 Abstract This is a talk given at Univ. Vienna, 2006. SOCP

More information

6. Proximal gradient method

6. Proximal gradient method L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping

More information

Hochdimensionale Integration

Hochdimensionale Integration Oliver Ernst Institut für Numerische Mathematik und Optimierung Hochdimensionale Integration 14-tägige Vorlesung im Wintersemester 2010/11 im Rahmen des Moduls Ausgewählte Kapitel der Numerik Contents

More information

GENERAL NONCONVEX SPLIT VARIATIONAL INEQUALITY PROBLEMS. Jong Kyu Kim, Salahuddin, and Won Hee Lim

GENERAL NONCONVEX SPLIT VARIATIONAL INEQUALITY PROBLEMS. Jong Kyu Kim, Salahuddin, and Won Hee Lim Korean J. Math. 25 (2017), No. 4, pp. 469 481 https://doi.org/10.11568/kjm.2017.25.4.469 GENERAL NONCONVEX SPLIT VARIATIONAL INEQUALITY PROBLEMS Jong Kyu Kim, Salahuddin, and Won Hee Lim Abstract. In this

More information

Optimization methods

Optimization methods Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to

More information

An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization

An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with Travis Johnson, Northwestern University Daniel P. Robinson, Johns

More information

Double Smoothing technique for Convex Optimization Problems with Linear Constraints

Double Smoothing technique for Convex Optimization Problems with Linear Constraints 1 Double Smoothing technique for Convex Optimization Problems with Linear Constraints O. Devolder (F.R.S.-FNRS Research Fellow), F. Glineur and Y. Nesterov Center for Operations Research and Econometrics

More information

Stochastic Gradient Descent. Ryan Tibshirani Convex Optimization

Stochastic Gradient Descent. Ryan Tibshirani Convex Optimization Stochastic Gradient Descent Ryan Tibshirani Convex Optimization 10-725 Last time: proximal gradient descent Consider the problem min x g(x) + h(x) with g, h convex, g differentiable, and h simple in so

More information

Instructions: No books. No notes. Non-graphing calculators only. You are encouraged, although not required, to show your work.

Instructions: No books. No notes. Non-graphing calculators only. You are encouraged, although not required, to show your work. Exam 3 Math 850-007 Fall 04 Odenthal Name: Instructions: No books. No notes. Non-graphing calculators only. You are encouraged, although not required, to show your work.. Evaluate the iterated integral

More information

Lecture 4: Linear and quadratic problems

Lecture 4: Linear and quadratic problems Lecture 4: Linear and quadratic problems linear programming examples and applications linear fractional programming quadratic optimization problems (quadratically constrained) quadratic programming second-order

More information

Mathematical Methods for Data Analysis

Mathematical Methods for Data Analysis Mathematical Methods for Data Analysis Massimiliano Pontil Istituto Italiano di Tecnologia and Department of Computer Science University College London Massimiliano Pontil Mathematical Methods for Data

More information

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training

More information

Generalized Pattern Search Algorithms : unconstrained and constrained cases

Generalized Pattern Search Algorithms : unconstrained and constrained cases IMA workshop Optimization in simulation based models Generalized Pattern Search Algorithms : unconstrained and constrained cases Mark A. Abramson Air Force Institute of Technology Charles Audet École Polytechnique

More information

Euler Equations: local existence

Euler Equations: local existence Euler Equations: local existence Mat 529, Lesson 2. 1 Active scalars formulation We start with a lemma. Lemma 1. Assume that w is a magnetization variable, i.e. t w + u w + ( u) w = 0. If u = Pw then u

More information

Non-Linear Optimization

Non-Linear Optimization Non-Linear Optimization Distinguishing Features Common Examples EOQ Balancing Risks Minimizing Risk 15.057 Spring 03 Vande Vate 1 Hierarchy of Models Network Flows Linear Programs Mixed Integer Linear

More information

ADMM and Fast Gradient Methods for Distributed Optimization

ADMM and Fast Gradient Methods for Distributed Optimization ADMM and Fast Gradient Methods for Distributed Optimization João Xavier Instituto Sistemas e Robótica (ISR), Instituto Superior Técnico (IST) European Control Conference, ECC 13 July 16, 013 Joint work

More information

Sparse Recovery via Partial Regularization: Models, Theory and Algorithms

Sparse Recovery via Partial Regularization: Models, Theory and Algorithms Sparse Recovery via Partial Regularization: Models, Theory and Algorithms Zhaosong Lu and Xiaorui Li Department of Mathematics, Simon Fraser University, Canada {zhaosong,xla97}@sfu.ca November 23, 205

More information

arxiv: v2 [math.oc] 5 May 2018

arxiv: v2 [math.oc] 5 May 2018 The Impact of Local Geometry and Batch Size on Stochastic Gradient Descent for Nonconvex Problems Viva Patel a a Department of Statistics, University of Chicago, Illinois, USA arxiv:1709.04718v2 [math.oc]

More information

On the convergence properties of the modified Polak Ribiére Polyak method with the standard Armijo line search

On the convergence properties of the modified Polak Ribiére Polyak method with the standard Armijo line search ANZIAM J. 55 (E) pp.e79 E89, 2014 E79 On the convergence properties of the modified Polak Ribiére Polyak method with the standard Armijo line search Lijun Li 1 Weijun Zhou 2 (Received 21 May 2013; revised

More information

Lecture 2. We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales.

Lecture 2. We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales. Lecture 2 1 Martingales We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales. 1.1 Doob s inequality We have the following maximal

More information

How hard is this function to optimize?

How hard is this function to optimize? How hard is this function to optimize? John Duchi Based on joint work with Sabyasachi Chatterjee, John Lafferty, Yuancheng Zhu Stanford University West Coast Optimization Rumble October 2016 Problem minimize

More information

Convex relaxations of chance constrained optimization problems

Convex relaxations of chance constrained optimization problems Convex relaxations of chance constrained optimization problems Shabbir Ahmed School of Industrial & Systems Engineering, Georgia Institute of Technology, 765 Ferst Drive, Atlanta, GA 30332. May 12, 2011

More information

5 Linear Algebra and Inverse Problem

5 Linear Algebra and Inverse Problem 5 Linear Algebra and Inverse Problem 5.1 Introduction Direct problem ( Forward problem) is to find field quantities satisfying Governing equations, Boundary conditions, Initial conditions. The direct problem

More information

Chapter 2. Optimization. Gradients, convexity, and ALS

Chapter 2. Optimization. Gradients, convexity, and ALS Chapter 2 Optimization Gradients, convexity, and ALS Contents Background Gradient descent Stochastic gradient descent Newton s method Alternating least squares KKT conditions 2 Motivation We can solve

More information

On an uniqueness theorem for characteristic functions

On an uniqueness theorem for characteristic functions ISSN 392-53 Nonlinear Analysis: Modelling and Control, 207, Vol. 22, No. 3, 42 420 https://doi.org/0.5388/na.207.3.9 On an uniqueness theorem for characteristic functions Saulius Norvidas Institute of

More information

Reformulation of chance constrained problems using penalty functions

Reformulation of chance constrained problems using penalty functions Reformulation of chance constrained problems using penalty functions Martin Branda Charles University in Prague Faculty of Mathematics and Physics EURO XXIV July 11-14, 2010, Lisbon Martin Branda (MFF

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Gradient methods for minimizing composite functions

Gradient methods for minimizing composite functions Gradient methods for minimizing composite functions Yu. Nesterov May 00 Abstract In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum

More information

Value Iteration and Action ɛ-approximation of Optimal Policies in Discounted Markov Decision Processes

Value Iteration and Action ɛ-approximation of Optimal Policies in Discounted Markov Decision Processes Value Iteration and Action ɛ-approximation of Optimal Policies in Discounted Markov Decision Processes RAÚL MONTES-DE-OCA Departamento de Matemáticas Universidad Autónoma Metropolitana-Iztapalapa San Rafael

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

Machine Learning Support Vector Machines. Prof. Matteo Matteucci

Machine Learning Support Vector Machines. Prof. Matteo Matteucci Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way

More information

Comparison of Modern Stochastic Optimization Algorithms

Comparison of Modern Stochastic Optimization Algorithms Comparison of Modern Stochastic Optimization Algorithms George Papamakarios December 214 Abstract Gradient-based optimization methods are popular in machine learning applications. In large-scale problems,

More information

Lecture 9 Sequential unconstrained minimization

Lecture 9 Sequential unconstrained minimization S. Boyd EE364 Lecture 9 Sequential unconstrained minimization brief history of SUMT & IP methods logarithmic barrier function central path UMT & SUMT complexity analysis feasibility phase generalized inequalities

More information

Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio ( )

Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio ( ) Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio (2014-2015) Etienne Tanré - Olivier Faugeras INRIA - Team Tosca October 22nd, 2014 E. Tanré (INRIA - Team Tosca) Mathematical

More information

A Second-Order Method for Strongly Convex l 1 -Regularization Problems

A Second-Order Method for Strongly Convex l 1 -Regularization Problems Noname manuscript No. (will be inserted by the editor) A Second-Order Method for Strongly Convex l 1 -Regularization Problems Kimon Fountoulakis and Jacek Gondzio Technical Report ERGO-13-11 June, 13 Abstract

More information

On SDP and ESDP Relaxation of Sensor Network Localization

On SDP and ESDP Relaxation of Sensor Network Localization On SDP and ESDP Relaxation of Sensor Network Localization Paul Tseng Mathematics, University of Washington Seattle IASI, Roma September 16, 2008 (joint work with Ting Kei Pong) ON SDP/ESDP RELAXATION OF

More information

Lecture 6 : Projected Gradient Descent

Lecture 6 : Projected Gradient Descent Lecture 6 : Projected Gradient Descent EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Consider the following update. x l+1 = Π C (x l α f(x l )) Theorem Say f : R d R is (m, M)-strongly

More information

On Nesterov s Random Coordinate Descent Algorithms

On Nesterov s Random Coordinate Descent Algorithms On Nesterov s Random Coordinate Descent Algorithms Zheng Xu University of Texas At Arlington February 19, 2015 1 Introduction Full-Gradient Descent Coordinate Descent 2 Random Coordinate Descent Algorithm

More information

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan

More information

Stochastic and online algorithms

Stochastic and online algorithms Stochastic and online algorithms stochastic gradient method online optimization and dual averaging method minimizing finite average Stochastic and online optimization 6 1 Stochastic optimization problem

More information

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus 1/41 Subgradient Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes definition subgradient calculus duality and optimality conditions directional derivative Basic inequality

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization

More information

Superiorized Inversion of the Radon Transform

Superiorized Inversion of the Radon Transform Superiorized Inversion of the Radon Transform Gabor T. Herman Graduate Center, City University of New York March 28, 2017 The Radon Transform in 2D For a function f of two real variables, a real number

More information

Sparse and Regularized Optimization

Sparse and Regularized Optimization Sparse and Regularized Optimization In many applications, we seek not an exact minimizer of the underlying objective, but rather an approximate minimizer that satisfies certain desirable properties: sparsity

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/32 Margin Classifiers margin b = 0 Sridhar Mahadevan: CMPSCI 689 p.

More information

LINEAR AND NONLINEAR PROGRAMMING

LINEAR AND NONLINEAR PROGRAMMING LINEAR AND NONLINEAR PROGRAMMING Stephen G. Nash and Ariela Sofer George Mason University The McGraw-Hill Companies, Inc. New York St. Louis San Francisco Auckland Bogota Caracas Lisbon London Madrid Mexico

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

Primal/Dual Decomposition Methods

Primal/Dual Decomposition Methods Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients

More information

Infeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization

Infeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization Infeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with James V. Burke, University of Washington Daniel

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information