Lecture notes. Mathematical Methods for Applications in Science and Engineering part II: Numerical Optimization


Eran Treister, Computer Science Department, Ben-Gurion University of the Negev.


Contents

1 Preliminary background: unconstrained optimization
  1.1 Optimality conditions for unconstrained optimization
  1.2 Convexity
2 Iterative methods for unconstrained optimization
  2.1 Steepest descent
  2.2 Newton, quasi-Newton
  2.3 Line-search methods
  2.4 Coordinate descent methods
  2.5 Non-linear Conjugate Gradient methods
3 Constrained Optimization
  3.1 Equality-constrained optimization: Lagrange multipliers
  3.2 The KKT optimality conditions for general constrained optimization
  3.3 Penalty and Barrier methods
  3.4 The projected Steepest Descent method
4 Robust statistics in least squares problems
5 Stochastic optimization
  5.1 Stochastic Gradient Descent
  5.2 Example: SGD for minimizing a convex function
  5.3 Upgrades to SGD
6 Minimization of Neural Networks for Classification
  6.1 Linear classifiers: Logistic regression
    6.1.1 Computing the gradient of logistic regression
    6.1.2 Adding the bias
  6.2 Linear classifiers: Multinomial logistic (softmax) regression
    6.2.1 Computing the gradient of softmax regression
  6.3 The general structure of a neural network
  6.4 Computing the gradient of NNs: back-propagation
  6.5 Gradient and Jacobian verification

1 Preliminary background: unconstrained optimization

An optimization problem is a problem where a cost function f(x) : Rⁿ → R needs to be minimized or maximized. We will mostly refer to minimization, as maximization can always be achieved by minimizing −f. The function f() is called the objective. In most cases, we wish to find a point x* that minimizes f. We will denote this by

    x* = arg min_{x∈Rⁿ} f(x).   (1)

This is not the most general problem we will deal with. It is called an unconstrained optimization problem, and we will deal with it first; we will deal with constrained optimization in the next section. Optimization problems arise in a huge variety of applications in fields like the natural sciences, engineering, economics, statistics, and more. Optimization arises from our nature: we usually wish to make things better in some sense. In recent years, large amounts of data have been collected, and consequently optimization methods arise from the need to characterize this data by computational models. In this course we will deal with continuous optimization problems, as opposed to discrete optimization, which is a completely different field with different challenges.

Until now we have met some optimization problems, such as least squares and A-norm minimization problems. Those problems had quadratic objectives, and setting their gradient to zero resulted in a linear system. Understanding quadratic optimization problems is the key to understanding general optimization problems, and often the iterative methods used for general optimization problems are just a generalization of methods for solving linear systems. We have seen a bit of this concept in Steepest Descent and Gauss-Seidel. Another strong connection between quadratic and general optimization problems is that every objective function f is locally quadratic. We will now see that via the Taylor expansion.

Multivariate Taylor expansion. Assume a continuous one-variable function f(x). The one-dimensional Taylor expansion is given by

    f(x + ε) = f(x) + f′(x)ε + ½f″(x)ε² + (1/3!)f‴(c)ε³,   c ∈ [x, x + ε].

Assuming that f‴ is continuous, we will usually refer to the last term just as O(ε³), and all of our derivations will focus on the first two or three terms. If f has two variables, f = f(x₁, x₂), then the two-dimensional Taylor expansion is given by

    f(x₁ + ε₁, x₂ + ε₂) = f(x₁, x₂) + (∂f/∂x₁)ε₁ + (∂f/∂x₂)ε₂ + ½(∂²f/∂x₁²)ε₁² + (∂²f/∂x₁∂x₂)ε₁ε₂ + ½(∂²f/∂x₂²)ε₂² + ...

Let us recall the definition of the gradient, which in two variables is given by

    ∇f = [∂f/∂x₁ ; ∂f/∂x₂].

This is a 2×1 vector, and if we denote ε = [ε₁, ε₂]ᵀ, then

    ⟨∇f, ε⟩ = (∂f/∂x₁)ε₁ + (∂f/∂x₂)ε₂.

Now we will define the two-dimensional Hessian matrix, a two-dimensional second derivative:

    ∇²f = H = [∂²f/∂x₁²  ∂²f/∂x₁∂x₂ ; ∂²f/∂x₁∂x₂  ∂²f/∂x₂²].

It can be seen that ½⟨ε, Hε⟩ = ½(∂²f/∂x₁²)ε₁² + (∂²f/∂x₁∂x₂)ε₁ε₂ + ½(∂²f/∂x₂²)ε₂², and therefore the Taylor expansion can be written as

    f(x + ε) = f(x) + ⟨∇f, ε⟩ + ½⟨ε, Hε⟩ + O(‖ε‖³).

This expansion is also suitable for n-dimensional functions, where the Hessian matrix H ∈ Rⁿˣⁿ is defined by H_ij = ∂²f/∂x_i∂x_j.
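This expansion is easy to verify numerically. The following minimal Julia sketch (ours, in the style of the code examples later in these notes; the test function is an arbitrary smooth choice) compares f(x + ε) with the second-order model. Halving ε should shrink the remainder by roughly a factor of 8, matching the O(‖ε‖³) term. Depending on the Julia version, dot and norm may require `using LinearAlgebra`.

f = (x)->(x[1]^2 + exp(x[1]*x[2]));  # an arbitrary smooth test function
g = (x)->[2*x[1] + x[2]*exp(x[1]*x[2]); x[1]*exp(x[1]*x[2])];   # its gradient
H = (x)->[2+x[2]^2*exp(x[1]*x[2])  (1+x[1]*x[2])*exp(x[1]*x[2]) ;
          (1+x[1]*x[2])*exp(x[1]*x[2])  x[1]^2*exp(x[1]*x[2])]; # its Hessian
x0 = [0.7; -0.3];
for j = 0:3
    ep = [0.1; 0.2]/2^j;                                  # shrinking perturbation
    model = f(x0) + dot(g(x0),ep) + 0.5*dot(ep,H(x0)*ep); # second-order model
    println("|eps| = ", norm(ep), ", remainder = ", abs(f(x0+ep) - model));
end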

Note that if we assume that f is smooth and twice continuously differentiable, then H is a symmetric matrix (but not necessarily positive definite).

A Taylor expansion of a vector of functions. In some cases, derivatives may be complicated, and we need to know how to calculate a derivative of a vector of functions, as opposed to the scalar functions that we dealt with so far. Consider f(x) : Rⁿ → Rᵐ; that is, x ∈ Rⁿ and f(x) ∈ Rᵐ. The first order approximation of each of the functions f_i(x) is given by

    f_i(x + ε) ≈ f_i(x) + ⟨∇f_i(x), ε⟩.

For the whole function vector f(x), we can write

    δf = f(x + ε) − f(x) ≈ Jε,

where J ∈ Rᵐˣⁿ is the Jacobian matrix, comprised of the gradients ∇f_i(x) as rows:

    J = [∇f₁(x)ᵀ ; ∇f₂(x)ᵀ ; ⋮ ; ∇f_m(x)ᵀ],   J_{i,j} = ∂f_i/∂x_j.

For the sake of gradient calculation we will focus on the case where ε → 0 and assume equality in the definition of δf.

Example 1 (The Jacobian of Ax). Suppose that f = Ax, where A ∈ Rᵐˣⁿ. Then δf = f(x + ε) − f(x) = A(x + ε) − Ax = Aε, so clearly J = A.

Example 2 (The Jacobian of φ(x), where φ is a scalar function). Suppose that f = φ(x), where φ : R → R is a scalar function applied element-wise, i.e., f_i(x) = φ(x_i). In this case the vector Taylor expansion is just a vector of one-dimensional expansions:

    δf = φ(x + ε) − φ(x) ≈ diag(φ′(x))ε.

This means that J = diag(φ′(x)), which is a diagonal matrix such that J_ii = φ′(x_i).
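A later section of these notes is devoted to gradient and Jacobian verification; here is a minimal Julia sketch (ours, with an arbitrary choice of φ) that already checks Example 2 numerically. Since J is diagonal, Jε can be computed as an element-wise product, and the error of the first-order model should be O(‖ε‖²).

phi  = (x)->tanh.(x);             # an element-wise scalar function
dphi = (x)->1 .- tanh.(x).^2;     # its derivative phi'(x), element-wise
x0 = [0.3; -1.2; 0.8];
ep = 1e-4*[1.0; -2.0; 0.5];       # a small perturbation
deltaf = phi(x0 + ep) - phi(x0);  # the actual change in f
Jep    = dphi(x0).*ep;            # the predicted change, diag(phi'(x0))*ep
println("error: ", norm(deltaf - Jep))   # should be O(norm(ep)^2)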

1.1 Optimality conditions for unconstrained optimization

We ask ourselves: what is a minimum point of f? We will consider two kinds of minimum points.

Definition 1 (Local minimum). A point x* will be called a local minimum of a function f() if there exists r > 0 s.t. f(x) ≥ f(x*) for all x s.t. ‖x − x*‖ < r. If we use a strict inequality, then we have a strict local minimum.

Figure 1: Global and local minimum

Definition 2 (Global minimum). A point x* will be called a global minimum of a function f() if f(x) ≥ f(x*) for all x ∈ Rⁿ. Again, if we have a strict inequality, we will also have a strict global minimum.

We now ask ourselves: how can we identify or compute local and global minima of a function? It turns out that a local minimum can be characterized by local conditions, while a global minimum is much harder to characterize. In this course we will focus on iterative methods that find a local minimum, hoping that we are not so far from the global minimum. There are many ways to handle the issue of global vs. local minima in the case of general functions (e.g., adding regularization, or investigating a good starting point), but we will not deal with that in this course.

It turns out that there is a quite large family of functions for which any local minimum is also a global minimum. These functions are called convex functions.

1.2 Convexity

A set Ω ⊆ Rⁿ is convex if for any two points x₁, x₂ ∈ Ω the whole segment between them lies in Ω, i.e., αx₁ + (1 − α)x₂ ∈ Ω for all α ∈ [0, 1]. A function f(x) is convex over a convex region Ω if for all x₁, x₂ ∈ Ω and all α ∈ [0, 1]

    f(αx₁ + (1 − α)x₂) ≤ αf(x₁) + (1 − α)f(x₂).

The convex set definition is illustrated in Fig. 2. The convex function definition can be illustrated in 1D; see Figure 3. A 2D example of a convex and a non-convex function is given in Fig. 4.

Figure 2: Convexity of sets

Figure 3: Convexity: the image uses t instead of α

Figure 4: Convexity in 2D: the left image shows a non-convex function, while the right one shows a convex (and quadratic) function.

Alternative definitions for convexity:

1. Assume that f(x) is differentiable. f(x) is a convex function over a convex region Ω if and only if for all x₁, x₂ ∈ Ω

    f(x₁) ≥ f(x₂) + ⟨∇f(x₂), x₁ − x₂⟩.

This definition means that any tangent plane of the function has to be under the function. It is equivalent to the previous, traditional definition.

2. Assume that f(x) is twice differentiable. f(x) is a convex function over a convex region Ω if and only if the Hessian ∇²f(x) ⪰ 0 is positive semi-definite for all x ∈ Ω.

It is easy to show that the two definitions are equivalent through a Taylor series:

    f(x₁) = f(x₂) + ⟨∇f(x₂), x₁ − x₂⟩ + ½⟨x₁ − x₂, ∇²f(c)(x₁ − x₂)⟩,   c ∈ [x₁, x₂].

Example 3 (The definitions of convexity for a quadratic function). Consider a quadratic function f(x) = ½⟨x, Ax⟩ − ⟨b, x⟩ with a symmetric matrix A, for which ∇f(x) = Ax − b and ∇²f(x) = A for every x. By the second definition above, f is convex if and only if A ⪰ 0. The first definition gives the same conclusion: a direct computation shows

    f(x₁) − f(x₂) − ⟨∇f(x₂), x₁ − x₂⟩ = ½⟨x₁ − x₂, A(x₁ − x₂)⟩,

which is non-negative for all x₁, x₂ exactly when A is positive semi-definite.
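These checks are easy to carry out numerically. A minimal Julia sketch (ours; the matrix is an arbitrary example, and eigmin and dot live in LinearAlgebra on recent Julia): by the second definition, convexity of the quadratic is decided by the smallest eigenvalue of A, and the first-order definition can be spot-checked at random points.

A = [3.0 1.0; 1.0 2.0];                  # a symmetric matrix with positive eigenvalues
b = [1.0; -1.0];
f = (x)->(0.5*dot(x,A*x) - dot(b,x));
println("smallest eigenvalue of A: ", eigmin(A))   # positive, hence f is convex
x1 = randn(2); x2 = randn(2);                      # spot-check the first-order definition
println(f(x1) >= f(x2) + dot(A*x2 - b, x1 - x2))   # should print true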


When the objective function is convex, local and global minimizers are simple to characterize.

Theorem 1. When f is convex, any local minimizer x* is a global minimizer of f. If in addition f is differentiable, then any stationary point x* such that ∇f(x*) = 0 is a global minimizer of f.

We will only consider the proof of the first part of the theorem here. Suppose that x* is a local minimizer but not a global one, so there exists z with f(z) < f(x*). By convexity, for any α ∈ (0, 1]

    f(αz + (1 − α)x*) ≤ αf(z) + (1 − α)f(x*) < f(x*),

so arbitrarily close to x* (taking α small) there are points with smaller objective values, contradicting the local minimality of x*. The proof of the second part of the theorem is obtained by contradiction as well.

These results provide the foundation of unconstrained optimization algorithms. If we have a convex function, then every local minimizer is also a global minimizer, and hence if we have methods that reach local minimizers, we can reach the global minimizer. This is why convex problems are very popular in science, although in some cases there is no choice but to solve non-convex problems. In all algorithms we seek a point x* where ∇f(x*) = 0.

2 Iterative methods for unconstrained optimization

Just like linear systems, optimization can be carried out by iterative methods. Actually, in optimization it is usually not possible to directly find a minimum of a function, and so iterative methods are much more essential in optimization than in numerical linear algebra. On the other hand, we have learned about minimizing quadratic functions, which is equivalent to solving linear systems. Since every function is locally approximated by a quadratic function (and specifically near its minimum), everything we learned about iterative methods for linear systems can be useful in the context of optimization.

2.1 Steepest descent

Earlier, we met the steepest descent method

    x^(k+1) = x^(k) − α^(k)∇f(x^(k)),

which we analysed for positive definite linear systems Ax = b. In fact, we saw that if we look at the equivalent quadratic minimization, then the matrix A has to be positive definite, which means that the quadratic function f has to be convex (this does not mean that a general f has to be convex for Steepest Descent to work). Just like in the numerical linear algebra case, α^(k) can be chosen sufficiently small (whether a constant α^(k) = α or not) so that the method converges to a local minimum. However, a more common way is to look at α^(k) as a step size, and choose it in some wise way.
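As a minimal illustration of the constant-step variant (a sketch of ours; Example 4 below gives the full linesearch version), a sufficiently small fixed α already converges on a convex quadratic, whose minimizer is known to be A\b:

function sd_fixed(A, b, alpha, iters)
    x = zeros(size(b));
    for k = 1:iters
        x = x - alpha*(A*x - b);   # the steepest descent step
    end
    return x;
end
A = [2.0 0.5; 0.5 1.0]; b = [1.0; 1.0];   # a convex quadratic f = 0.5*<x,Ax> - <b,x>
println(sd_fixed(A, b, 0.1, 200), " vs the exact minimizer ", A\b)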


Corollary 1. Any direction d = −M∇f such that M ≻ 0 is a descent direction.

We will look at more general methods next, but first observe that the steepest descent method (with a pre-defined step size α) can be viewed as a step that minimizes the quadratic function that approximates f(x) around x^(k):

    d_SD^(k) = arg min_d { f(x^(k)) + ⟨∇f(x^(k)), d⟩ + (1/(2α))⟨d, d⟩ } = −α∇f(x^(k)).

Then:

    x^(k+1) = x^(k) + d_SD^(k) = x^(k) − α∇f(x^(k)).

This derivation assumes that α is either constant or known a-priori. We will now see some other methods that follow the same pattern, and later see a method for determining the step length α through linesearch.

2.2 Newton, quasi-Newton

One of the most important methods besides steepest descent is Newton's method, which is much more powerful than SD, but may also be more expensive. Recall again the Taylor expansion

    f(x^(k) + ε) = f(x^(k)) + ⟨∇f(x^(k)), ε⟩ + ½⟨ε, ∇²f(x^(k))ε⟩ + O(‖ε‖³).   (2)

Now we choose the next step x^(k+1) = x^(k) + ε by minimizing Eq. (2) without the O(‖ε‖³) term with respect to ε. That is, the Newton direction d_N is chosen by the following quadratic minimization:

    d_N^(k) = arg min_d { f(x^(k)) + ⟨∇f(x^(k)), d⟩ + ½⟨d, ∇²f(x^(k))d⟩ } = −(∇²f(x^(k)))⁻¹∇f(x^(k)).

This minimization is similar to the one mentioned for SD, but now it has ∇²f(x^(k)) instead of the matrix (1/α)I. This shows us that in SD we essentially approximate the real Hessian with a scaled identity matrix.

The Newton procedure thus leads to the search direction d_N = −(∇²f(x^(k)))⁻¹∇f(x^(k)), which is Newton's direction. This is the best search direction that is practically used, and it is relatively hard to obtain: first, the function f should be twice differentiable. Second, one has to compute the Hessian matrix, or at least know how to solve a non-trivial linear system involving the Hessian matrix. This has to be done at each iteration, which may be costly. Solving the linear system can be done directly through an LU decomposition, or iteratively using an appropriate method. If we choose a constant step length, then one can show that the best step length for Newton's method is asymptotically 1.0. However, throughout the iterations it is better to perform a linesearch over the direction d_N to guarantee decreasing values of f(x^(k)) and convergence.

It is important to note that Newton's method should be used with care: there is no guarantee that d_N is a descent direction, or that the matrix ∇²f(x^(k)) is invertible. If the problem is convex and the Hessian is positive definite, we have nothing to worry about.

Quasi-Newton methods. Applying Newton's method is expensive. In quasi-Newton methods we approximate the Hessian by some matrix M(x^(k)), and at each step minimize

    d_QN^(k) = arg min_d { f(x^(k)) + ⟨∇f(x^(k)), d⟩ + ½⟨d, M(x^(k))d⟩ } = −(M(x^(k)))⁻¹∇f(x^(k)).   (3)

M can be chosen as a diagonal matrix, M = cI or M = diag(∇²f), which is easily invertible, or in other ways. We will not go into this further in this course, but the limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method is one of the most popular quasi-Newton methods in the literature. It iteratively builds a low-rank approximation of the Hessian from previous search directions, which is somewhat similar to some Krylov methods that we have seen earlier in this course (e.g., GMRES). All the methods above are one-point iterations, and are summarized in Algorithm 1.

Algorithm: General one-point iterative method
# Input: objective f(x) to minimize.
for k = 1,..., maxiter do
    Compute the gradient ∇f(x^(k)).
    Define the search direction d^(k), by essentially solving a quadratic minimization.
    Choose a step length α^(k) (possibly by linesearch).
    Apply a step: x^(k+1) = x^(k) + α^(k)d^(k).
    if ‖∇f(x^(k+1))‖/‖∇f(x^(1))‖ < ε, or alternatively ‖x^(k+1) − x^(k)‖/‖x^(k)‖ < ε, then
        Convergence is reached, stop the iterations.
    end
end
Return x^(k+1) as the solution.
Algorithm 1: General one-point iterative method for unconstrained optimization

2.3 Line-search methods

Line-search methods are among the most common methods in optimization. Assume we are at iteration k, at the point x^(k), and that we have a descent direction d^(k) in which the function decreases. Line-search methods perform

    x^(k+1) = x^(k) + α^(k)d^(k)   (4)

and choose α^(k) such that f(x^(k) + αd^(k)) is exactly or approximately minimized over α. In the quadratic SD case, we had a closed-form expression that minimizes this linesearch. When computing the step length α^(k), we are essentially minimizing the one-dimensional function

    φ(α) = f(x^(k) + αd^(k)),   α^(k) = arg min_α φ(α).

All 1-D minimization methods can be used here, but we face a tradeoff. We would like to choose α^(k) to substantially reduce f, but at the same time we do not want to spend too much time making the choice. In general, it is too expensive to identify an exact minimizer, as it may require too many evaluations of the objective f. We assume here that the computation of f(x^(k) + αd^(k)) is just as expensive as computing f(w) for some arbitrary vector w. If this is not the case, we may be able to allow ourselves to invest more

iterations in minimizing φ(α). However, note that minimizing φ is only locally optimal (a greedy choice), and there is no guarantee that an exact linesearch minimization is indeed the right choice for the fastest convergence, not to mention the cost of doing so. More practical strategies perform an inexact line search to identify a step length that achieves an adequate reduction in f at minimal cost. Typical line search algorithms try out a sequence of candidate values for α, stopping to accept one of these values when certain conditions are satisfied. Sophisticated line search algorithms can be quite complicated. We now show a practical termination condition for the line search algorithm. We will later show, using a simple quadratic example, that effective step lengths need not lie near the minimizers of φ(α).

Backtracking line search using the Armijo condition. We will now show one of the most common line search algorithms, called backtracking line search, also known as the Armijo rule. It involves starting with a relatively large estimate of the step size α and iteratively shrinking it (i.e., "backtracking") until a sufficient decrease of the objective function is observed. The motivation is to choose the step to be as large as possible while obtaining a sufficient decrease of the objective at the same time; we are not minimizing φ. This is done by choosing α₀ rather large and choosing a decrease factor 0 < β < 1. We examine the values of φ(α_j) for decreasing values α_j = βʲα₀ (with j = 0, 1, 2,...) until the following condition is satisfied:

    f(x^(k) + α_j d^(k)) ≤ f(x^(k)) + cα_j⟨∇f, d^(k)⟩.   (5)

Here 0 < c < 1 is usually chosen small (about 10⁻⁴), and β is usually chosen to be about 0.5. We are guaranteed that such a point exists, because we assume that d^(k) is a descent direction. That is, we know that ⟨∇f, d^(k)⟩ < 0, and using the Taylor theorem we have

    f(x^(k) + αd^(k)) = f(x^(k)) + α⟨∇f, d^(k)⟩ + O(α²‖d^(k)‖²).

It is clear that for some α small enough we will have

    f(x^(k)) − f(x^(k) + αd^(k)) = −α⟨∇f, d^(k)⟩ + O(α²‖d^(k)‖²) > 0.

Figure 5: Armijo rule

In our stopping rule we usually set c to be small, so we choose α to be relatively large. Algorithm 2 summarizes the backtracking linesearch procedure. Common upgrades to this procedure are:
- Choose α₀ for iteration k as (1/β)α^(k−1) or (1/β²)α^(k−1).
- Instead of only reducing the α_j's, also try to enlarge them by α_{j+1} = (1/β)α_j as long as the Armijo condition is satisfied.

Algorithm: Armijo linesearch
# Input: iterate x, objective f(x), gradient ∇f(x), descent direction d.
# Constants: α₀ > 0, 0 < β < 1, 0 < c < 1.
for j = 0,..., maxiter do
    Compute the objective φ(α_j) = f(x + α_j d).
    if f(x + α_j d) ≤ f(x) + cα_j⟨∇f, d⟩ then
        Return α = α_j as the chosen step length.
    else
        α_{j+1} = βα_j
    end
end
# If maxiter iterations were reached, α is too small and we are probably stuck.
# In this case terminate the minimization.
Algorithm 2: Armijo backtracking linesearch procedure.

Example 4 (Newton vs. Steepest Descent). We will now consider the minimization of the Rosenbrock "banana" function

    f(x) = (a − x₁)² + b(x₂ − x₁²)²,

where a, b are parameters that determine the difficulty of the problem. Here we choose b = 5 and a = 1. The minimum of this function is at [a, a²], where f = 0 (in our case it is the point [1, 1]). The gradient and Hessian of this function are

    ∇f(x) = [−4bx₁(x₂ − x₁²) − 2(a − x₁) ; 2b(x₂ − x₁²)],

    ∇²f(x) = [−4b(x₂ − x₁²) + 8bx₁² + 2   −4bx₁ ; −4bx₁   2b].

In the code below we apply the SD and Newton algorithms starting from the point [−1.4, 2]. After 100 iterations, SD was still on its way to [1, 1], while the Newton method solved the problem in only 9 iterations.

Figure 6: The convergence path of steepest descent and Newton's method for minimizing the banana function.

using PyPlot; close("all");
xx = -1.5:0.01:1.6; yy = -0.5:0.01:2.5;
X = repmat(xx,1,length(yy))'; Y = repmat(yy,1,length(xx));
a = 1; b = 5;
F = (a-X).^2 + b*(Y-X.^2).^2;
figure(); contour(X,Y,F,500); #hold on;
axis("image"); xlabel("x"); ylabel("y");
title("SD vs Newton for minimizing the Banana function.")
f = (x)->((a-x[1]).^2 + b*(x[2]-x[1].^2).^2);
g = (x)->[-4*b*x[1]*(x[2]-x[1]^2)-2*(a-x[1]); 2*b*(x[2]-x[1]^2)];
H = (x)->[-4*b*(x[2]-x[1]^2)+8*b*x[1]^2+2  -4*b*x[1] ; -4*b*x[1]  2*b];

## Armijo parameters:
alpha0 = 1.0; beta = 0.5; c = 1e-4;
function linesearch(f::Function,x,d,gk,alpha0,beta,c)
    alphaj = alpha0;
    for jj = 1:10
        x_temp = x + alphaj*d;
        if f(x_temp) <= f(x) + alphaj*c*dot(d,gk)
            break;
        else
            alphaj = alphaj*beta;
        end
    end
    return alphaj;
end
## Starting point
x_SD = [-1.4;2.0];
println("*********** SD *****************")
## SD Iterations
for k=1:100
    gk = g(x_SD);
    d_SD = -gk;
    x_prev = copy(x_SD);
    alpha_SD = linesearch(f,x_SD,d_SD,gk,0.25,beta,c);
    x_SD = x_SD + alpha_SD*d_SD;
    plot([x_SD[1];x_prev[1]],[x_SD[2];x_prev[2]],"g",linewidth=2.0);
    println(x_SD);
end;
## Newton Iterations
x_N = [-1.4;2.0];
println("*********** Newton *****************")
for k=1:10
    gk = g(x_N);
    Hk = H(x_N);
    d_N = -(Hk)\gk;
    x_prev = copy(x_N);
    alpha_N = linesearch(f,x_N,d_N,gk,alpha0,beta,c);
    x_N = x_N + alpha_N*d_N;
    plot([x_N[1];x_prev[1]],[x_N[2];x_prev[2]],"k",linewidth=2.0);
    println(x_N);
end;
legend(("SD","Newton"))

2.4 Coordinate descent methods

Coordinate descent methods are similar in principle to the Gauss-Seidel method for linear systems. When we minimize f(x), we iterate over all entries i of x, and change each scalar entry x_i so that f(x) is minimized with respect to x_i, given that the rest of the variables are fixed.

Algorithm: Coordinate Descent
# Input: objective f(x) to minimize.
for k = 1,..., maxiter do
    # Sweep through all scalar variables x_i and (approximately) minimize f(x)
    # with respect to each x_i in turn, assuming the rest of the variables are fixed.
    for i = 1,..., n do
        x_i^(k+1) ← arg min_{x_i} f(x_1^(k+1),..., x_{i−1}^(k+1), x_i, x_{i+1}^(k),..., x_n^(k))
    end
    # Check if convergence is reached.
end
Return x^(k+1) as the solution.
Algorithm 3: Coordinate descent

By doing this, we essentially zero the i-th entry of the gradient, i.e., update x_i such that ∂f/∂x_i = 0. Similarly to the variational property of Gauss-Seidel, the value of f is monotonically non-increasing with each update, and if f is bounded from below, the series {f(x^(k))} converges. This does not guarantee convergence to a minimizer, but it is a good property to have.

There are many rules for the order in which to choose the variables x_i; for example, lexicographic or random orders are common. One of the requirements to guarantee convergence is that all the variables are visited periodically. That is, we cannot guarantee convergence if a few of the variables are never visited. Coordinate descent methods are most common in non-smooth optimization, and are especially effective when the one-dimensional minimization is easy to apply, as in the sketch below.
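To make the Gauss-Seidel connection concrete, here is a minimal Julia sketch (ours; the SPD matrix is an arbitrary example) of exact coordinate descent on a convex quadratic f(x) = ½⟨x, Ax⟩ − ⟨b, x⟩. Solving ∂f/∂x_i = 0 with the other variables fixed gives exactly the Gauss-Seidel update, so the iterates converge to the minimizer A\b.

A = [4.0 1.0; 1.0 3.0]; b = [1.0; 2.0];   # an SPD quadratic f = 0.5*<x,Ax> - <b,x>
x = zeros(2);
for k = 1:30
    for i = 1:2
        # set df/dx_i = (A*x)[i] - b[i] to zero for x[i], the rest fixed:
        x[i] = (b[i] - (A*x)[i] + A[i,i]*x[i])/A[i,i];   # the Gauss-Seidel update
    end
end
println(x, " vs the minimizer ", A\b)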

Example 5 (Coordinate descent for the banana function). Assume again that we are minimizing f(x) = (a − x₁)² + b(x₂ − x₁²)², where a, b are parameters; here b = 10 and a = 1. We have seen that

    ∇f(x) = [−4bx₁(x₂ − x₁²) − 2(a − x₁) ; 2b(x₂ − x₁²)],

and in coordinate descent we solve each of these equations in turn. We start with the second variable, because the update with respect to the second variable is easy to define (assuming b ≠ 0):

    ∂f/∂x₂ = 2b(x₂ − x₁²) = 0  ⟹  x₂ = x₁².

For the first variable, we have

    ∂f/∂x₁ = −4bx₁(x₂ − x₁²) − 2(a − x₁) = 0  ⟹  x₁ = ? [no closed form solution]

This time there is no closed form solution, and in fact there may be more than one minimum point here. A common option in such cases is to apply a one-dimensional Newton step for x₁:

    x₁^new = x₁ − α(∂f/∂x₁)/(∂²f/∂x₁²) = x₁ − α(−4bx₁(x₂ − x₁²) − 2(a − x₁))/(−4b(x₂ − x₁²) + 8bx₁² + 2),

where α is obtained by linesearch with respect to minimizing f(x₁ + αd₁, x₂), and d₁ is the Newton search direction for x₁. The code for the CD updates appears next; it works in addition to the code in the previous examples. After 100 iterations, the CD iterates were significantly closer to [1, 1] than those of SD, but worse than Newton's method.

Figure 7: The convergence path of the coordinate descent method for minimizing the banana function.

## CD Iterations (this code comes in addition to the first section of code in the previous example)
x_CD = [-1.4;2.0];
println("*********** CD *****************")
for k=1:100
    # first variable Newton update:
    gk = -4*b*x_CD[1]*(x_CD[2]-x_CD[1]^2)-2*(a-x_CD[1]); # This is g[1]
    Hk = -4*b*(x_CD[2]-x_CD[1]^2)+8*b*x_CD[1]^2+2;       # This is H[1,1]
    d_1 = -gk/Hk; ## Hk is a scalar
    f1 = (x)->((a-x).^2 + b*(x_CD[2]-x.^2).^2)           # f1 accepts a scalar.
    x_prev = copy(x_CD);
    alpha_1 = linesearch(f1,x_CD[1],d_1,gk,alpha0,beta,c);
    x_CD[1] = x_CD[1]+alpha_1*d_1;
    # second variable update
    x_CD[2] = x_CD[1]^2;
    plot([x_CD[1];x_prev[1]],[x_CD[2];x_prev[2]],"k",linewidth=2.0);
    println(x_CD);
end;

2.5 Non-linear Conjugate Gradient methods

Earlier we have seen the Conjugate Gradient (CG) method for linear systems. This method has a lot of nice properties and advantages. It turns out that the linear CG method can be extended to non-linear unconstrained optimization. This extension has many variants, and all of them reduce to the linear CG method when the function is quadratic. We will not study these methods in this course, but they are quite efficient and worthy of consideration.
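For the curious reader, the following sketch (ours, not studied further in these notes) shows one common variant, the Fletcher-Reeves nonlinear CG, applied to the banana function. It reuses f, g, and linesearch from the examples above, and restarts from the SD direction whenever the CG direction fails to be a descent direction. With exact linesearch on a quadratic it reduces to linear CG.

function nlcg_fr(f, g, x0, iters)
    x = copy(x0);
    gk = g(x); d = -gk;                       # initial direction: steepest descent
    for k = 1:iters
        alpha = linesearch(f,x,d,gk,1.0,0.5,1e-4);
        x = x + alpha*d;
        gnew = g(x);
        betaFR = dot(gnew,gnew)/dot(gk,gk);   # the Fletcher-Reeves coefficient
        d = -gnew + betaFR*d;                 # the new conjugate direction
        gk = gnew;
        if dot(d,gk) >= 0.0                   # not a descent direction: restart
            d = -gk;
        end
    end
    return x;
end
println(nlcg_fr(f, g, [-1.4;2.0], 100))       # approaches [1, 1]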

3 Constrained Optimization

In this section we will see the theory of constrained optimization. The theory is a bit deep, and we will see only a few of the main results in this course. A general formulation of constrained optimization problems has the form

    min_{x∈Rⁿ} f(x)  subject to  c_j^eq(x) = 0, j = 1,..., m^eq,  and  c_l^ieq(x) ≤ 0, l = 1,..., m^ieq,   (6)

where f, c_j^eq, and c_l^ieq are all smooth functions. f(x), as before, is the objective that we wish to minimize, the c_j^eq(x) are equality constraints, and the c_l^ieq(x) are inequality constraints. We define the feasible set to be the set of all points that satisfy the constraints:

    Ω = { x : c_j^eq(x) = 0, j = 1,..., m^eq ; c_l^ieq(x) ≤ 0, l = 1,..., m^ieq }.

We can write (6) compactly as min_{x∈Ω} f(x). The focus in this section is to characterize the solutions of (6). Recall that for unconstrained minimization we had the following sufficient optimality condition: any point x* at which ∇f(x*) = 0 and ∇²f(x*) is positive definite is a strong local minimizer of f. Our goal now is to define similar optimality conditions for constrained optimization problems.

We have already seen that global solutions are difficult to find even when there are no constraints. The situation may be improved when we add constraints, since the feasible set might exclude many of the local minima, and it may be easy to pick the global minimum from those that remain. However, constraints can also make things much more difficult. As an example, consider the problem

    min_{x∈Rⁿ} ‖x‖₂²  s.t.  ‖x‖₂² ≥ 1.

Without the constraint, this is a convex quadratic problem with the unique minimizer x = 0. When the constraint is added, any vector x with ‖x‖₂ = 1 solves the problem. There are infinitely many such vectors (hence, infinitely many local minima) whenever n ≥ 2.

Definitions of the different types of local solutions are simple extensions of the corresponding definitions for the unconstrained case, except that now we restrict consideration to the feasible points in the neighborhood of x*. We have the following definition.

Figure 8: The single equality constraint example.

Definition 3 (Local solution of the constrained problem). A vector x* is a local solution of the problem (6) if x* ∈ Ω (x* satisfies the constraints) and there is a neighborhood N of x* such that f(x) ≥ f(x*) for all x ∈ N ∩ Ω.

3.1 Equality-constrained optimization: Lagrange multipliers

To introduce the basic principles behind the characterization of solutions of general constrained optimization problems, we will first consider only equality constraints, and start with the relatively simple case of a single equality constraint. Our first example is a two-variable problem with a single equality constraint.

Example 6 (A single equality constraint). Consider the following problem:

    min_{x∈R²} x₁ + x₂  subject to  x₁² + x₂² − 2 = 0.   (7)

In the language of Equation (6) we have m^eq = 1 and m^ieq = 0, with c₁^eq(x) = x₁² + x₂² − 2. We can see by inspecting Fig. 8 that the feasible set for this problem is the circle of radius √2 centered at the origin: just the boundary of this circle, not its interior. The solution x* is obviously [−1, −1]. From any other point on the circle, it is easy to find a way to move that stays feasible (that is, remains on the circle) while decreasing f.

We also see that at the optimal point x*, the objective gradient ∇f is parallel to the constraint gradient ∇c₁^eq(x*). In other words, there exists a scalar λ₁ such that

    ∇f + λ₁∇c₁^eq(x*) = 0,

and in this particular case, where ∇f = [1, 1]ᵀ and ∇c₁^eq = [2x₁, 2x₂]ᵀ, we have λ₁ = 1/2 for x* = [−1, −1].

We can reach the same conclusion using the Taylor expansion. Assume a feasible point x. To retain feasibility, any small movement in a direction d from x has to satisfy the constraint

    0 = c₁^eq(x + d) ≈ c₁^eq(x) + ⟨∇c₁^eq(x), d⟩ = ⟨∇c₁^eq(x), d⟩.

So any movement in a direction d from the point x has to satisfy

    ⟨∇c₁^eq(x), d⟩ = 0.   (8)

On the other hand, we have seen earlier that any descent direction has to satisfy

    ⟨∇f(x), d⟩ < 0,   (9)

because f(x + d) − f(x) ≈ ⟨∇f(x), d⟩. If there exists a direction d satisfying (8)-(9), then we can say that improvement in f is possible under the constraints. It follows that a necessary condition for optimality for our problem is that there is no direction d satisfying both (8)-(9). The only way that such a direction cannot exist is if ∇f(x) and ∇c₁^eq(x) are parallel; otherwise

    d = −(I − ∇c₁^eq(x)(∇c₁^eq(x))ᵀ/‖∇c₁^eq(x)‖²)∇f(x)

is a descent direction which satisfies the constraint, at least to first order.

It turns out that we can pack the objective and its equality constraints into one function, called the Lagrangian function:

    L(x, λ₁) = f(x) + λ₁c₁^eq(x),

for which

    ∇ₓL = ∇ₓf(x) + λ₁∇ₓc₁^eq(x).

Hence, at the solution x* there exists a scalar λ₁* such that ∇ₓL(x*, λ₁*) = 0. Note that the condition ∂L/∂λ₁ = 0 reproduces the constraint c₁^eq(x) = 0. This observation suggests that we can search for solutions of the equality-constrained problem by searching for stationary points of the Lagrangian function. The scalar quantity λ₁ is called a Lagrange multiplier for the constraint c₁^eq(x) = 0.

The condition above appears to be necessary for an optimal solution of the problem, but it is clearly not sufficient. For example, the point [1, 1] also satisfies it in our example, but it is the maximum of the function under the constraint.

Back to our general equality-constrained problem, which we write as

    min_{x∈Rⁿ} f(x)  subject to  c^eq(x) = 0,   (10)

where c^eq(x) : Rⁿ → R^{m^eq} is the constraints vector function, that is,

    c^eq(x) = [c₁^eq(x) ; c₂^eq(x) ; ⋮ ; c_{m^eq}^eq(x)] = 0.

To obtain the optimality conditions we will first make an assumption on a point x* regarding the constraints.

Definition 4 (LICQ for equality constrained optimization). Given the point x*, we say that the linear independence constraint qualification (LICQ) holds if the set of constraint gradients {∇c_i^eq} is linearly independent, or equivalently, if the Jacobian of the constraints vector J^eq(x*) is a full-rank matrix.

Note that if this condition holds, none of the constraint gradients can be zero. This assumption comes to simplify our derivations and is not really necessary from a practical point of view. For example, it breaks if we write one of the constraints twice, which clearly should not have any practical influence on our ability to solve problems.

A generalization of our conclusion from the previous example is as follows. In order for x to be a stationary point, there should not be a direction d such that ⟨∇f(x), d⟩ < 0 and

⟨∇c_i^eq(x), d⟩ = 0 for i = 1,..., m^eq. In other words, the gradient cannot have a component that is orthogonal to all the constraint gradients. This will happen only if −∇f(x) is a linear combination of the constraint gradients:

    ∇f(x) + Σ_{i=1}^{m^eq} λ_i∇c_i^eq(x) = ∇f(x) + (J^eq)ᵀλ = 0,

where

    J^eq = [∇c₁^eq(x)ᵀ ; ∇c₂^eq(x)ᵀ ; ⋮ ; ∇c_{m^eq}^eq(x)ᵀ].

Consequently, given an equality constrained problem (10), we write its Lagrangian function as

    L(x, λ) = f(x) + λᵀc^eq(x),

where the vector λ ∈ R^{m^eq} is the Lagrange multipliers vector.

Theorem 2 (Lagrange multipliers for equality constrained minimization). Let x* be a local solution of (10), in particular satisfying the equality constraints c^eq(x*) = 0 and the LICQ condition. Then there exists a unique vector λ* ∈ R^{m^eq}, called the Lagrange multipliers vector, which satisfies

    ∇ₓL(x*, λ*) = ∇f(x*) + (J^eq)ᵀλ* = 0.

We will not prove this theorem, but note that ∇_λL(x*, λ*) = 0 ⟺ c^eq(x*) = 0, which allows us to formulate a method for solving equality constrained optimization.

Back to our example of a single equality constraint:

    min_{x∈R²} x₁ + x₂  subject to  x₁² + x₂² − 2 = 0.   (11)

The Lagrangian of this problem is L(x, λ₁) = x₁ + x₂ + λ₁(x₁² + x₂² − 2), and the solution of

this problem is given by

    ∇L = 0:  1 + 2λ₁x₁ = 0,  1 + 2λ₁x₂ = 0,  x₁² + x₂² − 2 = 0
    ⟹  x₁ = −1/(2λ₁),  x₂ = −1/(2λ₁)
    ⟹  1/(4λ₁²) + 1/(4λ₁²) = 2  ⟹  λ₁ = ±1/2  ⟹  x = [∓1, ∓1].

As discussed before, we got two stationary points of the Lagrangian. How do we decide which one of them is a local minimum? The following theorem says that the Hessian of L with respect to x has to be positive semi-definite with respect to the directions that satisfy the constraints (or, the directions which are orthogonal to the constraint gradients).

Theorem 3 (2nd order necessary conditions for equality constrained minimization). Let x* be a minimum solution and let λ* be a corresponding Lagrange multiplier vector satisfying ∇L(x*, λ*) = 0. Also assume that the LICQ condition is satisfied. Then the following must hold:

    yᵀ∇²ₓL(x*, λ*)y ≥ 0  for all y ∈ Rⁿ s.t. J^eq y = 0.

In our example, L(x, λ₁) = x₁ + x₂ + λ₁(x₁² + x₂² − 2), and

    ∇²ₓL(x, λ₁) = [2λ₁ 0 ; 0 2λ₁],  so  ∇²ₓL(x, λ₁ = ±1/2) = [±1 0 ; 0 ±1].

It follows that the minimum is obtained at λ₁ = 1/2, where ∇²ₓL(x*, λ₁*) = I ≻ 0. This matrix is positive definite with respect to any vector, and in particular with respect to those vectors that are orthogonal to the constraint gradient. At the other point (λ₁ = −1/2), we encounter the case of a negative definite Hessian, and hence we have a local maximum.
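Searching for stationary points of the Lagrangian can also be done numerically. The following minimal Julia sketch (ours) applies Newton's method to the nonlinear system ∇L(x, λ₁) = 0 of the example above; depending on the starting guess, it converges to the constrained minimum [−1, −1] with λ₁ = 1/2 or to the maximum [1, 1] with λ₁ = −1/2.

function newton_lagrangian(z0, iters)
    z = copy(z0);                 # z = [x1; x2; lambda1]
    for k = 1:iters
        # residual: the two components of grad_x L, and the constraint:
        r = [1 + 2*z[3]*z[1]; 1 + 2*z[3]*z[2]; z[1]^2 + z[2]^2 - 2];
        J = [2*z[3] 0.0 2*z[1]; 0.0 2*z[3] 2*z[2]; 2*z[1] 2*z[2] 0.0];
        z = z - J\r;              # a Newton step on the nonlinear system
    end
    return z;
end
println(newton_lagrangian([-2.0; -0.5; 1.0], 20))   # approx [-1, -1, 1/2]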

3.2 The KKT optimality conditions for general constrained optimization

Suppose now that we are considering the general case of constrained optimization, allowing inequality constraints as well. Recall the problem definition:

    min_{x∈Rⁿ} f(x)  subject to  c_j^eq(x) = 0, j = 1,..., m^eq,  and  c_l^ieq(x) ≤ 0, l = 1,..., m^ieq.   (12)

It turns out that for this problem we should make a distinction between active and inactive inequality constraints. Assume a local solution x*. The active constraints are those for which c_l^ieq(x*) = 0, and the inactive constraints are those for which c_l^ieq(x*) < 0.

Example 7. Consider the following problem, which is similar to the previous one, only now we have an inequality constraint:

    min_{x∈R²} x₁ + x₂  subject to  x₁² + x₂² − 2 ≤ 0.   (13)

Similarly to the previous problem, the solution lies at the point [−1, −1], where the constraint is active. The Lagrangian is again L(x, λ₁) = x₁ + x₂ + λ₁(x₁² + x₂² − 2), and the minimum is obtained at a point satisfying ∇ₓL(x*, λ₁*) = 0. But now, it turns out that the sign of the Lagrange multiplier λ₁ = 1/2 has a significant role.

Similarly to before, we need any search direction d to satisfy

    c₁^ieq(x + d) ≈ c₁^ieq(x) + ⟨∇c₁^ieq(x), d⟩ ≤ 0.

If the constraint c₁^ieq is inactive at the point x, we can move in any direction, and in particular we can move in the descent direction −∇f(x) and decrease the objective. However, if the constraint is active, then c₁^ieq(x) = 0, and to satisfy the constraint we have the condition

    ⟨∇c₁^ieq(x), d⟩ ≤ 0.   (14)

In order for d to be a legal descent direction it needs to satisfy both ⟨∇f(x), d⟩ < 0 and (14). If we wish that d is not a descent direction, we need the two inner products not to be both

negative. For this we must require

    ∇f + λ₁∇c₁^ieq(x*) = 0,  with λ₁ ≥ 0.

From the example above we learn that active inequality constraints act as equality constraints, but require the Lagrange multiplier to be non-negative at stationary points. If we have multiple inequality constraints {c_l^ieq(x) ≤ 0}_{l=1}^{m^ieq}, then we require that for a descent direction d, each constraint which is active satisfies ⟨∇c_l^ieq(x), d⟩ ≤ 0. But if

    ∇f + Σ_{l active} λ_l∇c_l^ieq(x*) = 0,  with λ_l ≥ 0,

then

    ⟨∇f(x), d⟩ + Σ_{l active} λ_l⟨∇c_l^ieq(x*), d⟩ = 0,

and hence ⟨∇f(x), d⟩ ≥ 0 must hold. This means that d cannot be a descent direction under these conditions. We will now state the necessary conditions for a solution of a general constrained minimization problem.

Definition 5 (Active set). The active set A(x) at any feasible x is the set of inequality constraints which are satisfied with exact equality, {l : c_l^ieq(x) = 0}.

We define the Lagrangian of the problem (12) by

    L(x, λ^eq, λ^ieq) = f(x) + (λ^eq)ᵀc^eq(x) + (λ^ieq)ᵀc^ieq(x),

where λ^eq ∈ R^{m^eq} and λ^ieq ∈ R^{m^ieq} are the Lagrange multiplier vectors for the equality and inequality constraints, respectively. The next theorem states the first order necessary conditions, known as the Karush-Kuhn-Tucker (KKT) conditions.

Theorem 4 (First-Order Necessary Conditions (KKT)). Suppose that x* is a local solution of (12), and that the LICQ holds at x*. Then there are Lagrange multiplier vectors λ^eq ∈ R^{m^eq}

and λ^ieq ∈ R^{m^ieq}, such that the following hold:

    ∇ₓL(x*, λ^eq, λ^ieq) = ∇f(x*) + (J^eq)ᵀλ^eq + (J^ieq)ᵀλ^ieq = 0   (15)
    c^eq(x*) = 0   (16)
    c^ieq(x*) ≤ 0   (17)
    λ^ieq ≥ 0   (18)
    λ_l^ieq c_l^ieq(x*) = 0  for l = 1,..., m^ieq  (complementary slackness)   (19)

The conditions (17)-(19) may be replaced with λ_l^ieq > 0 for the active constraints l ∈ A(x*) and λ_l^ieq = 0 for the inactive ones. To know whether a stationary point is a minimum or a maximum we have the following theorem.

Theorem 5 (2nd order necessary conditions for general constrained minimization). Let x* be a minimum solution satisfying the KKT conditions, and let λ^eq, λ^ieq be the corresponding Lagrange multiplier vectors. Also assume that the LICQ condition is satisfied. Then the following must hold:

    yᵀ∇²ₓL(x*, λ^eq, λ^ieq)y ≥ 0  for all y ∈ Rⁿ s.t. J^eq y = 0 and ⟨∇c_l^ieq(x*), y⟩ = 0 for all l ∈ A(x*).

Example 8. Consider the following problem:

    min_{x∈R²} (x₁ − 3/2)² + (x₂ − 1/8)⁴  s.t.  x₁ + x₂ ≤ 1,  x₁ − x₂ ≤ 1,  −x₁ + x₂ ≤ 1,  −x₁ − x₂ ≤ 1.   (20)

This time, we have four inequality constraints. In principle, we should try every combination of active and inactive constraints and see if the resulting x and λ that are achieved by ∇ₓL = 0 satisfy the KKT conditions. It turns out that the solution here is x* = [1, 0], and that the first and second constraints are active at this point. Denoting them by c₁ and c₂, we have

    ∇f(x*) = [−1 ; −1/128],  ∇c₁(x*) = [1 ; 1],  ∇c₂(x*) = [1 ; −1],

and the multiplier vector is λ* = [λ₁, λ₂, 0, 0]: solving ∇f(x*) + λ₁∇c₁(x*) + λ₂∇c₂(x*) = 0 gives λ₁ = 129/256 ≈ 0.504 and λ₂ = 127/256 ≈ 0.496, both non-negative as required.
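As a numeric sanity check (ours), the multipliers of the two active constraints are the solution of a small linear system, and their signs confirm the KKT conditions:

xs = [1.0; 0.0];                          # the solution of problem (20)
gf = [2*(xs[1]-3/2); 4*(xs[2]-1/8)^3];    # the objective gradient at x*
G  = [1.0 1.0; 1.0 -1.0];                 # columns: the gradients of the two active constraints
lam = -(G\gf);                            # solve grad f + G*lam = 0 for the multipliers
println(lam)                              # approx [0.504; 0.496]; both nonnegative, KKT holds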

3.3 Penalty and Barrier methods

In the previous section we saw that in order to solve a constrained optimization problem using the KKT conditions, we may need to solve several large systems of equations, each time checking a different combination of active and inactive inequality constraints. This is an option for solving the problem, but it does not align with the optimization methods we have seen so far. In this section, we will see two closely related approaches for solving constrained optimization: the penalty and barrier approaches. Both of these approaches convert the constrained problem into a somewhat equivalent unconstrained problem, which can be solved by SD, Newton, or any other method for unconstrained optimization.

Penalty methods. We again assume that we have the general constrained optimization problem

    min_{x∈Rⁿ} f(x)  subject to  c_j^eq(x) = 0, j = 1,..., m^eq,  and  c_l^ieq(x) ≤ 0, l = 1,..., m^ieq,   (21)

and now we wish to transform it into an equivalent unconstrained problem. In the penalty approach, we rewrite the problem as

    min_{x∈Rⁿ} f(x) + µ(Σ_{j=1}^{m^eq} ρ_j(c_j^eq(x)) + Σ_{l=1}^{m^ieq} ρ_l(max{0, c_l^ieq(x)})),   (22)

where µ > 0 is a balancing penalty parameter, and the {ρ_j(x)} and {ρ_l(x)} are scalar penalty functions which are bounded from below and attain their minimum at 0 (it is preferable that these are non-negative, but this is not mandatory). In addition, these functions should monotonically increase as we move away from zero. The common choice for a penalty is the quadratic function ρ(x) = x², which has a minimum at 0. Using this choice, if we let µ → ∞ then the solution of (22) will be equal to that of (21). We will focus on this choice in this

course. The problem (22) is an unconstrained optimization problem, which can be solved using the methods we have learned so far. For ρ(x) = x², however, it turns out that the problem becomes more and more ill-conditioned as µ → ∞, which makes standard first-order approaches like SD and CD slow. Therefore, in practice we will use a continuation strategy, where we solve the problem for iteratively increasing values of the penalty µ. That is, we will define µ₀ < µ₁ < ... and iteratively solve the problem for each of those µ's, each time using the previous solution as an initial guess for the next problem. We will stop when µ is large enough such that the constraints are reasonably fulfilled. That is, for our largest value of µ, it may be that, for example, one of the equality constraints satisfies |c(x*)| = ε. Whether ε is small enough or not will depend on the application that yielded the optimization problem.

Remark 1. Another popular choice for a penalty is the exact penalty function ρ(x) = |x| (absolute value). It is popular because, unlike the quadratic function, the constraints will be exactly fulfilled for some moderate µ, and not only for µ → ∞. The downside here is that this function is non-smooth and significantly complicates the optimization process. We will not consider this choice further in this chapter.

Example 9. Consider again the following problem:

    min_{x∈R²} (x₁ − 3/2)² + (x₂ − 1/8)⁴  s.t.  x₁ + x₂ ≤ 1,  x₁ − x₂ ≤ 1,  −x₁ + x₂ ≤ 1,  −x₁ − x₂ ≤ 1.   (23)

Previously, we had four inequality constraints, and only the first two were active at the solution. Using the Lagrange multipliers approach we had to check several combinations of active/inactive constraints. We will now use the penalty version of this problem with ρ(x) = x², which in vector form becomes

    min_{x∈R²} f_µ(x) = (x₁ − 3/2)² + (x₂ − 1/8)⁴ + µ‖max{Ax − 𝟙, 0}‖₂²,   (24)

where 𝟙 is a vector of ones and the matrix A is

    A = [1 1 ; 1 −1 ; −1 1 ; −1 −1].

We will solve this problem using Steepest Descent (SD), and the gradient for SD is given by

    ∇f_µ(x) = [2(x₁ − 3/2) ; 4(x₂ − 1/8)³] + 2µAᵀ(𝟙_{Ax−𝟙>0} ⊙ (Ax − 𝟙)),

where ⊙ is the Hadamard point-wise vector product (same as the operator .* in Julia/Matlab), and 𝟙_{Ax−𝟙>0} is the indicator vector of the violated constraints, so that the last factor equals max{Ax − 𝟙, 0}.

using PyPlot; close("all");
xx = -0.5:0.01:1.5; yy = -1.0:0.01:1.0;
m = length(xx);
X = repmat(xx,1,length(yy))'; Y = repmat(yy,1,length(xx));
ovec = ones(4);
A = [1.0 1 ; 1 -1 ; -1 1 ; -1 -1];
f = (x,mu)->(x[1]-3/2)^2 + (x[2]-1/8)^4 + mu*norm(max(A*x - ovec,0.0)).^2;
g = (x,mu)->[2*(x[1]-3/2) ; 4*(x[2]-1/8).^3] + mu*2*A'*(((A*x - ovec).>0.0).*(A*x - ovec));
F = (X-3/2).^2 + (Y-1/8).^4;
C = (max(X + Y - 1,0).^2 + max(-X + Y - 1,0).^2 + max(-X - Y - 1,0).^2 + max(X - Y - 1,0).^2);
figure();
mu = 0.0;  Fc = F + mu*C; subplot(3,2,1); contour(X,Y,Fc,200); #hold on;
axis("image"); xlabel("x"); ylabel("y"); title(string("penalty for mu = ",mu))
mu = 0.01; Fc = F + mu*C; subplot(3,2,2); contour(X,Y,Fc,200); #hold on;
axis("image"); xlabel("x"); ylabel("y"); title(string("penalty for mu = ",mu))
mu = 0.1;  Fc = F + mu*C; subplot(3,2,3); contour(X,Y,Fc,200); #hold on;
axis("image"); xlabel("x"); ylabel("y"); title(string("penalty for mu = ",mu))
mu = 1;    Fc = F + mu*C; subplot(3,2,4); contour(X,Y,Fc,200); #hold on;
axis("image"); xlabel("x"); ylabel("y"); title(string("penalty for mu = ",mu))
mu = 10;   Fc = F + mu*C; subplot(3,2,5); contour(X,Y,Fc,500); #hold on;
axis("image"); xlabel("x"); ylabel("y"); title(string("penalty for mu = ",mu))
mu = 100;  Fc = F + mu*C; subplot(3,2,6); contour(X,Y,Fc,1000); #hold on;
axis("image"); xlabel("x"); ylabel("y"); title(string("penalty for mu = ",mu))

Figure 9: The influence of the penalty parameter on the objective. As µ grows, the feasible set becomes more and more obvious in the function values.

## Armijo parameters:
alpha0 = 1.0; beta = 0.5; c = 1e-1;
## See the Armijo linesearch function in previous examples.
## Starting point
x_SD = [-0.5;0.8];
muVec = [0.1; 1.0 ; 10.0 ; 100.0];
fvals = []; cvals = []; figure();
## SD Iterations
for jmu = 1:length(muVec)
    mu = muVec[jmu];
    fmu = (x)->f(x,mu);
    println("mu = ",mu)
    Fc = F + mu*C;
    subplot(2,2,jmu); contour(X,Y,Fc,1000); #hold on;
    axis("image"); xlabel("x"); ylabel("y"); title(string("SD for mu = ",mu))
    for k=1:50
        gk = g(x_SD,mu);
        if norm(gk) < 1e-3
            break;
        end
        d_SD = -gk;
        x_prev = copy(x_SD);
        alpha_SD = linesearch(fmu,x_SD,d_SD,gk,0.5,beta,c);
        x_SD = x_SD + alpha_SD*d_SD;
        plot([x_SD[1];x_prev[1]],[x_SD[2];x_prev[2]],"g",linewidth=2.0);
        println(x_SD,",",norm(gk),",",alpha_SD,",",f(x_SD,mu));
        fvals = [fvals;f(x_SD,mu)];
        cvals = [cvals;norm(max(A*x_SD - ovec,0.0))];
    end
end
figure(); subplot(1,2,1); plot(fvals); title("function values")
subplot(1,2,2); semilogy(cvals,"*"); title("constraints violation values max(Ax-1,0)")

Barrier methods. We will now see the sister of the penalty method, which offers a different penalization approach. This approach is relevant only for inequality constraints, and is used when we wish to require that the iterates stay strictly inside the feasible domain. The idea is to choose a barrier function that goes to ∞ as we move towards the boundaries of the feasible domain. Unlike the penalty approach, we do not even define the objective outside the feasible domain, and do not allow our iterates to go there. There are two main barrier functions: the log-barrier function and the inverse-barrier function.

Figure 10: The convergence paths of the SD method for minimizing the objective for various penalty parameters. Each solve is initialized with the solution obtained for the previous (smaller) penalty parameter µ.

Figure 11: The function value and constraints violation for the SD iterations. At µ = 100 we get about three digits of accuracy for the constraints.

Suppose we wish to impose a constraint φ(x) ≥ 0. Then the two following functions can be used:

    b_log(x) = −log(φ(x)),   b_inv(x) = 1/φ(x).

Figure 12 shows an example of the two barrier functions. Using these functions, the minimizer will only be inside the domain, and as we apply the iterates we have to make sure that our next step is always inside the domain. If it is not, we reduce the step-length parameter until the next iterate is inside the domain (say, in an iterative manner similar to the Armijo process). Since these functions go to ∞ at the boundary, the barrier method is not too sensitive to the choice of penalty parameter, but note that: 1) these barrier functions are not 0 inside the domain, and hence influence the solution even if the constraint is not active; 2) these are highly non-quadratic functions with quite large high-order derivatives (the third derivative, for example). The optimization process, which works best on quadratic penalties, usually struggles with these functions.

3.4 The projected Steepest Descent method

The projected Steepest Descent method is an attempt to avoid all the difficulties involved with the penalty and barrier methods. In this method we assume that the iterates x^(k) are inside or on the boundary of the feasible domain. We find a Steepest Descent

Figure 12: Two barrier functions for the constraint x ≥ 0. Both of the functions go to ∞ as x → 0. The log function is bounded from below only in a closed region, so it is not suitable for cases where only a one-sided barrier is used.

step y, followed by a projection of y onto the feasible domain. That is, we replace y with the closest point that fulfils the constraints. This is not always an easy task, and therefore this method is most useful when the constraints are simple (linear, for example). More explicitly, assume the problem

    min_{x∈Ω} f(x),

where Ω is a subset of Rⁿ. Given a point y, we define the projection operator Π_Ω to be the solution of the following problem:

    Π_Ω(y) = arg min_{x∈Ω} ‖x − y‖,

for some vector norm (in most cases the squared l₂ norm is chosen). The projected SD method is defined by

    x^(k+1) = Π_Ω(x^(k) − α∇f(x^(k))),

where α is obtained by linesearch.
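The code examples above do not cover this method, so here is a minimal Julia sketch (ours; the quadratic and the box are arbitrary choices) of projected SD with a fixed step size over box constraints, where the Euclidean projection is simply a coordinate-wise clamp:

function projected_sd(g, proj, x0, alpha, iters)
    x = copy(x0);
    for k = 1:iters
        x = proj(x - alpha*g(x));    # an SD step followed by the projection
    end
    return x;
end
A = [2.0 0.5; 0.5 1.0]; b = [2.0; 2.0];        # a convex quadratic f = 0.5*<x,Ax> - <b,x>
g = (x)->(A*x - b);                            # its gradient
proj = (y)->clamp.(y, 0.0, 1.0);               # Euclidean projection onto the box [0,1]^2
println(projected_sd(g, proj, [0.5;0.5], 0.2, 200))   # approx [0.75, 1.0]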


More information

Chapter 3 Numerical Methods

Chapter 3 Numerical Methods Chapter 3 Numerical Methods Part 2 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization 1 Outline 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization Summary 2 Outline 3.2

More information

Quasi-Newton Methods

Quasi-Newton Methods Newton s Method Pros and Cons Quasi-Newton Methods MA 348 Kurt Bryan Newton s method has some very nice properties: It s extremely fast, at least once it gets near the minimum, and with the simple modifications

More information

2.3 Linear Programming

2.3 Linear Programming 2.3 Linear Programming Linear Programming (LP) is the term used to define a wide range of optimization problems in which the objective function is linear in the unknown variables and the constraints are

More information

CONSTRAINED NONLINEAR PROGRAMMING

CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

More information

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods

More information

Numerical optimization

Numerical optimization Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

Numerical Optimization of Partial Differential Equations

Numerical Optimization of Partial Differential Equations Numerical Optimization of Partial Differential Equations Part I: basic optimization concepts in R n Bartosz Protas Department of Mathematics & Statistics McMaster University, Hamilton, Ontario, Canada

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen

More information

Line Search Methods for Unconstrained Optimisation

Line Search Methods for Unconstrained Optimisation Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic

More information

Programming, numerics and optimization

Programming, numerics and optimization Programming, numerics and optimization Lecture C-3: Unconstrained optimization II Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428

More information

6.252 NONLINEAR PROGRAMMING LECTURE 10 ALTERNATIVES TO GRADIENT PROJECTION LECTURE OUTLINE. Three Alternatives/Remedies for Gradient Projection

6.252 NONLINEAR PROGRAMMING LECTURE 10 ALTERNATIVES TO GRADIENT PROJECTION LECTURE OUTLINE. Three Alternatives/Remedies for Gradient Projection 6.252 NONLINEAR PROGRAMMING LECTURE 10 ALTERNATIVES TO GRADIENT PROJECTION LECTURE OUTLINE Three Alternatives/Remedies for Gradient Projection Two-Metric Projection Methods Manifold Suboptimization Methods

More information

Optimization and Root Finding. Kurt Hornik

Optimization and Root Finding. Kurt Hornik Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding

More information

Penalty and Barrier Methods. So we again build on our unconstrained algorithms, but in a different way.

Penalty and Barrier Methods. So we again build on our unconstrained algorithms, but in a different way. AMSC 607 / CMSC 878o Advanced Numerical Optimization Fall 2008 UNIT 3: Constrained Optimization PART 3: Penalty and Barrier Methods Dianne P. O Leary c 2008 Reference: N&S Chapter 16 Penalty and Barrier

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

1 Introduction

1 Introduction 2018-06-12 1 Introduction The title of this course is Numerical Methods for Data Science. What does that mean? Before we dive into the course technical material, let s put things into context. I will not

More information

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems 1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of

More information

Lecture 14: October 17

Lecture 14: October 17 1-725/36-725: Convex Optimization Fall 218 Lecture 14: October 17 Lecturer: Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Constrained Optimization

Constrained Optimization 1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange

More information

Constrained Optimization Theory

Constrained Optimization Theory Constrained Optimization Theory Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Constrained Optimization Theory IMA, August

More information

Lecture Notes: Geometric Considerations in Unconstrained Optimization

Lecture Notes: Geometric Considerations in Unconstrained Optimization Lecture Notes: Geometric Considerations in Unconstrained Optimization James T. Allison February 15, 2006 The primary objectives of this lecture on unconstrained optimization are to: Establish connections

More information

Computational Finance

Computational Finance Department of Mathematics at University of California, San Diego Computational Finance Optimization Techniques [Lecture 2] Michael Holst January 9, 2017 Contents 1 Optimization Techniques 3 1.1 Examples

More information

MATH 4211/6211 Optimization Basics of Optimization Problems

MATH 4211/6211 Optimization Basics of Optimization Problems MATH 4211/6211 Optimization Basics of Optimization Problems Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 A standard minimization

More information

Numerical Methods I Solving Nonlinear Equations

Numerical Methods I Solving Nonlinear Equations Numerical Methods I Solving Nonlinear Equations Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 16th, 2014 A. Donev (Courant Institute)

More information

Neural Network Training

Neural Network Training Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification

More information

Lecture 3. Optimization Problems and Iterative Algorithms

Lecture 3. Optimization Problems and Iterative Algorithms Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Optimization Methods

Optimization Methods Optimization Methods Decision making Examples: determining which ingredients and in what quantities to add to a mixture being made so that it will meet specifications on its composition allocating available

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Seminars on Mathematics for Economics and Finance Topic 5: Optimization Kuhn-Tucker conditions for problems with inequality constraints 1

Seminars on Mathematics for Economics and Finance Topic 5: Optimization Kuhn-Tucker conditions for problems with inequality constraints 1 Seminars on Mathematics for Economics and Finance Topic 5: Optimization Kuhn-Tucker conditions for problems with inequality constraints 1 Session: 15 Aug 2015 (Mon), 10:00am 1:00pm I. Optimization with

More information

Optimization Problems with Constraints - introduction to theory, numerical Methods and applications

Optimization Problems with Constraints - introduction to theory, numerical Methods and applications Optimization Problems with Constraints - introduction to theory, numerical Methods and applications Dr. Abebe Geletu Ilmenau University of Technology Department of Simulation and Optimal Processes (SOP)

More information

Numerical Optimization Prof. Shirish K. Shevade Department of Computer Science and Automation Indian Institute of Science, Bangalore

Numerical Optimization Prof. Shirish K. Shevade Department of Computer Science and Automation Indian Institute of Science, Bangalore Numerical Optimization Prof. Shirish K. Shevade Department of Computer Science and Automation Indian Institute of Science, Bangalore Lecture - 13 Steepest Descent Method Hello, welcome back to this series

More information

In view of (31), the second of these is equal to the identity I on E m, while this, in view of (30), implies that the first can be written

In view of (31), the second of these is equal to the identity I on E m, while this, in view of (30), implies that the first can be written 11.8 Inequality Constraints 341 Because by assumption x is a regular point and L x is positive definite on M, it follows that this matrix is nonsingular (see Exercise 11). Thus, by the Implicit Function

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Optimization Methods for Machine Learning

Optimization Methods for Machine Learning Optimization Methods for Machine Learning Sathiya Keerthi Microsoft Talks given at UC Santa Cruz February 21-23, 2017 The slides for the talks will be made available at: http://www.keerthis.com/ Introduction

More information

Numerical solutions of nonlinear systems of equations

Numerical solutions of nonlinear systems of equations Numerical solutions of nonlinear systems of equations Tsung-Ming Huang Department of Mathematics National Taiwan Normal University, Taiwan E-mail: min@math.ntnu.edu.tw August 28, 2011 Outline 1 Fixed points

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

Machine Learning. Support Vector Machines. Manfred Huber

Machine Learning. Support Vector Machines. Manfred Huber Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data

More information

5 Quasi-Newton Methods

5 Quasi-Newton Methods Unconstrained Convex Optimization 26 5 Quasi-Newton Methods If the Hessian is unavailable... Notation: H = Hessian matrix. B is the approximation of H. C is the approximation of H 1. Problem: Solve min

More information

Unconstrained Optimization

Unconstrained Optimization 1 / 36 Unconstrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University February 2, 2015 2 / 36 3 / 36 4 / 36 5 / 36 1. preliminaries 1.1 local approximation

More information

ECS550NFB Introduction to Numerical Methods using Matlab Day 2

ECS550NFB Introduction to Numerical Methods using Matlab Day 2 ECS550NFB Introduction to Numerical Methods using Matlab Day 2 Lukas Laffers lukas.laffers@umb.sk Department of Mathematics, University of Matej Bel June 9, 2015 Today Root-finding: find x that solves

More information

minimize x subject to (x 2)(x 4) u,

minimize x subject to (x 2)(x 4) u, Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

More information

AM 205: lecture 18. Last time: optimization methods Today: conditions for optimality

AM 205: lecture 18. Last time: optimization methods Today: conditions for optimality AM 205: lecture 18 Last time: optimization methods Today: conditions for optimality Existence of Global Minimum For example: f (x, y) = x 2 + y 2 is coercive on R 2 (global min. at (0, 0)) f (x) = x 3

More information

Lecture 17: October 27

Lecture 17: October 27 0-725/36-725: Convex Optimiation Fall 205 Lecturer: Ryan Tibshirani Lecture 7: October 27 Scribes: Brandon Amos, Gines Hidalgo Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These

More information

Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms: A Comparative Study

Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms: A Comparative Study International Journal of Mathematics And Its Applications Vol.2 No.4 (2014), pp.47-56. ISSN: 2347-1557(online) Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms:

More information

Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization. Problem set 2. Due Monday April 26th Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

More information

1 Numerical optimization

1 Numerical optimization Contents 1 Numerical optimization 5 1.1 Optimization of single-variable functions............ 5 1.1.1 Golden Section Search................... 6 1.1. Fibonacci Search...................... 8 1. Algorithms

More information

Support Vector Machines: Maximum Margin Classifiers

Support Vector Machines: Maximum Margin Classifiers Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind

More information

Optimality, Duality, Complementarity for Constrained Optimization

Optimality, Duality, Complementarity for Constrained Optimization Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Proximal-Gradient Mark Schmidt University of British Columbia Winter 2018 Admin Auditting/registration forms: Pick up after class today. Assignment 1: 2 late days to hand in

More information

INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE

INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE HANDE Y. BENSON, ARUN SEN, AND DAVID F. SHANNO Abstract. In this paper, we present global

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Algorithms for Constrained Nonlinear Optimization Problems) Tamás TERLAKY Computing and Software McMaster University Hamilton, November 2005 terlaky@mcmaster.ca

More information

1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:

1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by: Newton s Method Suppose we want to solve: (P:) min f (x) At x = x, f (x) can be approximated by: n x R. f (x) h(x) := f ( x)+ f ( x) T (x x)+ (x x) t H ( x)(x x), 2 which is the quadratic Taylor expansion

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)

More information

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell DAMTP 2014/NA02 On fast trust region methods for quadratic models with linear constraints M.J.D. Powell Abstract: Quadratic models Q k (x), x R n, of the objective function F (x), x R n, are used by many

More information

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725 Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h

More information

10 Numerical methods for constrained problems

10 Numerical methods for constrained problems 10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside

More information

CONVERGENCE ANALYSIS OF AN INTERIOR-POINT METHOD FOR NONCONVEX NONLINEAR PROGRAMMING

CONVERGENCE ANALYSIS OF AN INTERIOR-POINT METHOD FOR NONCONVEX NONLINEAR PROGRAMMING CONVERGENCE ANALYSIS OF AN INTERIOR-POINT METHOD FOR NONCONVEX NONLINEAR PROGRAMMING HANDE Y. BENSON, ARUN SEN, AND DAVID F. SHANNO Abstract. In this paper, we present global and local convergence results

More information

CS-E4830 Kernel Methods in Machine Learning

CS-E4830 Kernel Methods in Machine Learning CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This

More information

Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method. Javier Peña Convex Optimization /36-725 Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu

More information

IPAM Summer School Optimization methods for machine learning. Jorge Nocedal

IPAM Summer School Optimization methods for machine learning. Jorge Nocedal IPAM Summer School 2012 Tutorial on Optimization methods for machine learning Jorge Nocedal Northwestern University Overview 1. We discuss some characteristics of optimization problems arising in deep

More information

Computational Optimization. Constrained Optimization Part 2

Computational Optimization. Constrained Optimization Part 2 Computational Optimization Constrained Optimization Part Optimality Conditions Unconstrained Case X* is global min Conve f X* is local min SOSC f ( *) = SONC Easiest Problem Linear equality constraints

More information

MS&E 318 (CME 338) Large-Scale Numerical Optimization

MS&E 318 (CME 338) Large-Scale Numerical Optimization Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods

More information

Generalization to inequality constrained problem. Maximize

Generalization to inequality constrained problem. Maximize Lecture 11. 26 September 2006 Review of Lecture #10: Second order optimality conditions necessary condition, sufficient condition. If the necessary condition is violated the point cannot be a local minimum

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

Math 411 Preliminaries

Math 411 Preliminaries Math 411 Preliminaries Provide a list of preliminary vocabulary and concepts Preliminary Basic Netwon s method, Taylor series expansion (for single and multiple variables), Eigenvalue, Eigenvector, Vector

More information