5 Overview of algorithms for unconstrained optimization


5.1 General optimization algorithm

Recall: we are attempting to solve the problem

(P) $\min f(x)$ s.t. $x \in X$,

where $f(x)$ is differentiable and $X \subseteq \mathbb{R}^n$ is an open set.

Solutions to optimization problems are almost always impossible to obtain directly (or "in closed form"), with a few exceptions. Hence, for the most part, we will solve these problems with iterative algorithms. These algorithms typically require the user to supply a starting point $x^0 \in X$. Beginning at $x^0$, an iterative algorithm will generate a sequence of points $\{x^k\}_{k=0}^{\infty}$ called iterates. In deciding how to generate the next iterate, $x^{k+1}$, the algorithm uses information about the function $f$ at the current iterate, $x^k$, and sometimes at past iterates $x^0, \ldots, x^{k-1}$. In practice, rather than constructing an infinite sequence of iterates, algorithms stop when an appropriate termination criterion is satisfied, indicating either that the problem has been solved within a desired accuracy, or that no further progress can be made.

Most algorithms for unconstrained optimization we will discuss fall into the category of directional search algorithms:

General directional search optimization algorithm

Initialization Specify an initial guess of the solution $x^0$.
Iteration For $k = 0, 1, \ldots$:
  If $x^k$ is optimal, stop. Otherwise,
  determine $d^k$ (a search direction),
  determine $\alpha_k > 0$ (a step size), and
  set $x^{k+1} = x^k + \alpha_k d^k$ (a new estimate of the solution).

5.1.1 Choosing the direction

Typically, we require that $d^k$ is a descent direction of $f$ at $x^k$, that is,

$f(x^k + \alpha d^k) < f(x^k) \;\; \forall \alpha \in (0, \epsilon]$ for some $\epsilon > 0$.

For the case when $f$ is differentiable, we have shown in Theorem 4.1 that any $d^k$ such that $\nabla f(x^k)^T d^k < 0$ is a descent direction whenever $\nabla f(x^k) \neq 0$. Often, the direction is chosen to be of the form $d^k = -D^k \nabla f(x^k)$, where $D^k$ is a positive definite symmetric matrix. (Why is it important that $D^k$ is positive definite?)
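To make the above template concrete, here is a minimal Python sketch of the directional search loop; the callables `search_direction` and `step_size` are hypothetical placeholders for the method-specific choices discussed in the rest of this section:

```python
import numpy as np

def directional_search(f, grad, x0, search_direction, step_size,
                       tol=1e-6, max_iter=1000):
    """Generic directional search: x_{k+1} = x_k + alpha_k * d_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:       # approximate first-order optimality
            break
        d = search_direction(x, g)         # e.g., -g, or -D_k g for p.d. D_k
        alpha = step_size(f, grad, x, d)   # e.g., by a line search
        x = x + alpha * d
    return x
```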

The following are the two basic methods for choosing the matrix $D^k$ at each iteration; they give rise to the two classic algorithms for unconstrained optimization we are going to discuss in class:

Steepest descent: $D^k = I$, $k = 0, 1, 2, \ldots$

Newton's method: $D^k = H(x^k)^{-1}$ (provided $H(x^k)$ is positive definite).

5.1.2 Choosing the stepsize

After $d^k$ is fixed, $\alpha_k$ ideally would solve the one-dimensional optimization problem $\min_{\alpha \ge 0} f(x^k + \alpha d^k)$. This optimization problem is usually also impossible to solve exactly. Instead, $\alpha_k$ is computed (via an iterative procedure referred to as line search) either to approximately solve the above optimization problem, or to ensure a sufficient decrease in the value of $f$.

5.1.3 Testing for optimality

Based on the optimality conditions, $x^k$ is locally optimal if $\nabla f(x^k) = 0$ and $H(x^k)$ is positive definite. However, such a point is unlikely to be found exactly. In fact, most of the analysis of algorithms in the above form deals with their limiting behavior, i.e., analyzes the limit points of the infinite sequence of iterates generated by the algorithm. Thus, to implement the algorithm in practice, more realistic termination criteria need to be implemented. They often hinge, at least in part, on approximately satisfying, to a certain tolerance, the first order necessary condition for optimality discussed in the previous section.

5.2 Steepest descent algorithm for minimization

The steepest descent algorithm is a version of the general optimization algorithm that chooses $d^k = -\nabla f(x^k)$ at the $k$th iteration. As a source of motivation, note that $f(x)$ can be approximated by its linear expansion $f(\bar{x} + d) \approx f(\bar{x}) + \nabla f(\bar{x})^T d$. It is not hard to see that so long as $\nabla f(\bar{x}) \neq 0$, the direction

$\bar{d} = -\dfrac{\nabla f(\bar{x})}{\|\nabla f(\bar{x})\|} = -\dfrac{\nabla f(\bar{x})}{\sqrt{\nabla f(\bar{x})^T \nabla f(\bar{x})}}$

minimizes the above approximation over all directions of unit length. Indeed, for any direction $d$ with $\|d\| = 1$, the Schwarz inequality yields

$\nabla f(\bar{x})^T d \ge -\|\nabla f(\bar{x})\| \cdot \|d\| = -\|\nabla f(\bar{x})\| = \nabla f(\bar{x})^T \bar{d}.$

Of course, if $\nabla f(\bar{x}) = 0$, then $\bar{x}$ is a candidate for a local minimizer, i.e., $\bar{x}$ satisfies the first order necessary optimality condition. The direction $\bar{d} = -\nabla f(\bar{x})$ is called the direction of steepest descent at the point $\bar{x}$. Note that $\bar{d} = -\nabla f(\bar{x})$ is a descent direction as long as $\nabla f(\bar{x}) \neq 0$. To see this, simply observe that $\bar{d}^T \nabla f(\bar{x}) = -\nabla f(\bar{x})^T \nabla f(\bar{x}) < 0$ so long as $\nabla f(\bar{x}) \neq 0$. A natural consequence of this is the following algorithm, called the steepest descent algorithm.

Steepest Descent Algorithm:

Step 0 Given $x^0$, set $k := 0$.
Step 1 $d^k = -\nabla f(x^k)$. If $d^k = 0$, then stop.
Step 2 Choose stepsize $\alpha_k$ by performing an exact (or inexact) line search.
Step 3 Set $x^{k+1} \leftarrow x^k + \alpha_k d^k$, $k \leftarrow k + 1$. Go to Step 1.

Note that from Step 2 and the fact that $d^k = -\nabla f(x^k)$ is a descent direction, it follows that $f(x^{k+1}) < f(x^k)$. The following theorem establishes that under certain assumptions on $f$, the steepest descent algorithm converges regardless of the initial starting point $x^0$ (i.e., it exhibits global convergence).

Theorem 5.1 (Convergence Theorem; Steepest Descent with exact line search) Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable on the set $S = \{x \in \mathbb{R}^n : f(x) \le f(x^0)\}$, and that $S$ is a closed and bounded set. Suppose further that the sequence $\{x^k\}$ is generated by the steepest descent algorithm with stepsizes $\alpha_k$ chosen by an exact line search. Then every point $\bar{x}$ that is a limit point of the sequence $\{x^k\}$ satisfies $\nabla f(\bar{x}) = 0$.

Proof: The proof of this theorem is by contradiction. By the Weierstrass Theorem, at least one limit point of the sequence $\{x^k\}$ must exist. Let $\bar{x}$ be any such limit point. Without loss of generality, assume that $\lim_{k \to \infty} x^k = \bar{x}$, but that $\nabla f(\bar{x}) \neq 0$. This being the case, there is a value $\bar{\alpha} > 0$ such that $\delta := f(\bar{x}) - f(\bar{x} + \bar{\alpha}\bar{d}) > 0$, where $\bar{d} = -\nabla f(\bar{x})$. Then also $(\bar{x} + \bar{\alpha}\bar{d}) \in \operatorname{int} S$, because $f(\bar{x} + \bar{\alpha}\bar{d}) < f(\bar{x}) \le f(x^0)$.

Let $\{d^k\}$ be the sequence of directions generated by the algorithm, i.e., $d^k = -\nabla f(x^k)$. Since $f$ is continuously differentiable, $\lim_{k \to \infty} d^k = \bar{d}$. Then, since $(\bar{x} + \bar{\alpha}\bar{d}) \in \operatorname{int} S$ and $(x^k + \bar{\alpha} d^k) \to (\bar{x} + \bar{\alpha}\bar{d})$, for $k$ sufficiently large we have $x^k + \bar{\alpha} d^k \in S$ and

$f(x^k + \bar{\alpha} d^k) \le f(\bar{x} + \bar{\alpha}\bar{d}) + \dfrac{\delta}{2} = f(\bar{x}) - \delta + \dfrac{\delta}{2} = f(\bar{x}) - \dfrac{\delta}{2}.$

However, $f(\bar{x}) \le f(x^{k+1}) = f(x^k + \alpha_k d^k) \le f(x^k + \bar{\alpha} d^k) \le f(\bar{x}) - \delta/2$, which is, of course, a contradiction. Thus $\bar{d} = -\nabla f(\bar{x}) = 0$.

An example Suppose $f(x)$ is a simple quadratic function of the form

$f(x) = \tfrac{1}{2} x^T Q x + q^T x,$

where $Q$ is a positive definite symmetric matrix. The optimal solution of (P) is easily computed as $x^\star = -Q^{-1} q$ (since $Q$ is positive definite, it is nonsingular), and direct substitution shows that the optimal objective function value is $f(x^\star) = -\tfrac{1}{2} q^T Q^{-1} q$.

For convenience, let $x$ denote the current point in the steepest descent algorithm. We have $f(x) = \tfrac{1}{2} x^T Q x + q^T x$, and let $d$ denote the current direction, which is the negative of the gradient, i.e., $d = -\nabla f(x) = -Qx - q$. Now let us compute the next iterate of the steepest descent algorithm. If $\alpha$ is the generic stepsize, then

$f(x + \alpha d) = \tfrac{1}{2}(x + \alpha d)^T Q (x + \alpha d) + q^T(x + \alpha d) = \tfrac{1}{2} x^T Q x + \alpha d^T Q x + \tfrac{1}{2}\alpha^2 d^T Q d + q^T x + \alpha q^T d = f(x) - \alpha d^T d + \tfrac{1}{2}\alpha^2 d^T Q d.$

Optimizing the value of $\alpha$ in this last expression yields

$\alpha = \dfrac{d^T d}{d^T Q d},$

and the next iterate of the algorithm then is

$x' = x + \alpha d = x + \dfrac{d^T d}{d^T Q d}\, d, \quad \text{where } d = -Qx - q,$

and

$f(x') = f(x + \alpha d) = f(x) - \alpha d^T d + \tfrac{1}{2}\alpha^2 d^T Q d = f(x) - \dfrac{(d^T d)^2}{2\, d^T Q d}.$

Consider now a specific two-dimensional instance of this problem, with $Q$ and $q$ chosen so that $x^\star = (0, 0)$. Starting the algorithm at the point $x^0 = (0, 1)$, the iterates zigzag towards the optimal solution, and the even-numbered iterates satisfy $x^{2n} = (0,\; 0.2^n)$.
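The exact-line-search recursion above is easy to implement; here is a short Python sketch for the quadratic case (the particular $Q$, $q$, and starting point below are illustrative values chosen for the sketch, not the instance from the example):

```python
import numpy as np

def steepest_descent_quadratic(Q, q, x0, tol=1e-10, max_iter=500):
    """Steepest descent with exact line search on f(x) = 0.5 x'Qx + q'x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -(Q @ x + q)                  # d = -grad f(x)
        if np.linalg.norm(d) <= tol:
            break
        alpha = (d @ d) / (d @ Q @ d)     # exact minimizer of f(x + alpha d)
        x = x + alpha * d
    return x

Q = np.array([[3.0, 1.0], [1.0, 2.0]])    # positive definite
q = np.array([-1.0, 1.0])
x_star = np.linalg.solve(Q, -q)           # closed form: x* = -Q^{-1} q
print(np.allclose(steepest_descent_quadratic(Q, q, np.zeros(2)), x_star))
```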

It follows that

$\|x^{2n} - x^\star\| = 0.2^n, \quad f(x^{2n}) - f(x^\star) = (0.2)^{2n}.$

Therefore, starting from the point $x^0$, the distance from the current iterate to the optimal solution goes down by a factor of 0.2 after every two iterations of the algorithm (a similar observation can be made about the progress of the objective function values). The graph below plots the progress of the sequence $\|x^k - x^\star\|$ as a function of the iteration number; notice that the y-axis is drawn on a logarithmic scale, which allows us to visualize the progress of the algorithm better as the values of $\|x^k - x^\star\|$ approach zero.

[Figure: $\|x^k - x^\star\|$ versus iteration number, on a logarithmic y-axis.]

Although it is easy to find the optimal solution of the quadratic optimization problem in closed form, the above example is relevant in that it demonstrates the typical performance of the steepest descent algorithm. Additionally, most functions behave as near-quadratic functions in a neighborhood of the optimal solution, making the example even more relevant.

Termination criteria

Ideally, the algorithm will terminate at a point $x^k$ such that $\nabla f(x^k) = 0$. However, the algorithm is not guaranteed to find such a point in a finite amount of time. Moreover, due to rounding errors in computer calculations, the computed value of the gradient will have some imprecision in it. Therefore, in practical algorithms the termination criterion is designed to test whether the above condition is satisfied approximately, so that the resulting output of the algorithm is an approximately optimal solution. A natural termination criterion for steepest descent could be $\|\nabla f(x^k)\| \le \epsilon$, where $\epsilon > 0$ is a pre-specified tolerance. However, depending on the scaling of the function, this requirement can be either unnecessarily stringent, or too loose to ensure near-optimality (consider a problem concerned with minimizing distance, where the objective function can be expressed in inches, feet, or miles). Another alternative, which might alleviate the above consideration, is to terminate when $\|\nabla f(x^k)\| \le \epsilon |f(x^k)|$; this, however, may lead to problems when the objective function at the optimum is zero. A combined approach is then to terminate when

$\|\nabla f(x^k)\| \le \epsilon \left(1 + |f(x^k)|\right).$

The value of $\epsilon$ is typically taken to be at most the square root of the machine tolerance (e.g., $\epsilon = 10^{-8}$ if 16-digit computing is used), due to the error incurred in estimating derivatives.
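The combined criterion translates directly into code; a one-line sketch:

```python
import numpy as np

def should_stop(g, fx, eps=1e-8):
    """Combined test: ||grad f(x_k)|| <= eps * (1 + |f(x_k)|)."""
    return np.linalg.norm(g) <= eps * (1.0 + abs(fx))
```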

5.3 Stepsize selection

In the analysis in the above subsection we assumed that the one-dimensional optimization problem invoked in the line search at each iteration of the steepest descent algorithm was solved exactly and with perfect precision, which is usually not possible. In this subsection we discuss one of the many practical ways of solving this problem approximately, to determine the stepsize at each iteration of the general directional search optimization algorithm (including steepest descent).

5.3.1 Stepsize selection basics

Suppose that $f(x)$ is a continuously differentiable function, and that we seek to (approximately) solve

$\bar{\alpha} = \arg\min_{\alpha > 0} f(\bar{x} + \alpha \bar{d}),$

where $\bar{x}$ is our current iterate, and $\bar{d}$ is the current direction generated by an algorithm that seeks to minimize $f(x)$. We assume that $\bar{d}$ is a descent direction, i.e., $\nabla f(\bar{x})^T \bar{d} < 0$. Let $F(\alpha) = f(\bar{x} + \alpha \bar{d})$, whereby $F(\alpha)$ is a function of the scalar variable $\alpha$, and our problem is to solve for $\bar{\alpha} = \arg\min_{\alpha > 0} F(\alpha)$. Using the chain rule for differentiation, we can show that $F'(\alpha) = \nabla f(\bar{x} + \alpha \bar{d})^T \bar{d}$. Therefore, applying the necessary optimality conditions to the one-dimensional optimization problem above, we want to find a value $\bar{\alpha}$ for which $F'(\bar{\alpha}) = 0$. Furthermore, since $\bar{d}$ is a descent direction, $F'(0) < 0$.

5.3.2 Armijo rule, or backtracking

Although there are iterative algorithms developed to solve the problem $\min F(\alpha)$ (or $F'(\alpha) = 0$) exactly, i.e., with a high degree of precision (such as, for instance, the bisection search algorithm), they are typically too expensive computationally. (Recall that we need to perform a line search at every iteration of our optimization algorithm!) On the other hand, if we sacrifice accuracy of the line search, this can cause inferior performance of the overall algorithm. The Armijo rule, or the backtracking method, is one of several inexact line search methods which guarantee a sufficient degree of improvement in the objective function to ensure the algorithm's convergence.

The Armijo rule requires two parameters: $0 < \mu < 0.5$ and $0 < \beta < 1$. Suppose we are minimizing a function $F(\alpha)$ such that $F'(0) < 0$ (which is indeed the case for the line search problems arising in descent algorithms). Then the first order approximation of $F(\alpha)$ at $\alpha = 0$ is given by $F(0) + \alpha F'(0)$. Define $\hat{F}(\alpha) = F(0) + \mu \alpha F'(0)$ (see figure). A stepsize $\bar{\alpha}$ is considered acceptable by the Armijo rule only if $F(\bar{\alpha}) \le \hat{F}(\bar{\alpha})$, that is, if taking a step of size $\bar{\alpha}$ guarantees a sufficient decrease of the function:

$f(\bar{x} + \bar{\alpha}\bar{d}) - f(\bar{x}) \le \mu \bar{\alpha} \nabla f(\bar{x})^T \bar{d}.$

[Figure: the function $F(\alpha)$, its first order approximation $F(0) + \alpha F'(0)$, and the Armijo threshold $\hat{F}(\alpha) = F(0) + \mu \alpha F'(0)$.]

Note that the sufficient decrease condition will hold for any sufficiently small value of $\bar{\alpha}$. On the other hand, we would like to prevent the step size from being too small, for otherwise our overall optimization algorithm would not be making much progress. To combine these two considerations, we implement the following iterative backtracking procedure (here we use $\beta = 1/2$; a Python sketch appears after Theorem 5.2 below):

Backtracking line search

Step 0 Set $k = 0$, $\bar{\alpha}_0 = 1$.
Step k If $F(\bar{\alpha}_k) \le \hat{F}(\bar{\alpha}_k)$, choose $\bar{\alpha}_k$ as the step size; stop. If $F(\bar{\alpha}_k) > \hat{F}(\bar{\alpha}_k)$, let $\bar{\alpha}_{k+1} = \tfrac{1}{2}\bar{\alpha}_k$, $k \leftarrow k + 1$.

Note that as a result of the above iterative scheme, the chosen stepsize is $\bar{\alpha} = 2^{-t}$, where $t$ is the smallest integer such that $F(1/2^t) \le \hat{F}(1/2^t)$ (or, for general $\beta$, $\bar{\alpha} = \beta^t$, where $t$ is the smallest integer such that $F(\beta^t) \le \hat{F}(\beta^t)$). Typically, $\mu$ is chosen in the range between 0.01 and 0.3, and $\beta$ between 0.1 and 0.8.

Note that if $x^k$ and $x^{k+1}$ are consecutive iterates of the general optimization algorithm with $d^k$ a descent direction and the stepsize chosen by backtracking, then $f(x^{k+1}) < f(x^k)$; that is, the algorithm is guaranteed to produce an improvement in the function value at every iteration. Under additional assumptions on $f$, it can also be shown that the steepest descent algorithm demonstrates global convergence properties under the Armijo line search rule, as stated in the following theorem.

Theorem 5.2 (Convergence Theorem; Steepest Descent with backtracking line search) Suppose that the set $S = \{x \in \mathbb{R}^n : f(x) \le f(x^0)\}$ is closed and bounded, and suppose that the gradient of $f$ is Lipschitz continuous on the set $S$, i.e., there exists a constant $G > 0$ such that $\|\nabla f(x) - \nabla f(y)\| \le G \|x - y\|$ for all $x, y \in S$. Suppose further that the sequence $\{x^k\}$ is generated by the steepest descent algorithm with stepsizes $\alpha_k$ chosen by a backtracking line search. Then every point $\bar{x}$ that is a limit point of the sequence $\{x^k\}$ satisfies $\nabla f(\bar{x}) = 0$.

The additional assumption, basically, ensures that the gradient of $f$ does not change too rapidly. In the proof of the theorem, this allows one to provide a lower bound on the stepsize in each iteration. (See any of the reference textbooks for details.)
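A minimal Python sketch of the backtracking procedure, for a general descent direction `d` (function and gradient are passed as callables):

```python
import numpy as np

def backtracking(f, grad, x, d, mu=0.2, beta=0.5):
    """Armijo backtracking: shrink alpha until
    f(x + alpha*d) <= f(x) + mu * alpha * grad f(x)' d."""
    fx = f(x)
    slope = grad(x) @ d          # F'(0) < 0 for a descent direction
    alpha = 1.0
    while f(x + alpha * d) > fx + mu * alpha * slope:
        alpha *= beta            # alpha = beta^t for the smallest feasible t
    return alpha
```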

Remark: Our discussion so far implicitly assumed that the domain of the optimization problem was the entire $\mathbb{R}^n$. If our optimization problem is

(P) $\min f(x)$ s.t. $x \in X$,

where $X$ is an open set, then the line-search problem is

$\min f(\bar{x} + \alpha \bar{d})$ s.t. $\bar{x} + \alpha \bar{d} \in X$.

In this case, we must ensure that all values of $\alpha$ generated in the backtracking algorithm satisfy $\bar{x} + \alpha \bar{d} \in X$. As an example, consider the following problem:

(P) $\min f(x) := -\sum_{i=1}^m \ln(b_i - a_i^T x)$ s.t. $b - Ax > 0$.

Here the domain of $f(x)$ is $X = \{x \in \mathbb{R}^n : b - Ax > 0\}$. Given a point $\bar{x} \in X$ and a direction $\bar{d}$, the line-search problem is:

(LS) $\min h(\alpha) := f(\bar{x} + \alpha \bar{d}) = -\sum_{i=1}^m \ln(b_i - a_i^T(\bar{x} + \alpha \bar{d}))$ s.t. $b - A(\bar{x} + \alpha \bar{d}) > 0$.

Standard arithmetic manipulation can be used to establish that $b - A(\bar{x} + \alpha \bar{d}) > 0$ if and only if $\check{\alpha} < \alpha < \hat{\alpha}$, where

$\check{\alpha} := \max_{i:\, a_i^T \bar{d} < 0} \dfrac{b_i - a_i^T \bar{x}}{a_i^T \bar{d}} \quad \text{and} \quad \hat{\alpha} := \min_{i:\, a_i^T \bar{d} > 0} \dfrac{b_i - a_i^T \bar{x}}{a_i^T \bar{d}},$

and the line-search problem then is:

(LS$'$) $\min h(\alpha) := -\sum_{i=1}^m \ln(b_i - a_i^T(\bar{x} + \alpha \bar{d}))$ s.t. $\check{\alpha} < \alpha < \hat{\alpha}$.

The implementation of the backtracking rule for this problem would have to be modified: starting with $\bar{\alpha} = 1$, we backtrack, if necessary, until $\bar{\alpha} < \hat{\alpha}$, and only then start checking the sufficient decrease condition.
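The interval endpoints $\check{\alpha}$ and $\hat{\alpha}$ are cheap to compute; a sketch (the function name is mine):

```python
import numpy as np

def feasible_step_interval(A, b, x, d):
    """Interval (alpha_lo, alpha_hi) on which b - A(x + alpha*d) > 0,
    assuming the current point satisfies b - A x > 0."""
    s = b - A @ x                        # positive slacks at the current point
    r = A @ d
    alpha_hi = np.min(s[r > 0] / r[r > 0]) if np.any(r > 0) else np.inf
    alpha_lo = np.max(s[r < 0] / r[r < 0]) if np.any(r < 0) else -np.inf
    return alpha_lo, alpha_hi
```

A modified backtracking routine would then start from a trial stepsize of 1, halve it until it drops below `alpha_hi`, and only then apply the sufficient decrease test.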

5.4 Newton's method for minimization

Again, we want to solve

(P) $\min f(x)$, $x \in \mathbb{R}^n$.

Newton's method can also be interpreted in the framework of the general optimization algorithm, but it truly stems from Newton's method for solving systems of nonlinear equations. Recall that if $\phi : \mathbb{R}^n \to \mathbb{R}^n$, to solve the system of equations $\phi(x) = 0$, one can apply an iterative method. Starting at a point $\bar{x}$, approximate the function by $\phi(\bar{x} + d) \approx \phi(\bar{x}) + \nabla\phi(\bar{x})^T d$, where $\nabla\phi(\bar{x})^T \in \mathbb{R}^{n \times n}$ is the Jacobian of $\phi$ at $\bar{x}$, and, provided that $\nabla\phi(\bar{x})$ is nonsingular, solve the system of linear equations

$\nabla\phi(\bar{x})^T d = -\phi(\bar{x})$

to obtain $d$. Set the next iterate $\bar{x}' = \bar{x} + d$, and continue. This method is well-studied, and is well known for its good performance when the starting point $x^0$ is chosen appropriately. Newton's method for minimization is precisely an application of this equation-solving method to the (system of) first-order optimality conditions $\nabla f(x) = 0$.

Here is another view of the motivation behind Newton's method for optimization. At $x = \bar{x}$, $f(x)$ can be approximated by

$f(x) \approx q(x) := f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x}) + \tfrac{1}{2}(x - \bar{x})^T H(\bar{x})(x - \bar{x}),$

which is the quadratic Taylor expansion of $f(x)$ at $x = \bar{x}$. $q(x)$ is a quadratic function which is minimized by solving $\nabla q(x) = 0$, i.e., $\nabla f(\bar{x}) + H(\bar{x})(x - \bar{x}) = 0$, which yields

$x - \bar{x} = -H(\bar{x})^{-1} \nabla f(\bar{x}).$

The direction $-H(\bar{x})^{-1} \nabla f(\bar{x})$ is called the Newton direction, or the Newton step. This leads to the following algorithm for solving (P):

Newton's Method:

Step 0 Given $x^0$, set $k := 0$.
Step 1 $d^k = -H(x^k)^{-1} \nabla f(x^k)$. If $d^k = 0$, then stop.
Step 2 Choose stepsize $\alpha_k = 1$.
Step 3 Set $x^{k+1} \leftarrow x^k + \alpha_k d^k$, $k \leftarrow k + 1$. Go to Step 1.

Proposition 5.3 If $H(x)$ is positive definite, then $d = -H(x)^{-1} \nabla f(x)$ is a descent direction.

Proof: It is sufficient to show that $\nabla f(x)^T d = -\nabla f(x)^T H(x)^{-1} \nabla f(x) < 0$, i.e., that $H(x)^{-1}$ is positive definite. Since $H(x)$ is positive definite, if $v \neq 0$,

$0 < (H(x)^{-1} v)^T H(x) (H(x)^{-1} v) = v^T H(x)^{-1} v,$

completing the proof.

Note that:

- Work per iteration: $O(n^3)$.
- The iterates of Newton's method are, in general, equally attracted to local minima and local maxima. Indeed, the method is just trying to solve the system of equations $\nabla f(x) = 0$.
- The method assumes $H(x^k)$ is nonsingular at each iteration. Moreover, unless $H(x^k)$ is positive definite, $d^k$ is not guaranteed to be a descent direction.
- There is no guarantee that $f(x^{k+1}) \le f(x^k)$.
- Step 2 could be augmented by a line search of $f(x^k + \alpha d^k)$ over the value of $\alpha$; then the previous consideration would not be an issue.
- What if $H(x^k)$ becomes increasingly singular (or not positive definite)? Use $H(x^k) + \epsilon I$.
- In general, the points generated by Newton's method as it is described above may not converge. For example, $H(x^k)^{-1}$ may not exist. Even if $H(x)$ is always nonsingular, the method may not converge, unless started close enough to the right point.
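A direct Python sketch of the pure method (stepsize 1 throughout; as cautioned above, this can diverge or fail when the Hessian is singular):

```python
import numpy as np

def newton(grad, hess, x0, tol=1e-10, max_iter=100):
    """Pure Newton's method: solve H(x_k) d_k = -grad f(x_k), step alpha=1.
    Assumes H(x_k) remains nonsingular along the way."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        d = np.linalg.solve(hess(x), -g)   # Newton direction
        x = x + d                          # alpha_k = 1
    return x
```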

Example 1: Let $f(x) = 7x - \ln(x)$. Then $\nabla f(x) = f'(x) = 7 - \frac{1}{x}$ and $H(x) = f''(x) = \frac{1}{x^2}$. It is not hard to check that $x^\star = \frac{1}{7} = 0.142857\ldots$ is the unique global minimizer. The Newton direction at $x$ is

$d = -H(x)^{-1}\nabla f(x) = -\dfrac{f'(x)}{f''(x)} = -x^2\left(7 - \dfrac{1}{x}\right) = x - 7x^2,$

and is defined so long as $x > 0$. So, Newton's method will generate the sequence of iterates $\{x^k\}$ with

$x^{k+1} = x^k + \left(x^k - 7(x^k)^2\right) = 2x^k - 7(x^k)^2.$

[Table: sequences of iterates generated by this method for three different starting points; the iterates in the first column leave the domain of the objective function, so the algorithm has to terminate with an error.] In fact, if $0 < x^0 < 2/7$, the iterates converge to $x^\star = 1/7$, while if $x^0 \ge 2/7$, the very first iterate $x^1 = x^0(2 - 7x^0) \le 0$ lies outside the domain of the objective function. Below is a plot of the progress of the algorithm as a function of the iteration number (for the two sequences that did converge):

[Figure: $|x^k - x^\star|$ versus iteration number for the two convergent sequences.]
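A quick check of Example 1's recursion in Python (the starting points are chosen for illustration):

```python
def newton_1d_example(x0, iters=8):
    """Iterates of x_{k+1} = 2*x_k - 7*x_k**2, i.e., Newton's method
    applied to f(x) = 7x - ln(x)."""
    xs = [x0]
    for _ in range(iters):
        x = xs[-1]
        if x <= 0:                    # left the domain of f: stop with an error
            break
        xs.append(2 * x - 7 * x * x)
    return xs

print(newton_1d_example(0.1))         # converges rapidly to 1/7 = 0.142857...
print(newton_1d_example(1.0))         # first step is negative: out of domain
```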

Example 2: $f(x) = -\ln(1 - x_1 - x_2) - \ln x_1 - \ln x_2$. Then

$\nabla f(x) = \begin{bmatrix} \frac{1}{1 - x_1 - x_2} - \frac{1}{x_1} \\ \frac{1}{1 - x_1 - x_2} - \frac{1}{x_2} \end{bmatrix}, \qquad H(x) = \begin{bmatrix} \frac{1}{(1 - x_1 - x_2)^2} + \frac{1}{x_1^2} & \frac{1}{(1 - x_1 - x_2)^2} \\ \frac{1}{(1 - x_1 - x_2)^2} & \frac{1}{(1 - x_1 - x_2)^2} + \frac{1}{x_2^2} \end{bmatrix},$

$x^\star = \left(\tfrac{1}{3},\ \tfrac{1}{3}\right), \qquad f(x^\star) = 3\ln 3.$

[Table: iterates $x^k$ and the distances $\|x^k - \bar{x}\|$, which decrease at a quadratic rate.]

Termination criteria Since Newton's method is working with the Hessian as well as the gradient, it would be natural to augment the termination criterion we used in the steepest descent algorithm with the requirement that $H(x^k)$ is positive semidefinite, or, taking into account the potential for computational errors, that $H(x^k) + \epsilon I$ is positive semidefinite for some $\epsilon > 0$ (this parameter may be different from the one used in the condition on the gradient).
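One way to sketch such a combined test in Python (the eigenvalue computation is a simple, if not the cheapest, way to check positive semidefiniteness):

```python
import numpy as np

def newton_termination(g, H, fx, eps_g=1e-8, eps_h=1e-8):
    """Stop when the gradient is small relative to |f(x_k)| and
    H(x_k) + eps_h * I is positive semidefinite."""
    small_gradient = np.linalg.norm(g) <= eps_g * (1.0 + abs(fx))
    nearly_psd = np.linalg.eigvalsh(H).min() >= -eps_h
    return small_gradient and nearly_psd
```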

5.5 Comparing performance of the steepest descent and Newton algorithms

5.5.1 Rate of convergence

Suppose we have a converging sequence $\lim_{k \to \infty} s_k = \bar{s}$, and we would like to characterize the speed, or rate, at which the iterates $s_k$ approach the limit $\bar{s}$.

A converging sequence of numbers $\{s_k\}$ exhibits linear convergence if for some $0 \le C < 1$,

$\lim_{k \to \infty} \dfrac{|s_{k+1} - \bar{s}|}{|s_k - \bar{s}|} = C.$

$C$ in the above expression is referred to as the rate constant; if $C = 0$, the sequence exhibits superlinear convergence. A converging sequence of numbers $\{s_k\}$ exhibits quadratic convergence if

$\lim_{k \to \infty} \dfrac{|s_{k+1} - \bar{s}|}{|s_k - \bar{s}|^2} = \delta < \infty.$

Examples:

Linear convergence: $s_k = \left(\tfrac{1}{10}\right)^k$: 0.1, 0.01, 0.001, etc. Here $\bar{s} = 0$ and

$\dfrac{|s_{k+1} - \bar{s}|}{|s_k - \bar{s}|} = 0.1.$

Superlinear convergence: $s_k = \tfrac{1}{k!}$: 1, $\tfrac{1}{2}$, $\tfrac{1}{6}$, $\tfrac{1}{24}$, $\tfrac{1}{120}$, etc. Here $\bar{s} = 0$ and

$\dfrac{|s_{k+1} - \bar{s}|}{|s_k - \bar{s}|} = \dfrac{k!}{(k+1)!} = \dfrac{1}{k+1} \to 0 \text{ as } k \to \infty.$

Quadratic convergence: $s_k = \left(\tfrac{1}{10}\right)^{(2^k)}$: 0.1, 0.01, 0.0001, 0.00000001, etc. Here $\bar{s} = 0$ and

$\dfrac{|s_{k+1} - \bar{s}|}{|s_k - \bar{s}|^2} = \dfrac{\left(10^{2^k}\right)^2}{10^{2^{k+1}}} = 1.$

This illustration compares the rates of convergence of the above sequences (note that the y-axis is displayed on the logarithmic scale):

[Figure: the three sequences above plotted against $k$ on a logarithmic y-axis.]
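These three sequences are easy to generate and compare numerically; a short sketch:

```python
import math

# Three sequences converging to 0 at different rates.
linear    = [10.0 ** -(k + 1)        for k in range(5)]     # 0.1, 0.01, ...
superlin  = [1.0 / math.factorial(k) for k in range(1, 6)]  # 1, 1/2, 1/6, ...
quadratic = [10.0 ** -(2 ** k)       for k in range(5)]     # 0.1, 0.01, 0.0001, ...
for row in (linear, superlin, quadratic):
    print(row)
```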

We will use the notion of rate of convergence to analyze one aspect of the performance of optimization algorithms. Indeed, since an algorithm for nonlinear optimization problems, in its abstract form, generates an infinite sequence of points $\{x^k\}$ converging to a solution $\bar{x}$ only in the limit, it makes sense to discuss the rate of convergence of the sequence $e_k = \|x^k - \bar{x}\|$, or $E_k = f(x^k) - f(\bar{x})$, both of which have limit 0.

5.5.2 Rate of convergence of the steepest descent algorithm for the case of a quadratic function

In this section we explore answers to the question of how fast the steepest descent algorithm converges. Recall that in the earlier example we observed linear convergence of both the sequences $\{E_k\}$ and $\{e_k\}$. We will now show that the steepest descent algorithm with stepsizes selected by exact line search in general exhibits linear convergence, but that the rate constant depends very much on the ratio of the largest to the smallest eigenvalue of the Hessian matrix $H(x)$ at the optimal solution $x = x^\star$. In order to see how this dependence arises, we will examine the case where the objective function $f(x)$ is itself a simple quadratic function of the form

$f(x) = \tfrac{1}{2} x^T Q x + q^T x,$

where $Q$ is a positive definite symmetric matrix. We will suppose that the eigenvalues of $Q$ are

$A = a_1 \ge a_2 \ge \ldots \ge a_n = a > 0,$

i.e., $A$ and $a$ are the largest and smallest eigenvalues of $Q$. We already derived that the optimal solution of (P) is $x^\star = -Q^{-1} q$, with the optimal objective function value $f(x^\star) = -\tfrac{1}{2} q^T Q^{-1} q$.

Moreover, if $x$ is the current point in the steepest descent algorithm, then $f(x) = \tfrac{1}{2} x^T Q x + q^T x$, and the next iterate of the steepest descent algorithm with exact line search is

$x' = x + \dfrac{d^T d}{d^T Q d}\, d, \quad \text{where } d = -\nabla f(x) = -(Qx + q),$

and

$f(x') = f(x) - \dfrac{(d^T d)^2}{2\, d^T Q d}.$

Therefore,

$\dfrac{f(x') - f(x^\star)}{f(x) - f(x^\star)} = 1 - \dfrac{(d^T d)^2 / (2\, d^T Q d)}{f(x) - f(x^\star)}$

$= 1 - \dfrac{(d^T d)^2 / (2\, d^T Q d)}{\tfrac{1}{2} x^T Q x + q^T x + \tfrac{1}{2} q^T Q^{-1} q}$

$= 1 - \dfrac{(d^T d)^2 / (2\, d^T Q d)}{\tfrac{1}{2} (Qx + q)^T Q^{-1} (Qx + q)}$

$= 1 - \dfrac{(d^T d)^2}{(d^T Q d)(d^T Q^{-1} d)} = 1 - \dfrac{1}{\beta}, \quad \text{where } \beta = \dfrac{(d^T Q d)(d^T Q^{-1} d)}{(d^T d)^2}.$

In order for the convergence constant $1 - 1/\beta$ to be good, which will translate into fast linear convergence, we would like the quantity $\beta$ to be small. The following result provides an upper bound on the value of $\beta$.

Kantorovich Inequality: Let $A$ and $a$ be the largest and the smallest eigenvalues of $Q$, respectively. Then

$\beta \le \dfrac{(A + a)^2}{4Aa}.$

We will skip the proof of this inequality. Let us apply this inequality to the above analysis. Continuing, we have

$\dfrac{f(x') - f(x^\star)}{f(x) - f(x^\star)} = 1 - \dfrac{1}{\beta} \le 1 - \dfrac{4Aa}{(A + a)^2} = \dfrac{(A - a)^2}{(A + a)^2} = \left(\dfrac{A/a - 1}{A/a + 1}\right)^2.$

Note by definition that $A/a$ is always at least 1. If $A/a$ is small (not much bigger than 1), then the convergence constant will be much smaller than 1. However, if $A/a$ is large, then the convergence constant will be only slightly smaller than 1.
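The bound $\left(\frac{A/a - 1}{A/a + 1}\right)^2$ is a one-liner to tabulate; a sketch (the ratios below are illustrative):

```python
def sd_rate_bound(A, a):
    """Upper bound on the per-iteration shrinkage of f(x_k) - f(x*) for
    steepest descent with exact line search on a quadratic whose extreme
    Hessian eigenvalues are A >= a > 0."""
    kappa = A / a                        # condition number of Q
    return ((kappa - 1.0) / (kappa + 1.0)) ** 2

for kappa in (1.1, 3.0, 10.0, 100.0, 1000.0):
    print(kappa, sd_rate_bound(kappa, 1.0))
```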

The following table shows some sample values:

[Table: sample values of $A$ and $a$, the corresponding upper bound on the convergence constant, and the resulting upper bound on the number of iterations needed to reduce the optimality gap by a factor of 0.10.]

Note that the number of iterations needed to reduce the optimality gap by a factor of 0.10 grows linearly in the ratio $A/a$. Two pictures of possible iterations of the steepest descent algorithm are as follows:

[Two figures: iterates of the steepest descent algorithm plotted over contours of the objective function.]

Some remarks:

- We analyzed the convergence of the function values; the convergence of the algorithm iterates can be easily shown to be linear with the same rate constant.
- The bound on the rate of convergence is attained in practice quite often, which is unfortunate.
- The ratio of the largest to the smallest eigenvalue of a matrix is called the condition number of the matrix.
- What about non-quadratic functions? If the Hessian at the locally optimal solution is positive definite, the function behaves as a near-quadratic function in a neighborhood of that solution, and the convergence exhibited by the iterates of the steepest descent algorithm will also be linear. The analysis of the non-quadratic case gets very involved; fortunately, the key intuition is obtained by analyzing the quadratic case.
- What about backtracking line search? Also linear convergence! (The rate constant depends in part on the backtracking parameters.)

5.5.3 Rate of convergence of the pure Newton's method

We have seen from our examples that, even for convex functions, Newton's method in its pure form (i.e., with stepsize $\alpha = 1$ at every iteration) does not guarantee descent at each iteration, and may produce a diverging sequence of iterates. Moreover, each iteration of Newton's method is much more computationally intensive than that of steepest descent. However, under certain conditions, the method exhibits a quadratic rate of convergence, making it the ideal method for solving convex optimization problems.

Recall that a method exhibits quadratic convergence when $\|e_k\| = \|x^k - \bar{x}\| \to 0$ and

$\lim_{k \to \infty} \dfrac{\|e_{k+1}\|}{\|e_k\|^2} = C < \infty.$

Roughly speaking, if the iterates converge quadratically, the accuracy (i.e., the number of correct digits) of the solution doubles in a fixed number of iterations.

There are many ways to state and prove results regarding the convergence of Newton's method. We provide one that gives particular insight into the circumstances under which the pure Newton's method demonstrates quadratic convergence. Let $\|v\|$ denote the usual Euclidean norm of a vector, namely $\|v\| := \sqrt{v^T v}$. Recall that the operator norm of a matrix $M$ is defined as follows:

$\|M\| := \max_x \{\|Mx\| : \|x\| = 1\}.$

As a consequence of this definition, for any $x$, $\|Mx\| \le \|M\| \cdot \|x\|$.

Theorem 5.4 (Quadratic convergence) Suppose $f(x)$ is twice continuously differentiable and $x^\star$ is a point for which $\nabla f(x^\star) = 0$. Suppose $H(x)$ satisfies the following conditions:

- there exists a scalar $h > 0$ for which $\|[H(x^\star)]^{-1}\| \le \dfrac{1}{h}$;
- there exist scalars $\beta > 0$ and $L > 0$ for which $\|H(x) - H(y)\| \le L\|x - y\|$ for all $x$ and $y$ satisfying $\|x - x^\star\| \le \beta$ and $\|y - x^\star\| \le \beta$.

Let $x$ satisfy $\|x - x^\star\| \le \gamma\delta$, where $0 < \gamma < 1$ and $\delta := \min\left\{\beta, \dfrac{2h}{3L}\right\}$, and let $x_N := x - H(x)^{-1}\nabla f(x)$. Then:

(i) $\|x_N - x^\star\| \le \|x - x^\star\|^2 \cdot \dfrac{L}{2(h - L\|x - x^\star\|)}$

(ii) $\|x_N - x^\star\| < \|x - x^\star\|$, and hence the iterates converge to $x^\star$

(iii) $\|x_N - x^\star\| \le \|x - x^\star\|^2 \cdot \dfrac{3L}{2h}$.

The proof relies on the following two elementary facts.

Proposition 5.5 Suppose that $M$ is a symmetric matrix. Then the following are equivalent:

1. $h > 0$ satisfies $\|M^{-1}\| \le \dfrac{1}{h}$;
2. $h > 0$ satisfies $\|Mv\| \ge h\|v\|$ for any vector $v$.

Proposition 5.6 Suppose that $f(x)$ is twice differentiable. Then

$\nabla f(z) - \nabla f(x) = \int_0^1 [H(x + t(z - x))](z - x)\, dt.$

Proof: Let $\phi(t) := \nabla f(x + t(z - x))$. Then $\phi(0) = \nabla f(x)$ and $\phi(1) = \nabla f(z)$, and $\phi'(t) = [H(x + t(z - x))](z - x)$. From the fundamental theorem of calculus, we have

$\nabla f(z) - \nabla f(x) = \phi(1) - \phi(0) = \int_0^1 \phi'(t)\, dt = \int_0^1 [H(x + t(z - x))](z - x)\, dt.$

Proof of Theorem 5.4: We have

$x_N - x^\star = x - H(x)^{-1}\nabla f(x) - x^\star = x - x^\star + H(x)^{-1}(\nabla f(x^\star) - \nabla f(x))$

$= x - x^\star + H(x)^{-1}\int_0^1 [H(x + t(x^\star - x))](x^\star - x)\, dt \quad \text{(from Proposition 5.6)}$

$= H(x)^{-1}\int_0^1 [H(x + t(x^\star - x)) - H(x)](x^\star - x)\, dt.$

Therefore

$\|x_N - x^\star\| \le \|H(x)^{-1}\| \int_0^1 \|H(x + t(x^\star - x)) - H(x)\| \cdot \|x^\star - x\|\, dt$

$\le \|x^\star - x\| \cdot \|H(x)^{-1}\| \int_0^1 L t \|x^\star - x\|\, dt$

$= \|x^\star - x\|^2 \cdot \|H(x)^{-1}\| \cdot L \int_0^1 t\, dt = \|x^\star - x\|^2 \cdot \dfrac{\|H(x)^{-1}\| L}{2}.$

We now bound $\|H(x)^{-1}\|$. Let $v$ be any vector. Then

$\|H(x)v\| = \|H(x^\star)v + (H(x) - H(x^\star))v\|$

$\ge \|H(x^\star)v\| - \|(H(x) - H(x^\star))v\|$

$\ge h\|v\| - \|H(x) - H(x^\star)\| \cdot \|v\| \quad \text{(from Proposition 5.5)}$

$\ge h\|v\| - L\|x^\star - x\| \cdot \|v\| = (h - L\|x^\star - x\|)\|v\|.$

Invoking Proposition 5.5 again, we see that this implies that

$\|H(x)^{-1}\| \le \dfrac{1}{h - L\|x^\star - x\|}.$

Combining this with the above yields

$\|x_N - x^\star\| \le \|x^\star - x\|^2 \cdot \dfrac{L}{2(h - L\|x^\star - x\|)},$

which is (i) of the theorem. Because $L\|x^\star - x\| \le \gamma\frac{2h}{3} < \frac{2h}{3}$, we have

$\|x_N - x^\star\| \le \|x^\star - x\| \cdot \dfrac{L\|x^\star - x\|}{2(h - L\|x^\star - x\|)} \le \|x^\star - x\| \cdot \dfrac{\gamma\frac{2h}{3}}{2\left(h - \frac{2h}{3}\right)} = \gamma\|x^\star - x\|,$

which establishes (ii) of the theorem. Finally, we have

$\|x_N - x^\star\| \le \|x^\star - x\|^2 \cdot \dfrac{L}{2(h - L\|x^\star - x\|)} \le \|x^\star - x\|^2 \cdot \dfrac{L}{2\left(h - \frac{2h}{3}\right)} = \|x^\star - x\|^2 \cdot \dfrac{3L}{2h},$

which establishes (iii) of the theorem.

Notice that the results regarding the convergence and rate of convergence in the above theorem are local, i.e., they apply only if the algorithm is initialized at certain starting points (the ones sufficiently close to the desired limit). In practice, it is not known how to pick such starting points, or how to check whether a proposed starting point is adequate. (With the very important exception of self-concordant functions.)
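The "doubling of correct digits" is easy to observe numerically; a sketch applying the pure Newton iteration to Example 2 from earlier in this section (the starting point is an illustrative choice inside the domain):

```python
import numpy as np

def grad(x):
    s = 1.0 - x[0] - x[1]
    return np.array([1.0 / s - 1.0 / x[0], 1.0 / s - 1.0 / x[1]])

def hess(x):
    c = (1.0 - x[0] - x[1]) ** -2.0
    return np.array([[c + x[0] ** -2.0, c],
                     [c, c + x[1] ** -2.0]])

x = np.array([0.1, 0.2])                          # illustrative starting point
x_star = np.array([1.0, 1.0]) / 3.0
for k in range(6):
    x = x + np.linalg.solve(hess(x), -grad(x))    # pure Newton step
    print(k + 1, np.linalg.norm(x - x_star))      # error roughly squares
```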

5.6 Further discussion and modifications of the Newton's method

5.6.1 Global convergence for strongly convex functions with a two-phase Newton's method

We have noted that, to ensure descent at each iteration, the Newton's method can be augmented by a line search. This idea can be formalized, and the efficiency of the resulting algorithm can be analyzed (see, for example, Convex Optimization by Stephen Boyd and Lieven Vandenberghe for a fairly simple presentation of the analysis).

Suppose that $f(x)$ is strongly convex on its domain, i.e., assume there exists $\mu > 0$ such that the smallest eigenvalue of $H(x)$ is greater than or equal to $\mu$ for all $x$, and that the Hessian is Lipschitz continuous everywhere on the domain of $f$. Suppose we apply the Newton's method with the stepsize at each iteration determined by the backtracking procedure of Section 5.3.2. That is, at each iteration of the algorithm we first attempt to take a full Newton step, but reduce the stepsize if the decrease in the function value is not sufficient.

Then there exist positive numbers $\eta$ and $\zeta$ such that if $\|\nabla f(x^k)\| \ge \eta$, then $f(x^{k+1}) - f(x^k) \le -\zeta$, and if $\|\nabla f(x^k)\| < \eta$, then stepsize $\alpha_k = 1$ will be selected, and the next iterate will satisfy $\|\nabla f(x^{k+1})\| < \eta$, and so will all the further iterates. Moreover, quadratic convergence will be observed in this phase.

As hinted above, the algorithm will proceed in two phases: while the iterates are far from the minimizer, a dampening of the Newton step will be required, but there will be a guaranteed decrease in the objective function values. This phase (referred to as the "dampened Newton phase") cannot take more than $\frac{f(x^0) - f(x^\star)}{\zeta}$ iterations. Once the norm of the gradient becomes sufficiently small, no dampening of the Newton step will be required in the rest of the algorithm, and quadratic convergence will be observed, thus making it the "quadratically convergent phase." Note that it is not necessary to know the values of $\eta$ and $\zeta$ to apply this version of the algorithm!

The two-phase Newton's method is globally convergent; however, to ensure global convergence, the function being minimized needs to possess particularly nice global properties.

5.6.2 Other modifications of the Newton's method

We have seen that if Newton's method is initialized sufficiently close to the point $\bar{x}$ such that $\nabla f(\bar{x}) = 0$ and $H(\bar{x})$ is positive definite (i.e., $\bar{x}$ is a local minimizer), then it will converge quadratically, using stepsizes of $\alpha = 1$. There are three issues in the above statement that we should be concerned with:

- What if $H(\bar{x})$ is singular, or nearly singular?
- How do we know if we are close enough, and what do we do if we are not?
- Can we modify Newton's method to guarantee global convergence?

In the previous subsection we assumed away the first issue and, under an additional assumption, showed how to address the other two. What if the function $f$ is not strongly convex, and $H(x)$ may approach singularity? There are two popular approaches (which are actually closely related) to address these issues.

The first approach ensures that the method always uses a descent direction. For example, instead of the direction $-H(x^k)^{-1}\nabla f(x^k)$, use the direction $-(H(x^k) + \epsilon_k I)^{-1}\nabla f(x^k)$, where $\epsilon_k \ge 0$ is chosen so that the smallest eigenvalue of $H(x^k) + \epsilon_k I$ is bounded below by a fixed number $\delta > 0$. It is important to choose the value of $\delta$ appropriately: if it is chosen to be too small, the matrix employed in computing the direction can become ill-conditioned if $H(\bar{x})$ is nearly singular; if it is chosen to be too large, the direction becomes nearly that of the steepest descent algorithm, and hence only linear convergence can be guaranteed. Hence, the value of $\epsilon_k$ is often chosen dynamically; a sketch of this modification appears below.
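A minimal Python sketch of this first approach, shifting the Hessian just enough that its smallest eigenvalue is at least $\delta$ (the helper name and the default value of $\delta$ are mine):

```python
import numpy as np

def regularized_newton_direction(H, g, delta=1e-6):
    """Return d = -(H + eps*I)^{-1} g, with eps chosen so that the smallest
    eigenvalue of H + eps*I is bounded below by delta."""
    lam_min = np.linalg.eigvalsh(H).min()
    eps = max(0.0, delta - lam_min)      # shift only if H is not safely p.d.
    return np.linalg.solve(H + eps * np.eye(len(g)), -g)
```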

The second approach is the so-called trust region method. Note that the main idea behind Newton's method is to represent the function $f(x)$ by its quadratic approximation

$q_k(x) = f(x^k) + \nabla f(x^k)^T(x - x^k) + \tfrac{1}{2}(x - x^k)^T H(x^k)(x - x^k)$

around the current iterate, and then minimize that approximation. While locally the approximation works quite well, this may no longer be the case when a large step is taken. The trust region methods hence find the next iterate by solving the following constrained optimization problem:

$\min q_k(x)$ s.t. $\|x - x^k\| \le \Delta_k,$

i.e., not allowing the next iterate to be outside the neighborhood of $x^k$ where the quadratic approximation is close to the original function $f(x)$ (as it turns out, this problem is not much harder to solve than the unconstrained minimization of $q_k(x)$).

The value of $\Delta_k$ is set to represent the size of the region in which we can trust $q_k(x)$ to provide a good approximation of $f(x)$. Smaller values of $\Delta_k$ ensure that we are working with an accurate representation of $f(x)$, but result in conservative steps. Larger values of $\Delta_k$ allow for larger steps, but may lead to an inaccurate estimation of the objective function. To account for this, the value of $\Delta_k$ is updated dynamically throughout the algorithm: namely, it is increased if it is observed that $q_k(x)$ provided an exceptionally good approximation of $f(x)$ at the previous iteration, and decreased if the approximation was exceptionally bad.
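One conventional way to implement this update measures how well $q_k$ predicted the actual decrease in $f$; a sketch (the thresholds and factors below are standard textbook choices, not values from these notes):

```python
def update_radius(rho, delta, delta_max=10.0):
    """Trust region radius update, where rho is the ratio
    (actual decrease in f) / (decrease predicted by q_k)."""
    if rho < 0.25:                           # poor agreement: shrink the region
        return 0.25 * delta
    if rho > 0.75:                           # very good agreement: expand it
        return min(2.0 * delta, delta_max)
    return delta                             # otherwise keep the radius
```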


More information

Barrier Method. Javier Peña Convex Optimization /36-725

Barrier Method. Javier Peña Convex Optimization /36-725 Barrier Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: Newton s method For root-finding F (x) = 0 x + = x F (x) 1 F (x) For optimization x f(x) x + = x 2 f(x) 1 f(x) Assume f strongly

More information

Interior Point Methods. We ll discuss linear programming first, followed by three nonlinear problems. Algorithms for Linear Programming Problems

Interior Point Methods. We ll discuss linear programming first, followed by three nonlinear problems. Algorithms for Linear Programming Problems AMSC 607 / CMSC 764 Advanced Numerical Optimization Fall 2008 UNIT 3: Constrained Optimization PART 4: Introduction to Interior Point Methods Dianne P. O Leary c 2008 Interior Point Methods We ll discuss

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

Quasi-Newton Methods

Quasi-Newton Methods Newton s Method Pros and Cons Quasi-Newton Methods MA 348 Kurt Bryan Newton s method has some very nice properties: It s extremely fast, at least once it gets near the minimum, and with the simple modifications

More information

Nonlinear Optimization: What s important?

Nonlinear Optimization: What s important? Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global

More information

Gradient Descent Methods

Gradient Descent Methods Lab 18 Gradient Descent Methods Lab Objective: Many optimization methods fall under the umbrella of descent algorithms. The idea is to choose an initial guess, identify a direction from this point along

More information

An Inexact Newton Method for Nonlinear Constrained Optimization

An Inexact Newton Method for Nonlinear Constrained Optimization An Inexact Newton Method for Nonlinear Constrained Optimization Frank E. Curtis Numerical Analysis Seminar, January 23, 2009 Outline Motivation and background Algorithm development and theoretical results

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

CE 191: Civil and Environmental Engineering Systems Analysis. LEC 05 : Optimality Conditions

CE 191: Civil and Environmental Engineering Systems Analysis. LEC 05 : Optimality Conditions CE 191: Civil and Environmental Engineering Systems Analysis LEC : Optimality Conditions Professor Scott Moura Civil & Environmental Engineering University of California, Berkeley Fall 214 Prof. Moura

More information

Part 2: Linesearch methods for unconstrained optimization. Nick Gould (RAL)

Part 2: Linesearch methods for unconstrained optimization. Nick Gould (RAL) Part 2: Linesearch methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

Selected Topics in Optimization. Some slides borrowed from

Selected Topics in Optimization. Some slides borrowed from Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model

More information

Optimization. Yuh-Jye Lee. March 21, Data Science and Machine Intelligence Lab National Chiao Tung University 1 / 29

Optimization. Yuh-Jye Lee. March 21, Data Science and Machine Intelligence Lab National Chiao Tung University 1 / 29 Optimization Yuh-Jye Lee Data Science and Machine Intelligence Lab National Chiao Tung University March 21, 2017 1 / 29 You Have Learned (Unconstrained) Optimization in Your High School Let f (x) = ax

More information

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained

More information

CS 450 Numerical Analysis. Chapter 5: Nonlinear Equations

CS 450 Numerical Analysis. Chapter 5: Nonlinear Equations Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright c 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80

More information

HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION. Darin Griffin Mohr. An Abstract

HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION. Darin Griffin Mohr. An Abstract HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION by Darin Griffin Mohr An Abstract Of a thesis submitted in partial fulfillment of the requirements for the Doctor of

More information

1 Computing with constraints

1 Computing with constraints Notes for 2017-04-26 1 Computing with constraints Recall that our basic problem is minimize φ(x) s.t. x Ω where the feasible set Ω is defined by equality and inequality conditions Ω = {x R n : c i (x)

More information

Optimization and Root Finding. Kurt Hornik

Optimization and Root Finding. Kurt Hornik Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Numerical optimization

Numerical optimization Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 5 Nonlinear Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information