5 Overview of algorithms for unconstrained optimization
IOE 519: NLP, Winter 2012 © Marina A. Epelman

5.1 General optimization algorithm

Recall: we are attempting to solve the problem

(P) min f(x) s.t. x ∈ X,

where f(x) is differentiable and X ⊆ R^n is an open set.

Solutions to optimization problems are almost always impossible to obtain directly (or "in closed form"), with a few exceptions. Hence, for the most part, we will solve these problems with iterative algorithms. These algorithms typically require the user to supply a starting point x^0 ∈ X. Beginning at x^0, an iterative algorithm generates a sequence of points {x^k}_{k=0}^∞ called iterates. In deciding how to generate the next iterate, x^{k+1}, the algorithms use information about the function f at the current iterate, x^k, and sometimes past iterates x^0, ..., x^{k−1}. In practice, rather than constructing an infinite sequence of iterates, algorithms stop when an appropriate termination criterion is satisfied, indicating either that the problem has been solved within a desired accuracy, or that no further progress can be made.

Most algorithms for unconstrained optimization we will discuss fall into the category of directional search algorithms:

General directional search optimization algorithm
Initialization Specify an initial guess of the solution x^0.
Iteration For k = 0, 1, ...,
  If x^k is optimal, stop.
  Otherwise,
    Determine d^k, a search direction;
    Determine α_k > 0, a step size;
    Determine x^{k+1} = x^k + α_k d^k, a new estimate of the solution.

5.1.1 Choosing the direction

Typically, we require that d^k is a descent direction of f at x^k, that is,

f(x^k + αd^k) < f(x^k) for all α ∈ (0, ε] for some ε > 0.

For the case when f is differentiable, we have shown in the previous section that any d^k such that ∇f(x^k)^T d^k < 0 is a descent direction whenever ∇f(x^k) ≠ 0. Often, the direction is chosen to be of the form

d^k = −D^k ∇f(x^k),

where D^k is a positive definite symmetric matrix. (Why is it important that D^k is positive definite?)
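The importance of positive definiteness of D^k can be checked numerically: with d = −D∇f(x), we get ∇f(x)^T d = −∇f(x)^T D ∇f(x), which is negative for every nonzero gradient exactly when D is positive definite. A small sketch (the gradient and matrices below are made-up values chosen for the demonstration):

```python
import numpy as np

# The search direction is d = -D @ grad, and we need it to be a descent
# direction: grad^T d = -grad^T D grad < 0 whenever grad != 0, which is
# exactly the definition of positive definiteness of D.
grad = np.array([1.0, -2.0])           # a nonzero gradient (illustrative)
D_pd = np.array([[2.0, 0.5],
                 [0.5, 1.0]])          # positive definite (both eigenvalues > 0)
D_indef = np.array([[1.0, 0.0],
                    [0.0, -1.0]])      # indefinite

d_good = -D_pd @ grad
d_bad = -D_indef @ grad

print(grad @ d_good)   # negative: a descent direction
print(grad @ d_bad)    # positive here: an ascent direction
```

With an indefinite D^k the "direction" can point uphill, so the basic descent guarantee of the algorithm is lost.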
The following are the two basic methods for choosing the matrix D^k at each iteration; they give rise to two classic algorithms for unconstrained optimization we are going to discuss in class:

Steepest descent: D^k = I, k = 0, 1, 2, ...
Newton's method: D^k = H(x^k)^{−1} (provided H(x^k) is positive definite).

5.1.2 Choosing the stepsize

After d^k is fixed, α_k ideally would solve the one-dimensional optimization problem

min_{α ≥ 0} f(x^k + αd^k).

This optimization problem is usually also impossible to solve exactly. Instead, α_k is computed (via an iterative procedure referred to as line search) either to approximately solve the above optimization problem, or to ensure a sufficient decrease in the value of f.

5.1.3 Testing for optimality

Based on the optimality conditions, x^k is locally optimal if ∇f(x^k) = 0 and H(x^k) is positive definite. However, such a point is unlikely to be found exactly. In fact, most of the analysis of the algorithms in the above form deals with their limiting behavior, i.e., analyzes the limit points of the infinite sequence of iterates generated by the algorithm. Thus, to implement the algorithm in practice, more realistic termination criteria need to be implemented. They often hinge, at least in part, on approximately satisfying, to a certain tolerance, the first order necessary condition for optimality discussed in the previous section.

5.2 Steepest descent algorithm for minimization

The steepest descent algorithm is a version of the general optimization algorithm that chooses d^k = −∇f(x^k) at the kth iteration. As a source of motivation, note that f(x) can be approximated by its linear expansion

f(x̄ + d) ≈ f(x̄) + ∇f(x̄)^T d.

It is not hard to see that so long as ∇f(x̄) ≠ 0, the direction

d̄ = −∇f(x̄)/||∇f(x̄)|| = −∇f(x̄)/√(∇f(x̄)^T ∇f(x̄))

minimizes the above approximation over all directions of unit length. Indeed, for any direction d with ||d|| = 1, the Schwartz inequality yields

∇f(x̄)^T d ≥ −||∇f(x̄)|| ||d|| = −||∇f(x̄)|| = ∇f(x̄)^T d̄.
Of course, if ∇f(x̄) = 0, then x̄ is a candidate for a local minimizer, i.e., x̄ satisfies the first order necessary optimality condition. The direction d̄ = −∇f(x̄) is called the direction of steepest descent at the point x̄. Note that d̄ = −∇f(x̄) is a descent direction as long as ∇f(x̄) ≠ 0. To see this, simply observe that d̄^T ∇f(x̄) = −∇f(x̄)^T ∇f(x̄) < 0 so long as ∇f(x̄) ≠ 0. A natural consequence of this is the following algorithm, called the steepest descent algorithm.
Steepest Descent Algorithm:
Step 0 Given x^0, set k := 0.
Step 1 d^k := −∇f(x^k). If d^k = 0, then stop.
Step 2 Choose stepsize ᾱ_k by performing an exact (or inexact) line search.
Step 3 Set x^{k+1} := x^k + ᾱ_k d^k, k := k + 1. Go to Step 1.

Note from Step 2 and the fact that d^k = −∇f(x^k) is a descent direction, it follows that f(x^{k+1}) < f(x^k). The following theorem establishes that under certain assumptions on f, the steepest descent algorithm converges regardless of the initial starting point x^0 (i.e., it exhibits global convergence).

Theorem 5.1 (Convergence Theorem; Steepest Descent with exact line search) Suppose that f : R^n → R is continuously differentiable on the set S = {x ∈ R^n : f(x) ≤ f(x^0)}, and that S is a closed and bounded set. Suppose further that the sequence {x^k} is generated by the steepest descent algorithm with stepsizes ᾱ_k chosen by an exact line search. Then every point x̄ that is a limit point of the sequence {x^k} satisfies ∇f(x̄) = 0.

Proof: The proof of this theorem is by contradiction. By the Weierstrass Theorem, at least one limit point of the sequence {x^k} must exist. Let x̄ be any such limit point. Without loss of generality, assume that lim_{k→∞} x^k = x̄, but that ∇f(x̄) ≠ 0. This being the case, there is a value ᾱ > 0 such that

δ := f(x̄) − f(x̄ + ᾱd̄) > 0, where d̄ = −∇f(x̄).

Then also (x̄ + ᾱd̄) ∈ int S, because f(x̄ + ᾱd̄) < f(x̄) ≤ f(x^0).

Let {d^k} be the sequence of directions generated by the algorithm, i.e., d^k = −∇f(x^k). Since f is continuously differentiable, lim_{k→∞} d^k = d̄. Then since (x̄ + ᾱd̄) ∈ int S, and (x^k + ᾱd^k) → (x̄ + ᾱd̄), for k sufficiently large we have x^k + ᾱd^k ∈ S and

f(x^k + ᾱd^k) ≤ f(x̄ + ᾱd̄) + δ/2 = f(x̄) − δ + δ/2 = f(x̄) − δ/2.

However,

f(x̄) ≤ f(x^k + ᾱ_k d^k) ≤ f(x^k + ᾱd^k) ≤ f(x̄) − δ/2,

which is, of course, a contradiction. Thus d̄ = −∇f(x̄) = 0.

An example Suppose f(x) is a simple quadratic function of the form

f(x) = (1/2) x^T Q x + q^T x,

where Q is a positive definite symmetric matrix.
The optimal solution of (P) is easily computed as x* = −Q^{−1}q (since Q is positive definite, it is non-singular), and direct substitution shows that the optimal objective function value is f(x*) = −(1/2) q^T Q^{−1} q.
For convenience, let x denote the current point in the steepest descent algorithm. We have f(x) = (1/2)x^T Qx + q^T x, and let d denote the current direction, which is the negative of the gradient, i.e.,

d = −∇f(x) = −Qx − q.

Now let us compute the next iterate of the steepest descent algorithm. If α is the generic stepsize, then

f(x + αd) = (1/2)(x + αd)^T Q(x + αd) + q^T(x + αd)
 = (1/2)x^T Qx + α d^T Qx + (1/2)α² d^T Qd + q^T x + α q^T d
 = f(x) − α d^T d + (1/2)α² d^T Qd.

Optimizing the value of α in this last expression yields

ᾱ = (d^T d)/(d^T Qd),

and the next iterate of the algorithm then is

x' = x + ᾱd = x + ((d^T d)/(d^T Qd)) d, where d = −Qx − q,

and

f(x') = f(x + ᾱd) = f(x) − ᾱ d^T d + (1/2)ᾱ² d^T Qd = f(x) − (1/2)(d^T d)²/(d^T Qd).

Suppose that

Q = [ 4 −2 ; −2 2 ] and q = (2, −2)^T.

Then

∇f(x) = (4x₁ − 2x₂ + 2, −2x₁ + 2x₂ − 2)^T,

and so

x* = (0, 1)^T and f(x*) = −1.

Suppose that x^0 = (0, 0)^T. Then we have: x^1 = (−0.4, 0.4), x^2 = (0, 0.8), etc., and the even-numbered iterates satisfy x^{2n} = (0, 1 − 0.2^n) and f(x^{2n}) = (0.2)^{2n} − 1,
and so

||x^{2n} − x*|| = 0.2^n, f(x^{2n}) − f(x*) = (0.2)^{2n}.

Therefore, starting from the point x^0 = (0, 0), the distance from the current iterate to the optimal solution goes down by a factor of 0.2 after every two iterations of the algorithm (a similar observation can be made about the progress of the objective function values). [Figure: plot of ||x^k − x*|| as a function of the iteration number; the y-axis is drawn on a logarithmic scale, which allows us to visualize the progress of the algorithm better as the values of ||x^k − x*|| approach zero.]

Although it is easy to find the optimal solution of the quadratic optimization problem in closed form, the above example is relevant in that it demonstrates a typical performance of the steepest descent algorithm. Additionally, most functions behave as near-quadratic functions in a neighborhood of the optimal solution, making the example even more relevant.

Termination criteria Ideally, the algorithm will terminate at a point x^k such that ∇f(x^k) = 0. However, the algorithm is not guaranteed to find such a point in a finite amount of time. Moreover, due to rounding errors in computer calculations, the computed value of the gradient will have some imprecision in it. Therefore, in practical algorithms the termination criterion is designed to test whether the above condition is satisfied approximately, so that the resulting output of the algorithm is an approximately optimal solution.

A natural termination criterion for steepest descent could be ||∇f(x^k)|| ≤ ε, where ε > 0 is a pre-specified tolerance. However, depending on the scaling of the function, this requirement can be either unnecessarily stringent, or too loose to ensure near-optimality (consider a problem concerned with minimizing distance, where the objective function can be expressed in inches, feet, or miles).
Another alternative that might alleviate the above consideration is to terminate when ||∇f(x^k)|| ≤ ε|f(x^k)|; this, however, may lead to problems when the objective function at the optimum is zero. A combined approach is then to terminate when

||∇f(x^k)|| ≤ ε(1 + |f(x^k)|).

The value of ε is typically taken to be at most the square root of the machine tolerance (e.g., ε = 10^{−8} if 16-digit computing is used), due to the error incurred in estimating derivatives.
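Putting the pieces of this section together, steepest descent with exact line search can be sketched in a few lines for the quadratic example above; for a quadratic, the exact stepsize has the closed form ᾱ = (d^T d)/(d^T Qd), and the combined termination criterion is used (the tolerance value is illustrative):

```python
import numpy as np

# Steepest descent with exact line search on f(x) = 0.5 x^T Q x + q^T x,
# for the Q and q of the worked example in this section.
Q = np.array([[4.0, -2.0],
              [-2.0, 2.0]])
q = np.array([2.0, -2.0])

def f(x):
    return 0.5 * x @ Q @ x + q @ x

def grad(x):
    return Q @ x + q

def steepest_descent(x0, eps=1e-8, max_iter=1000):
    x = x0.astype(float)
    for k in range(max_iter):
        d = -grad(x)
        # combined termination criterion: ||grad f|| <= eps * (1 + |f|)
        if np.linalg.norm(d) <= eps * (1.0 + abs(f(x))):
            break
        alpha = (d @ d) / (d @ Q @ d)   # exact line search for a quadratic
        x = x + alpha * d
    return x, k

x_star = np.linalg.solve(Q, -q)              # optimal solution -Q^{-1} q
x_final, iters = steepest_descent(np.zeros(2))
print(x_star, x_final, iters)
```

Run from x^0 = (0, 0), the first two iterates reproduce the values of the worked example, and the iteration count reflects the linear convergence observed there.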
5.3 Stepsize selection

In the analysis in the above subsection we assumed that the one-dimensional optimization problem invoked in the line search at each iteration of the steepest descent algorithm was performed exactly and with perfect precision, which is usually not possible. In this subsection we discuss one of the many practical ways of solving this problem approximately, to determine the stepsize at each iteration of the general directional search optimization algorithm (including steepest descent).

5.3.1 Stepsize selection basics

Suppose that f(x) is a continuously differentiable function, and that we seek to (approximately) solve

ᾱ = arg min_{α > 0} f(x̄ + αd),

where x̄ is our current iterate, and d is the current direction generated by an algorithm that seeks to minimize f(x). We assume that d is a descent direction, i.e., ∇f(x̄)^T d < 0. Let

F(α) = f(x̄ + αd),

whereby F(α) is a function of the scalar variable α, and our problem is to solve for ᾱ = arg min_{α > 0} F(α). Using the chain rule for differentiation, we can show that F′(α) = ∇f(x̄ + αd)^T d. Therefore, applying the necessary optimality conditions to the one-dimensional optimization problem above, we want to find a value ᾱ for which F′(ᾱ) = 0. Furthermore, since d is a descent direction, F′(0) < 0.

5.3.2 Armijo rule, or backtracking

Although there are iterative algorithms developed to solve the problem min F(α) (or F′(α) = 0) exactly, i.e., with a high degree of precision (such as, for instance, the bisection search algorithm), they are typically too expensive computationally. (Recall that we need to perform a line search at every iteration of our steepest descent optimization algorithm!) On the other hand, if we sacrifice accuracy of the line search, this can cause inferior performance of the overall algorithm. The Armijo rule, or the backtracking method, is one of several inexact line search methods which guarantee a sufficient degree of improvement in the objective function to ensure the algorithm's convergence.
The Armijo rule requires two parameters: 0 < µ < 0.5 and 0 < β < 1. Suppose we are minimizing a function F(α) such that F′(0) < 0 (which is indeed the case for the line search problems arising in descent algorithms). Then the first order approximation of F(α) at α = 0 is given by F(0) + αF′(0). Define

F̂(α) = F(0) + µαF′(0)

(see figure). A stepsize ᾱ is considered acceptable by the Armijo rule only if F(ᾱ) ≤ F̂(ᾱ), that is, if taking a step of size ᾱ guarantees a sufficient decrease of the function:

f(x̄ + ᾱd) − f(x̄) ≤ µᾱ∇f(x̄)^T d.
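In code, a backtracking line search built on this acceptance test can be sketched as follows (the parameter values µ = 0.2 and β = 1/2, and the example function, are illustrative choices):

```python
import numpy as np

# Backtracking (Armijo) line search: accept alpha once
#   F(alpha) <= F(0) + mu * alpha * F'(0),
# where F(alpha) = f(x + alpha*d) and F'(0) = grad f(x)^T d.
def backtracking(f, grad_f, x, d, mu=0.2, beta=0.5):
    F0 = f(x)
    slope = grad_f(x) @ d              # F'(0), negative for a descent direction
    assert slope < 0, "d must be a descent direction"
    alpha = 1.0
    while f(x + alpha * d) > F0 + mu * alpha * slope:
        alpha *= beta                  # backtrack: shrink the step
    return alpha

# Example: f(x) = x1^2 + 10 x2^2, steepest descent direction at (1, 1).
f = lambda x: x[0] ** 2 + 10 * x[1] ** 2
grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])
x = np.array([1.0, 1.0])
d = -grad_f(x)
alpha = backtracking(f, grad_f, x, d)
print(alpha, f(x + alpha * d))
```

The accepted step both decreases f and satisfies the sufficient decrease inequality, while the full step α = 1 would have been rejected.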
[Figure: F(α) together with the linear approximation F(0) + αF′(0) and the relaxed line F̂(α) = F(0) + µαF′(0).]

Note that the sufficient decrease condition will hold for any small value of α. On the other hand, we would like to prevent the step size from being too small, for otherwise our overall optimization algorithm would not be making much progress. To combine these two considerations, we will implement the following iterative backtracking procedure (here we use β = 1/2):

Backtracking line search
Step 0 Set k = 0, α₀ = 1.
Step k If F(α_k) ≤ F̂(α_k), choose α_k as the step size; stop. If F(α_k) > F̂(α_k), let α_{k+1} := (1/2)α_k, k := k + 1.

Note that as a result of the above iterative scheme, the chosen stepsize is ᾱ = 1/2^t, where t is the smallest integer such that F(1/2^t) ≤ F̂(1/2^t) (or, for general β, ᾱ = β^t). Typically, µ is chosen in the range between 0.01 and 0.3, and β between 0.1 and 0.8.

Note that if x^k and x^{k+1} are consecutive iterates of the general optimization algorithm with d^k a descent direction, and the stepsizes are chosen by backtracking, then f(x^{k+1}) < f(x^k); that is, the algorithm is guaranteed to produce an improvement in the function value at every iteration. Under additional assumptions on f, it can also be shown that the steepest descent algorithm will demonstrate global convergence properties under the Armijo line search rule, as stated in the following theorem.

Theorem 5.2 (Convergence Theorem; Steepest Descent with backtracking line search) Suppose that the set S = {x ∈ R^n : f(x) ≤ f(x^0)} is closed and bounded, and suppose that the gradient of f is Lipschitz continuous on the set S, i.e., there exists a constant G > 0 such that

||∇f(x) − ∇f(y)|| ≤ G||x − y|| for all x, y ∈ S.

Suppose further that the sequence {x^k} is generated by the steepest descent algorithm with stepsizes ᾱ_k chosen by a backtracking line search. Then every point x̄ that is a limit point of the sequence {x^k} satisfies ∇f(x̄) = 0.

The additional assumption, basically, ensures that the gradient of f does not change too rapidly.
In the proof of the theorem, this assumption makes it possible to provide a lower bound on the stepsize in each iteration. (See any of the reference textbooks for details.)

Remark: Our discussion so far implicitly assumed that the domain of the optimization problem was the entire R^n. If our optimization problem is

(P) min f(x) s.t. x ∈ X,

where X is an open set, then the line-search problem is

min_α f(x̄ + αd) s.t. x̄ + αd ∈ X.

In this case, we must ensure that all iterate values of α in the backtracking algorithm satisfy x̄ + αd ∈ X. As an example, consider the following problem:

(P) min f(x) := −Σ_{i=1}^m ln(b_i − a_i^T x)
    s.t. b − Ax > 0.

Here the domain of f(x) is X = {x ∈ R^n : b − Ax > 0}. Given a point x̄ ∈ X and a direction d, the line-search problem is:

(LS) min h(α) := f(x̄ + αd) = −Σ_{i=1}^m ln(b_i − a_i^T(x̄ + αd))
     s.t. b − A(x̄ + αd) > 0.

Standard arithmetic manipulation can be used to establish that

b − A(x̄ + αd) > 0 if and only if α̌ < α < α̂,

where

α̌ := max_{i: a_i^T d < 0} (b_i − a_i^T x̄)/(a_i^T d) and α̂ := min_{i: a_i^T d > 0} (b_i − a_i^T x̄)/(a_i^T d),

and the line-search problem then is:

(LS′) minimize h(α) := −Σ_{i=1}^m ln(b_i − a_i^T(x̄ + αd)) s.t. α̌ < α < α̂.

The implementation of the backtracking rule for this problem would have to be modified: starting with α = 1, we will backtrack, if necessary, until α < α̂, and only then start checking the sufficient decrease condition.

5.4 Newton's method for minimization

Again, we want to solve

(P) min f(x), x ∈ R^n.

Newton's method can also be interpreted in the framework of the general optimization algorithm, but it truly stems from Newton's method for solving systems of nonlinear equations. Recall that if Φ : R^n → R^n, to solve the system of equations Φ(x) = 0, one can apply an iterative method. Starting at a point x̄, approximate the function by Φ(x̄ + d) ≈ Φ(x̄) + ∇Φ(x̄)^T d, where ∇Φ(x̄)^T ∈ R^{n×n} is the Jacobian of Φ at x̄, and, provided that ∇Φ(x̄) is nonsingular, solve the system of linear equations

∇Φ(x̄)^T d = −Φ(x̄)
to obtain d. Set the next iterate x' = x̄ + d, and continue. This method is well-studied, and is well-known for its good performance when the starting point x^0 is chosen appropriately. Newton's method for minimization is precisely an application of this equation-solving method to the (system of) first-order optimality conditions ∇f(x) = 0.

Here is another view of the motivation behind Newton's method for optimization. At x = x̄, f(x) can be approximated by

f(x) ≈ q(x) := f(x̄) + ∇f(x̄)^T(x − x̄) + (1/2)(x − x̄)^T H(x̄)(x − x̄),

which is the quadratic Taylor expansion of f(x) at x = x̄. q(x) is a quadratic function which is minimized by solving ∇q(x) = 0, i.e., ∇f(x̄) + H(x̄)(x − x̄) = 0, which yields

x − x̄ = −H(x̄)^{−1}∇f(x̄).

The direction −H(x̄)^{−1}∇f(x̄) is called the Newton direction, or the Newton step. This leads to the following algorithm for solving (P):

Newton's Method:
Step 0 Given x^0, set k := 0.
Step 1 d^k := −H(x^k)^{−1}∇f(x^k). If d^k = 0, then stop.
Step 2 Choose stepsize α_k = 1.
Step 3 Set x^{k+1} := x^k + α_k d^k, k := k + 1. Go to Step 1.

Proposition 5.3 If H(x) is p.d., then d = −H(x)^{−1}∇f(x) is a descent direction.

Proof: It is sufficient to show that ∇f(x)^T d = −∇f(x)^T H(x)^{−1}∇f(x) < 0. Since H(x) is positive definite, for any v ≠ 0,

0 < (H(x)^{−1}v)^T H(x)(H(x)^{−1}v) = v^T H(x)^{−1}v,

i.e., H(x)^{−1} is also positive definite, completing the proof.

Note that:
- Work per iteration: O(n³).
- The iterates of Newton's method are, in general, equally attracted to local minima and local maxima. Indeed, the method is just trying to solve the system of equations ∇f(x) = 0.
- The method assumes H(x^k) is nonsingular at each iteration. Moreover, unless H(x^k) is positive definite, d^k is not guaranteed to be a descent direction.
- There is no guarantee that f(x^{k+1}) ≤ f(x^k).
- Step 2 could be augmented by a line search of f(x^k + αd^k) over the value of α; then the previous consideration would not be an issue.
- What if H(x^k) becomes increasingly singular (or not positive definite)? Use H(x^k) + εI.
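The basic iteration can be sketched directly: solve H(x^k)d = −∇f(x^k) for the Newton direction (solving the linear system is the O(n³) work per iteration noted above) and take the full step. The quadratic test problem below reuses the Q and q of the earlier steepest descent example for illustration:

```python
import numpy as np

# Pure Newton's method: at each step solve H(x^k) d = -grad f(x^k) and
# set x^{k+1} = x^k + d (stepsize 1 throughout).
def newton(grad_f, hess_f, x0, eps=1e-10, max_iter=50):
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:
            break
        d = np.linalg.solve(hess_f(x), -g)   # Newton direction
        x = x + d
    return x

# Sanity check on a strictly convex quadratic, where the quadratic model
# is exact and Newton's method reaches the minimizer in a single step.
Q = np.array([[4.0, -2.0], [-2.0, 2.0]])
q = np.array([2.0, -2.0])
x = newton(lambda x: Q @ x + q, lambda x: Q, np.array([5.0, -3.0]))
print(x)   # the unique minimizer -Q^{-1} q = (0, 1)
```

Contrast with steepest descent on the same function, which needed dozens of iterations: here one Newton step suffices because q(x) coincides with f(x).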
In general, the points generated by Newton's method as it is described above may not converge. For example, H(x^k)^{−1} may not exist at some iteration. Even if H(x) is always non-singular, the method may not converge unless started close enough to the right point.
Example 1: Let f(x) = 7x − ln(x). Then ∇f(x) = f′(x) = 7 − 1/x and H(x) = f″(x) = 1/x². It is not hard to check that x* = 1/7 = 0.142857... is the unique global minimizer. The Newton direction at x is

d = −H(x)^{−1}∇f(x) = −f′(x)/f″(x) = −x²(7 − 1/x) = x − 7x²,

and is defined so long as x > 0. So, Newton's method will generate the sequence of iterates {x^k} with

x^{k+1} = x^k + (x^k − 7(x^k)²) = 2x^k − 7(x^k)².

[Table of iterate sequences generated by this method for different starting points; for one of the starting points, the first iterate is not in the domain of the objective function, so the algorithm has to terminate with an error. Below it, a plot of the progress of the algorithm as a function of iteration number for the two sequences that did converge.]

Example 2: f(x) = −ln(1 − x₁ − x₂) − ln x₁ − ln x₂.

∇f(x) = ( 1/(1 − x₁ − x₂) − 1/x₁ , 1/(1 − x₁ − x₂) − 1/x₂ )^T.
H(x) = [ (1 − x₁ − x₂)^{−2} + x₁^{−2} , (1 − x₁ − x₂)^{−2} ; (1 − x₁ − x₂)^{−2} , (1 − x₁ − x₂)^{−2} + x₂^{−2} ].

x* = (1/3, 1/3), f(x*) = 3 ln 3.

[Table of iterates x^k and errors ||x^k − x*|| omitted.]

Termination criteria Since Newton's method works with the Hessian as well as the gradient, it would be natural to augment the termination criterion we used in the steepest descent algorithm with the requirement that H(x^k) is positive semi-definite, or, taking into account the potential for computational errors, that H(x^k) + εI is positive semi-definite for some ε > 0 (this parameter may be different than the one used in the condition on the gradient).
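The behavior in Example 1 is easy to reproduce, since the Newton iteration collapses to the scalar recursion x_{k+1} = 2x_k − 7x_k². The two starting points below are illustrative choices, one inside and one outside the region from which the method converges:

```python
# Pure Newton's method on Example 1: f(x) = 7x - ln(x), with
# f'(x) = 7 - 1/x and f''(x) = 1/x^2, so the Newton iteration is
#   x_{k+1} = 2 x_k - 7 x_k^2,
# valid only while the iterate stays in the domain x > 0.
def newton_1d(x0, n_steps=10):
    x = x0
    for _ in range(n_steps):
        if x <= 0:
            return None        # iterate left the domain: terminate with an error
        x = 2 * x - 7 * x * x
    return x

x_star = 1 / 7
good = newton_1d(0.1)     # converges rapidly to x* = 1/7
bad = newton_1d(1.0)      # first step gives 2 - 7 = -5, outside the domain
print(good, bad)
```

This illustrates the local nature of the method: from x^0 = 0.1 it converges to 1/7 to machine precision within a handful of steps, while from x^0 = 1 it immediately leaves the domain of f.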
5.5 Comparing performance of the steepest descent and Newton algorithms

5.5.1 Rate of convergence

Suppose we have a converging sequence lim_{k→∞} s_k = s̄, and we would like to characterize the speed, or rate, at which the iterates s_k approach the limit s̄.

A converging sequence of numbers {s_k} exhibits linear convergence if for some 0 ≤ C < 1,

lim_{k→∞} |s_{k+1} − s̄| / |s_k − s̄| = C.

C in the above expression is referred to as the rate constant; if C = 0, the sequence exhibits superlinear convergence.

A converging sequence of numbers {s_k} exhibits quadratic convergence if

lim_{k→∞} |s_{k+1} − s̄| / |s_k − s̄|² = δ < ∞.

Examples:

Linear convergence: s_k = (0.1)^k: 0.1, 0.01, 0.001, etc. s̄ = 0, and

|s_{k+1} − s̄| / |s_k − s̄| = 0.1.

Superlinear convergence: s_k = 1/k!: 1, 1/2, 1/6, 1/24, 1/120, etc. s̄ = 0, and

|s_{k+1} − s̄| / |s_k − s̄| = k!/(k + 1)! = 1/(k + 1) → 0 as k → ∞.

Quadratic convergence: s_k = 10^{−(2^k)}: 0.1, 0.01, 0.0001, 0.00000001, etc. s̄ = 0, and

|s_{k+1} − s̄| / |s_k − s̄|² = (10^{−2^k})² / 10^{−2^{k+1}} = 1.

[Figure comparing the rates of convergence of the above sequences, with the y-axis displayed on a logarithmic scale.]
We will use the notion of rate of convergence to analyze one aspect of the performance of optimization algorithms. Indeed, since an algorithm for nonlinear optimization problems, in its abstract form, generates an infinite sequence of points {x^k} converging to a solution x̄ only in the limit, it makes sense to discuss the rate of convergence of the sequence ||e^k|| = ||x^k − x̄||, or of E_k = f(x^k) − f(x̄), both of which have limit 0.

5.5.2 Rate of convergence of the steepest descent algorithm for the case of a quadratic function

In this section we explore answers to the question of how fast the steepest descent algorithm converges. Recall that in the earlier example we observed linear convergence of both the sequence {E_k} and {||e^k||}. We will show now that the steepest descent algorithm with stepsizes selected by exact line search in general exhibits linear convergence, but that the rate constant depends very much on the ratio of the largest to the smallest eigenvalue of the Hessian matrix H(x) at the optimal solution x = x*. In order to see how this dependence arises, we will examine the case where the objective function f(x) is itself a simple quadratic function of the form

f(x) = (1/2)x^T Qx + q^T x,

where Q is a positive definite symmetric matrix. We will suppose that the eigenvalues of Q are

A = a₁ ≥ a₂ ≥ ... ≥ a_n = a > 0,

i.e., A and a are the largest and smallest eigenvalues of Q. We already derived that the optimal solution of (P) is x* = −Q^{−1}q, with the optimal objective function value f(x*) = −(1/2)q^T Q^{−1}q.
Moreover, if x is the current point in the steepest descent algorithm, then f(x) = (1/2)x^T Qx + q^T x, and the next iterate of the steepest descent algorithm with exact line search is

x' = x + ᾱd = x + ((d^T d)/(d^T Qd)) d, where d = −∇f(x),

and

f(x') = f(x) − (1/2)(d^T d)²/(d^T Qd).

Therefore,

(f(x') − f(x*)) / (f(x) − f(x*))
 = 1 − ((1/2)(d^T d)²/(d^T Qd)) / (f(x) − f(x*))
 = 1 − ((1/2)(d^T d)²/(d^T Qd)) / ((1/2)x^T Qx + q^T x + (1/2)q^T Q^{−1}q)
 = 1 − ((1/2)(d^T d)²/(d^T Qd)) / ((1/2)(Qx + q)^T Q^{−1}(Qx + q))
 = 1 − (d^T d)² / ((d^T Qd)(d^T Q^{−1}d))
 = 1 − 1/δ, where δ := (d^T Qd)(d^T Q^{−1}d)/(d^T d)².

In order for the convergence constant 1 − 1/δ to be good, which will translate to fast linear convergence, we would like the quantity δ to be small. The following result provides an upper bound on the value of δ.

Kantorovich Inequality: Let A and a be the largest and the smallest eigenvalues of Q, respectively. Then

δ ≤ (A + a)²/(4Aa).

We will skip the proof of this inequality. Let us apply this inequality to the above analysis. Continuing, we have

(f(x') − f(x*)) / (f(x) − f(x*)) = 1 − 1/δ ≤ 1 − 4Aa/(A + a)² = (A − a)²/(A + a)² = ((A/a − 1)/(A/a + 1))².

Note by definition that A/a is always at least 1. If A/a is small (not much bigger than 1), then the convergence constant will be much smaller than 1. However, if A/a is large, then the convergence constant will be only slightly smaller than 1. The following table shows some sample values:
A/a | Upper bound on the rate constant | Upper bound on the number of iterations to reduce the optimality gap by 0.1
1.1 | 0.0023 | 1
3 | 0.25 | 2
10 | 0.67 | 6
100 | 0.96 | 58
1000 | 0.996 | 576

(The upper bound on the rate constant is ((A/a − 1)/(A/a + 1))², and the iteration count is the smallest n for which this bound raised to the power n is at most 0.1.)

Note that the number of iterations needed to reduce the optimality gap by 0.1 grows linearly in the ratio A/a. [Two pictures of possible iterations of the steepest descent algorithm.]
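The key inequality behind the table, namely that the per-iteration reduction factor 1 − (d^T d)²/((d^T Qd)(d^T Q^{−1}d)) never exceeds ((A − a)/(A + a))², can be checked numerically by sampling directions d at random (the matrix is the Q of the earlier quadratic example; the sample size is arbitrary):

```python
import numpy as np

# Numerical check of the consequence of the Kantorovich inequality:
# for every d != 0,
#   1 - (d^T d)^2 / ((d^T Q d)(d^T Q^{-1} d)) <= ((A - a)/(A + a))^2,
# where A and a are the largest and smallest eigenvalues of Q.
Q = np.array([[4.0, -2.0], [-2.0, 2.0]])
Q_inv = np.linalg.inv(Q)
a, A = np.linalg.eigvalsh(Q)          # eigvalsh returns ascending order
bound = ((A - a) / (A + a)) ** 2

rng = np.random.default_rng(0)
worst = 0.0
for _ in range(1000):
    d = rng.standard_normal(2)
    ratio = 1 - (d @ d) ** 2 / ((d @ Q @ d) * (d @ Q_inv @ d))
    worst = max(worst, ratio)
print(worst, bound)
```

The observed worst case approaches the bound for unlucky directions, which is consistent with the remark below that the bound is often attained in practice.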
Some remarks:
- We analyzed the convergence of the function values; the convergence of the algorithm's iterates can be easily shown to be linear with the same rate constant.
- The bound on the rate of convergence is attained in practice quite often, which is unfortunate.
- The ratio of the largest to the smallest eigenvalue of a matrix is called the condition number of the matrix.
- What about non-quadratic functions? If the Hessian at the locally optimal solution is positive definite, the function behaves as a near-quadratic function in a neighborhood of that solution, and the convergence exhibited by the iterates of the steepest descent algorithm will also be linear. The analysis of the non-quadratic case gets very involved; fortunately, the key intuition is obtained by analyzing the quadratic case.
- What about backtracking line search? Also linear convergence! (The rate constant depends in part on the backtracking parameters.)

5.5.3 Rate of convergence of the pure Newton's method

We have seen from our examples that, even for convex functions, Newton's method in its pure form (i.e., with stepsize of 1 at every iteration) does not guarantee descent at each iteration, and may produce a diverging sequence of iterates. Moreover, each iteration of Newton's method is much more computationally intensive than that of steepest descent. However, under certain conditions, the method exhibits a quadratic rate of convergence, making it the ideal method for solving convex optimization problems.

Recall that a method exhibits quadratic convergence when ||e^k|| = ||x^k − x̄|| → 0 and

lim_{k→∞} ||e^{k+1}||/||e^k||² = C < ∞.

Roughly speaking, if the iterates converge quadratically, the accuracy (i.e., the number of correct digits) of the solution doubles in a fixed number of iterations. There are many ways to state and prove results regarding the convergence of Newton's method.
We present one that provides particular insight into the circumstances under which the pure Newton's method demonstrates quadratic convergence. Let ||v|| denote the usual Euclidean norm of a vector, namely ||v|| := √(v^T v). Recall that the operator norm of a matrix M is defined as follows:

||M|| := max_x {||Mx|| : ||x|| = 1}.

As a consequence of this definition, for any x, ||Mx|| ≤ ||M|| ||x||.

Theorem 5.4 (Quadratic convergence) Suppose f(x) is twice continuously differentiable and x* is a point for which ∇f(x*) = 0. Suppose H(x) satisfies the following conditions:
- there exists a scalar h > 0 for which ||[H(x*)]^{−1}|| ≤ 1/h;
- there exist scalars β > 0 and L > 0 for which ||H(x) − H(y)|| ≤ L||x − y|| for all x and y satisfying ||x − x*|| ≤ β and ||y − x*|| ≤ β.
Let x satisfy ||x − x*|| ≤ γΓ, where 0 < γ < 1 and Γ := min{β, 2h/(3L)}, and let x_N := x − H(x)^{−1}∇f(x). Then:

(i) ||x_N − x*|| ≤ ||x − x*||² · L/(2(h − L||x − x*||));
(ii) ||x_N − x*|| < ||x − x*||, and hence the iterates converge to x*;
(iii) ||x_N − x*|| ≤ ||x − x*||² · 3L/(2h).

The proof relies on the following two elementary facts.

Proposition 5.5 Suppose that M is a symmetric matrix. Then the following are equivalent:
1. h > 0 satisfies ||M^{−1}|| ≤ 1/h;
2. h > 0 satisfies ||Mv|| ≥ h||v|| for any vector v.

Proposition 5.6 Suppose that f(x) is twice differentiable. Then

∇f(z) − ∇f(x) = ∫₀¹ [H(x + t(z − x))](z − x) dt.

Proof: Let φ(t) := ∇f(x + t(z − x)). Then φ(0) = ∇f(x) and φ(1) = ∇f(z), and φ′(t) = [H(x + t(z − x))](z − x). From the fundamental theorem of calculus, we have:

∇f(z) − ∇f(x) = φ(1) − φ(0) = ∫₀¹ φ′(t) dt = ∫₀¹ [H(x + t(z − x))](z − x) dt.

Proof of Theorem 5.4 We have:

x_N − x* = x − H(x)^{−1}∇f(x) − x*
 = x − x* + H(x)^{−1}(∇f(x*) − ∇f(x))
 = x − x* + H(x)^{−1} ∫₀¹ [H(x + t(x* − x))](x* − x) dt  (from Proposition 5.6)
 = H(x)^{−1} ∫₀¹ [H(x + t(x* − x)) − H(x)](x* − x) dt.

Therefore

||x_N − x*|| ≤ ||H(x)^{−1}|| ∫₀¹ ||H(x + t(x* − x)) − H(x)|| ||x* − x|| dt
 ≤ ||x* − x|| ||H(x)^{−1}|| ∫₀¹ L t ||x* − x|| dt
 = ||x* − x||² ||H(x)^{−1}|| L ∫₀¹ t dt
 = ||x* − x||² ||H(x)^{−1}|| L/2.
We now bound ||H(x)^{−1}||. Let v be any vector. Then

||H(x)v|| = ||H(x*)v + (H(x) − H(x*))v||
 ≥ ||H(x*)v|| − ||(H(x) − H(x*))v||
 ≥ h||v|| − ||H(x) − H(x*)|| ||v||  (from Proposition 5.5)
 ≥ h||v|| − L||x* − x|| ||v||
 = (h − L||x* − x||)||v||.

Invoking Proposition 5.5 again, we see that this implies that

||H(x)^{−1}|| ≤ 1/(h − L||x* − x||).

Combining this with the above yields

||x_N − x*|| ≤ ||x* − x||² · L/(2(h − L||x* − x||)),

which is (i) of the theorem. Because L||x* − x|| ≤ 2hγ/3 < 2h/3, we have:

||x_N − x*|| ≤ ||x* − x|| · L||x* − x||/(2(h − L||x* − x||)) ≤ ||x* − x|| · (2hγ/3)/(2(h − 2h/3)) = γ||x* − x|| < ||x* − x||,

which establishes (ii) of the theorem. Finally, we have

||x_N − x*|| ≤ ||x* − x||² · L/(2(h − L||x* − x||)) ≤ ||x* − x||² · L/(2(h − 2h/3)) = ||x* − x||² · 3L/(2h),

which establishes (iii) of the theorem.

Notice that the results regarding the convergence and rate of convergence in the above theorem are local, i.e., they apply only if the algorithm is initialized at certain starting points (the ones sufficiently close to the desired limit). In practice, it is not known how to pick such starting points, or how to check whether a proposed starting point is adequate. (With the very important exception of self-concordant functions.)
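The quadratic convergence predicted by Theorem 5.4 is easy to observe numerically on Example 2 from the previous section; the starting point below is an illustrative choice inside the domain, not a value taken from the notes:

```python
import numpy as np

# Pure Newton's method on Example 2:
#   f(x) = -ln(1 - x1 - x2) - ln x1 - ln x2,  minimizer x* = (1/3, 1/3).
def grad(x):
    s = 1 - x[0] - x[1]
    return np.array([1 / s - 1 / x[0], 1 / s - 1 / x[1]])

def hess(x):
    s2 = (1 - x[0] - x[1]) ** -2
    return np.array([[s2 + x[0] ** -2, s2],
                     [s2, s2 + x[1] ** -2]])

x = np.array([0.1, 0.1])          # illustrative starting point in the domain
x_star = np.array([1 / 3, 1 / 3])
errors = []
for _ in range(8):
    errors.append(np.linalg.norm(x - x_star))
    x = x + np.linalg.solve(hess(x), -grad(x))
print(errors)
```

Printing the errors shows the hallmark of quadratic convergence: once the iterates are close, the number of correct digits roughly doubles at every step, until machine precision is reached.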
5.6 Further discussion and modifications of the Newton's method

5.6.1 Global convergence for strongly convex functions with a two-phase Newton's method

We have noted that, to ensure descent at each iteration, Newton's method can be augmented by a line search. This idea can be formalized, and the efficiency of the resulting algorithm can be analyzed (see, for example, Convex Optimization by Stephen Boyd and Lieven Vandenberghe, available at https://web.stanford.edu/~boyd/cvxbook/, for a fairly simple presentation of the analysis).

Suppose that f(x) is strongly convex on its domain, i.e., assume there exists µ > 0 such that the smallest eigenvalue of H(x) is greater than or equal to µ for all x, and that the Hessian is Lipschitz continuous everywhere on the domain of f. Suppose we apply Newton's method with the stepsize at each iteration determined by the backtracking procedure of Section 5.3.2. That is, at each iteration of the algorithm we first attempt to take a full Newton step, but reduce the stepsize if the decrease in the function value is not sufficient. Then there exist positive numbers η and δ such that if ||∇f(x^k)|| ≥ η, then f(x^{k+1}) − f(x^k) ≤ −δ, and if ||∇f(x^k)|| < η, then stepsize α_k = 1 will be selected, the next iterate will satisfy ||∇f(x^{k+1})|| < η, and so will all the further iterates. Moreover, quadratic convergence will be observed in this phase.

As hinted above, the algorithm will proceed in two phases: while the iterates are far from the minimizer, a dampening of the Newton step will be required, but there will be a guaranteed decrease in the objective function values. This phase (referred to as the "dampened Newton phase") cannot take more than (f(x^0) − f(x*))/δ iterations. Once the norm of the gradient becomes sufficiently small, no dampening of the Newton step will be required in the rest of the algorithm, and quadratic convergence will be observed, thus making it the "quadratically convergent phase." Note that it is not necessary to know the values of η and δ to apply this version of the algorithm!

The two-phase Newton's method is globally convergent; however, to ensure global convergence, the function being minimized needs to possess particularly nice global properties.

5.6.2 Other modifications of the Newton's method

We have seen that if Newton's method is initialized sufficiently close to the point x̄ such that ∇f(x̄) = 0 and H(x̄) is positive definite (i.e., x̄ is a local minimizer), then it will converge quadratically, using stepsizes of α = 1. There are three issues in the above statement that we should be concerned with:
- What if H(x̄) is singular, or nearly-singular?
- How do we know if we are close enough, and what do we do if we are not?
- Can we modify Newton's method to guarantee global convergence?
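A sketch of the damped (line-search) Newton method described in 5.6.1: take the Newton direction, but choose the stepsize by backtracking, attempting the full step α = 1 first. The parameter values and the strongly convex test function below are illustrative assumptions:

```python
import numpy as np

# Newton's method with Armijo backtracking ("damped Newton"): try the
# full step alpha = 1 first, and halve it until sufficient decrease holds.
def damped_newton(f, grad_f, hess_f, x0, mu=0.2, eps=1e-10, max_iter=100):
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:
            break
        d = np.linalg.solve(hess_f(x), -g)       # Newton direction
        alpha = 1.0
        while f(x + alpha * d) > f(x) + mu * alpha * (g @ d):
            alpha *= 0.5                         # dampen the Newton step
        x = x + alpha * d
    return x

# Illustrative strongly convex test function: f(x) = e^{x1+x2} + x1^2 + x2^2.
f = lambda x: np.exp(x[0] + x[1]) + x[0] ** 2 + x[1] ** 2
grad_f = lambda x: np.exp(x[0] + x[1]) + 2 * x
def hess_f(x):
    e = np.exp(x[0] + x[1])
    return np.array([[e + 2, e], [e, e + 2]])

x_min = damped_newton(f, grad_f, hess_f, np.array([1.0, 1.0]))
print(x_min, np.linalg.norm(grad_f(x_min)))
```

By symmetry the minimizer has equal components, each solving 2t = −e^{2t}, i.e., t ≈ −0.2835716; the damped phase gives guaranteed decrease far from it, and full steps are taken once the gradient is small.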
In the previous subsection we assumed away the first issue and, under an additional assumption, showed how to address the other two. What if the function $f$ is not strongly convex, and $H(x)$ may approach singularity? There are two popular approaches (which are actually closely related) to address these issues.

The first approach ensures that the method always uses a descent direction. For example, instead of the direction $-H(x_k)^{-1}\nabla f(x_k)$, use the direction $-(H(x_k) + \varepsilon_k I)^{-1}\nabla f(x_k)$, where $\varepsilon_k \ge 0$ is chosen so that the smallest eigenvalue of $H(x_k) + \varepsilon_k I$ is bounded below by a fixed number $\bar{\varepsilon} > 0$. It is important to choose the value of $\varepsilon_k$ appropriately: if it is chosen too small, the matrix employed in computing the direction can become ill-conditioned when $H(x_k)$ is nearly singular; if it is chosen too large, the direction becomes nearly that of the steepest descent algorithm, and hence only linear convergence can be guaranteed. Hence, the value of $\varepsilon_k$ is often chosen dynamically.

The second approach is the so-called trust region method. Note that the main idea behind Newton's method is to represent the function $f(x)$ around the current iterate by its quadratic approximation

\[
q_k(x) = f(x_k) + \nabla f(x_k)^T (x - x_k) + \tfrac{1}{2}(x - x_k)^T H(x_k)(x - x_k),
\]

and then minimize that approximation. While locally the approximation works quite well, this may no longer be the case when a large step is taken. Trust region methods hence find the next iterate by solving the constrained optimization problem

\[
\min\; q_k(x) \quad \text{s.t.}\quad \|x - x_k\| \le \Delta_k,
\]

i.e., they do not allow the next iterate to leave the neighborhood of $x_k$ in which the quadratic approximation is close to the original function $f(x)$. (As it turns out, this problem is not much harder to solve than the unconstrained minimization of $q_k(x)$.)

The value of $\Delta_k$ represents the size of the region in which we can trust $q_k(x)$ to provide a good approximation of $f(x)$. Smaller values of $\Delta_k$ ensure that we are working with an accurate representation of $f(x)$, but result in conservative steps. Larger values of $\Delta_k$ allow for larger steps, but may lead to inaccurate estimation of the objective function. To account for this, the value of $\Delta_k$ is updated dynamically throughout the algorithm: it is increased if $q_k(x)$ provided an exceptionally good approximation of $f(x)$ at the previous iteration, and decreased if the approximation was exceptionally bad.
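The first approach above (shifting the Hessian to guarantee a descent direction) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the shift rule based on an explicit eigenvalue computation, and the constant `eps_bar = 1e-3` are choices made for the example; production codes typically obtain the shift more cheaply, e.g. from a modified Cholesky factorization.

```python
import numpy as np

def regularized_newton_direction(H, g, eps_bar=1e-3):
    """Return d = -(H + eps_k I)^{-1} g, with eps_k >= 0 chosen so that
    the smallest eigenvalue of H + eps_k I is at least eps_bar > 0.

    The shifted matrix is positive definite, so d is a descent
    direction whenever g != 0.
    """
    lam_min = np.linalg.eigvalsh(H)[0]    # eigvalsh returns eigenvalues in ascending order
    eps_k = max(0.0, eps_bar - lam_min)   # no shift needed if H is already safely PD
    return np.linalg.solve(H + eps_k * np.eye(len(g)), -g)

# An indefinite Hessian, as can occur away from a local minimizer:
H = np.array([[1.0, 0.0],
              [0.0, -2.0]])
g = np.array([1.0, 1.0])
d = regularized_newton_direction(H, g)    # guaranteed descent: g @ d < 0
```

When $H$ is already positive definite with smallest eigenvalue above $\bar{\varepsilon}$, the shift is zero and the ordinary Newton direction is returned unchanged.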