Lecture 10: September 26


10-725: Optimization                                                                 Fall 2012
Lecture 10: September 26
Lecturer: Barnabas Poczos/Ryan Tibshirani                  Scribes: Yipei Wang, Zhiguang Huo

Note: LaTeX template courtesy of UC Berkeley EECS dept.

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Instructor.

10.1 Motivation

From previous lectures we know that the steepest descent method is very slow, while Newton's method is fast but requires the inverse of the Hessian matrix, which is very expensive to compute. Can we find a method between these two? Conjugate direction methods can be regarded as intermediate between the method of steepest descent and Newton's method.

10.2 Conjugate Direction Methods

The goal is twofold: (1) accelerate the convergence rate of steepest descent, and (2) avoid the high computational cost of Newton's method.

10.2.1 Definition [Q-conjugate directions]

Let Q be a symmetric matrix. The vectors d_1, d_2, ..., d_k (d_i ∈ R^n, d_i ≠ 0) are Q-orthogonal (conjugate) with respect to Q if

    d_i^T Q d_j = 0   for all i ≠ j.

Note: in the applications we consider, the matrix Q will be positive definite, but this is not inherent in the basic definition. If Q = 0, any two vectors are conjugate; if Q = I, conjugacy is equivalent to the usual notion of orthogonality.

10.2.2 Linear Independence Lemma

Lemma: Let Q be positive definite. If the vectors d_1, d_2, ..., d_k are Q-conjugate, then they are linearly independent.
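As a quick numerical illustration (not part of the original notes), the following NumPy sketch builds a symmetric positive definite Q, takes its eigenvectors as one particular Q-conjugate set, and checks both the conjugacy condition and the linear independence asserted by the lemma.

    import numpy as np

    # Illustrative check of Q-conjugacy: for a symmetric positive definite Q,
    # the eigenvectors of Q form one example of a Q-conjugate set.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    Q = A @ A.T + 4 * np.eye(4)            # symmetric positive definite

    _, V = np.linalg.eigh(Q)               # columns of V are eigenvectors of Q
    D = V                                  # candidate Q-conjugate directions d_1,...,d_4

    G = D.T @ Q @ D                        # Gram matrix with entries d_i^T Q d_j
    off_diag = G - np.diag(np.diag(G))
    print("max |d_i^T Q d_j|, i != j:", np.abs(off_diag).max())  # close to machine precision

    # Q-conjugate directions are linearly independent (full rank), per the lemma.
    print("rank of [d_1 ... d_4]:", np.linalg.matrix_rank(D))    # 4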

Proof (by contradiction): Suppose d_k = α_1 d_1 + ... + α_{k-1} d_{k-1} for some scalars α_i. Then

    0 < d_k^T Q d_k = d_k^T Q (α_1 d_1 + ... + α_{k-1} d_{k-1}) = α_1 d_k^T Q d_1 + ... + α_{k-1} d_k^T Q d_{k-1} = 0,

a contradiction.

10.2.3 The importance of Q-conjugacy

For the quadratic problem our goal is to solve

    arg min_{x ∈ R^n}  (1/2) x^T Q x - b^T x.

Assume Q is positive definite. The gradient of the objective function equals 0 at the minimizer, so the unique solution of this problem is the solution of Qx = b, x ∈ R^n. Let x* denote this solution, and let d_0, d_1, ..., d_{n-1} be Q-conjugate. Since d_0, d_1, ..., d_{n-1} are linearly independent,

    x* = α_0 d_0 + ... + α_{n-1} d_{n-1}.

Therefore

    d_i^T Q x* = d_i^T Q (α_0 d_0 + ... + α_{n-1} d_{n-1}) = α_i d_i^T Q d_i,

and, since Q x* = b, we do not need to know x* to get α_i:

    α_i = d_i^T Q x* / (d_i^T Q d_i) = d_i^T b / (d_i^T Q d_i),

so

    x* = Σ_{i=0}^{n-1} α_i d_i = Σ_{i=0}^{n-1} (d_i^T b / (d_i^T Q d_i)) d_i.

We can see that there is no need to invert a matrix; we only need to compute inner products.
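The expansion above is easy to verify numerically. The following sketch (an illustration, not from the notes) uses the eigenvectors of Q as a convenient Q-conjugate basis and reconstructs x* from inner products only, comparing against a direct solve of Qx = b.

    import numpy as np

    # x* = sum_i (d_i^T b / d_i^T Q d_i) d_i, using any Q-conjugate basis.
    rng = np.random.default_rng(1)
    n = 5
    A = rng.standard_normal((n, n))
    Q = A @ A.T + n * np.eye(n)            # symmetric positive definite
    b = rng.standard_normal(n)

    _, V = np.linalg.eigh(Q)
    dirs = [V[:, i] for i in range(n)]     # Q-conjugate directions d_0,...,d_{n-1}

    x_star = sum((d @ b) / (d @ Q @ d) * d for d in dirs)   # inner products only
    print(np.allclose(x_star, np.linalg.solve(Q, b)))       # True: solves Qx = b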

10.2.4 Conjugate Direction Theorem

The expansion of x* above can be considered the result of an iterative process of n steps, where at the i-th step α_i d_i is added. This can be generalized further so that the starting point of the iteration can be an arbitrary x_0. We introduce the conjugate direction theorem here.

Theorem [Conjugate Direction Theorem]: Let d_0, d_1, ..., d_{n-1} be Q-conjugate and let x_0 ∈ R^n be an arbitrary starting point. Define

    x_{k+1} = x_k + α_k d_k,   α_k = -g_k^T d_k / (d_k^T Q d_k),   g_k = Q x_k - b.

Then after n steps, x_n = x*.

Proof: Since {d_0, d_1, ..., d_{n-1}} are linearly independent, we can write

    x* - x_0 = α_0 d_0 + ... + α_{n-1} d_{n-1}   for some α_0, ..., α_{n-1}.

Using the update rule x_{k+1} = x_k + α_k d_k, we have

    x_1 = x_0 + α_0 d_0
    x_2 = x_0 + α_0 d_0 + α_1 d_1
    ...
    x_k = x_0 + α_0 d_0 + α_1 d_1 + ... + α_{k-1} d_{k-1}
    ...
    x_n = x_0 + α_0 d_0 + α_1 d_1 + ... + α_{n-1} d_{n-1} = x*.

Therefore it is enough to prove that these α_k values are given by α_k = -g_k^T d_k / (d_k^T Q d_k). We already know

    x* - x_0 = α_0 d_0 + ... + α_{n-1} d_{n-1},
    x_k - x_0 = α_0 d_0 + ... + α_{k-1} d_{k-1} ∈ span(d_0, ..., d_{k-1}).

Therefore

    d_k^T Q (x* - x_0) = d_k^T Q (α_0 d_0 + ... + α_{n-1} d_{n-1}) = α_k d_k^T Q d_k,

so α_k = d_k^T Q (x* - x_0) / (d_k^T Q d_k). Moreover,

    d_k^T Q (x_k - x_0) = d_k^T Q (α_0 d_0 + α_1 d_1 + ... + α_{k-1} d_{k-1}) = 0,

hence

    d_k^T Q (x* - x_0) = d_k^T Q (x* - x_k + x_k - x_0) = d_k^T Q (x* - x_k).

Therefore

    α_k = d_k^T Q (x* - x_0) / (d_k^T Q d_k) = d_k^T Q (x* - x_k) / (d_k^T Q d_k) = -d_k^T g_k / (d_k^T Q d_k),

where the last equality uses Q x* = b, so that Q (x* - x_k) = b - Q x_k = -g_k.
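The sketch below (again illustrative, with the eigenvectors of Q standing in for a generic Q-conjugate set) runs the iteration of the theorem from a random starting point x_0 and confirms that x_n equals the solution of Qx = b after exactly n steps.

    import numpy as np

    # Conjugate direction iteration from an arbitrary x_0:
    # x_{k+1} = x_k + alpha_k d_k,  alpha_k = -g_k^T d_k / (d_k^T Q d_k),  g_k = Q x_k - b.
    rng = np.random.default_rng(2)
    n = 6
    A = rng.standard_normal((n, n))
    Q = A @ A.T + n * np.eye(n)
    b = rng.standard_normal(n)
    _, V = np.linalg.eigh(Q)               # columns are Q-conjugate directions

    x = rng.standard_normal(n)             # arbitrary starting point x_0
    for k in range(n):
        d = V[:, k]
        g = Q @ x - b
        alpha = -(g @ d) / (d @ Q @ d)
        x = x + alpha * d

    print(np.allclose(x, np.linalg.solve(Q, b)))   # True after exactly n steps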

10.2.5 Another motivation for Q-conjugacy

Our goal is

    arg min_{x ∈ R^n} (1/2) x^T Q x - b^T x.

Write x - x_0 = α_0 d_0 + ... + α_{n-1} d_{n-1} for some {α_i}_{i=0}^{n-1} ⊂ R. Then, because the cross terms d_i^T Q d_j (i ≠ j) vanish,

    f(x) = (1/2) [x_0 + Σ_{j=0}^{n-1} α_j d_j]^T Q [x_0 + Σ_{j=0}^{n-1} α_j d_j] - b^T [x_0 + Σ_{j=0}^{n-1} α_j d_j]
         = c + Σ_{j=0}^{n-1} ( (1/2) [x_0 + α_j d_j]^T Q [x_0 + α_j d_j] - b^T [x_0 + α_j d_j] ),

where c is a constant that does not depend on the α_j. The optimization problem is thus transformed into n separate 1-dimensional optimization problems.

10.2.6 Expanding Subspace Theorem

Let B_k = span(d_0, ..., d_{k-1}) ⊂ R^n. We will show that as the method of conjugate directions progresses, each x_k minimizes the objective f(x) = (1/2) x^T Q x - b^T x over the affine set x_0 + B_k, and each x_{k+1} minimizes f over the line {x_k + α d_k : α ∈ R}.

Theorem [Expanding Subspace Theorem]: Let {d_i}_{i=0}^{n-1} be a sequence of Q-conjugate vectors in R^n, let x_0 ∈ R^n be arbitrary, and let

    x_{k+1} = x_k + α_k d_k,   α_k = -g_k^T d_k / (d_k^T Q d_k),   g_k = Q x_k - b.

Then

    x_{k+1} = arg min { (1/2) x^T Q x - b^T x : x = x_k + α d_k, α ∈ R },
    x_k = arg min { (1/2) x^T Q x - b^T x : x ∈ x_0 + B_k }.
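Before turning to the proof, here is a small numerical check (not part of the notes): running the conjugate direction iteration and printing max_i |g_k^T d_i| over i < k shows values at the level of machine precision, consistent with g_k being orthogonal to B_k.

    import numpy as np

    # Numerical check of the Expanding Subspace Theorem: each gradient g_k is
    # orthogonal to B_k = span(d_0, ..., d_{k-1}) along the iteration.
    rng = np.random.default_rng(3)
    n = 5
    A = rng.standard_normal((n, n))
    Q = A @ A.T + n * np.eye(n)
    b = rng.standard_normal(n)
    _, V = np.linalg.eigh(Q)               # Q-conjugate directions as columns

    x = rng.standard_normal(n)
    for k in range(n):
        g = Q @ x - b
        if k > 0:
            # g_k^T d_i = 0 for all i < k
            print(k, np.abs(V[:, :k].T @ g).max())   # close to machine precision
        d = V[:, k]
        x = x + (-(g @ d) / (d @ Q @ d)) * d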

Proof: It is enough to show that x_k minimizes f over x_0 + B_k, since x_0 + B_{k+1} contains the line x = x_k + α d_k (by the definition of B_{k+1}). Since f is strictly convex, it is enough to show that g_k = ∇f(x_k) is orthogonal to B_k. We prove g_k ⟂ B_k by induction.

k = 0: B_0 is the span of the empty set, so the claim holds trivially.

Now assume g_k ⟂ B_k; we prove g_{k+1} ⟂ B_{k+1}. By definition, x_{k+1} = x_k + α_k d_k and g_k = Q x_k - b, so

    g_{k+1} = Q x_{k+1} - b = Q (x_k + α_k d_k) - b = (Q x_k - b) + α_k Q d_k = g_k + α_k Q d_k.

First, let us prove that g_{k+1} ⟂ d_k. Using g_{k+1} = g_k + α_k Q d_k and α_k = -g_k^T d_k / (d_k^T Q d_k),

    d_k^T g_{k+1} = d_k^T (g_k + α_k Q d_k) = d_k^T g_k - (g_k^T d_k / (d_k^T Q d_k)) d_k^T Q d_k = d_k^T g_k - d_k^T g_k = 0,

so g_{k+1} ⟂ d_k.

Now let us prove that g_{k+1} ⟂ d_i for i < k. Since g_{k+1} = g_k + α_k Q d_k and g_k ⟂ B_k = span(d_0, ..., d_{k-1}),

    d_i^T g_{k+1} = d_i^T (g_k + α_k Q d_k) = d_i^T g_k + α_k d_i^T Q d_k = 0,

where the first term vanishes by the induction hypothesis and the second by Q-conjugacy. Hence g_{k+1} ⟂ d_i for i < k. Altogether we have proved that g_{k+1} ⟂ B_{k+1}, where B_{k+1} = span(d_0, ..., d_k).

10.2.7 Corollary of the Expanding Subspace Theorem

Corollary: g_k ⟂ B_k = span(d_0, ..., d_{k-1}), that is, g_k^T d_i = 0 for 0 ≤ i < k. Moreover,

    B_0 ⊂ B_1 ⊂ ... ⊂ B_k ⊂ ... ⊂ B_n = R^n,

and since x_k minimizes f over x_0 + B_k, x_n is the minimum of f in R^n.

10.3 Conjugate Gradient Method

The conjugate gradient method is a conjugate direction method that selects the successive direction vectors as a conjugate version of the successive gradients obtained as the method progresses. The conjugate directions are therefore not specified beforehand, but rather are determined sequentially at each step of the iteration.

10.3.1 Description and advantages

So far, given d_0, ..., d_{n-1}, we already have an update rule for α_k:

    α_k = -g_k^T d_k / (d_k^T Q d_k).

The remaining question is how to choose the vectors d_0, ..., d_{n-1}. The conjugate gradient method chooses each new direction as a conjugate version of the current gradient, determined at each step of the iteration. There are three primary advantages to this method of direction selection.

1. Unless the solution is attained in fewer than n steps, the gradient is always nonzero and linearly independent of all previous direction vectors.

2. A more important advantage of the conjugate gradient method is the especially simple formula that is used to determine the new direction vector. This simplicity makes the method only slightly more complicated than steepest descent.

3. Because the directions are based on the gradients, the process makes good uniform progress toward the solution at every step. This is in contrast to the situation for arbitrary sequences of conjugate directions, in which progress may be slight until the final few steps.

10.3.2 Algorithm

Let x_0 ∈ R^n be arbitrary and set

    d_0 = -g_0 = b - Q x_0,
    α_k = -g_k^T d_k / (d_k^T Q d_k),
    x_{k+1} = x_k + α_k d_k,
    g_{k+1} = Q x_{k+1} - b,
    d_{k+1} = -g_{k+1} + β_k d_k,   where   β_k = g_{k+1}^T Q d_k / (d_k^T Q d_k).
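The recursion above translates directly into code. Below is a minimal NumPy sketch (not part of the original notes) of the quadratic-case conjugate gradient algorithm; as the theory predicts, it reaches the solution of Qx = b in at most n steps.

    import numpy as np

    # Conjugate gradient for the quadratic problem: d_0 = -g_0,
    # alpha_k = -g_k^T d_k / d_k^T Q d_k,  beta_k = g_{k+1}^T Q d_k / d_k^T Q d_k.
    def conjugate_gradient(Q, b, x0, tol=1e-10):
        x = x0.copy()
        g = Q @ x - b
        d = -g
        for _ in range(len(b)):            # at most n iterations for an n-dimensional quadratic
            if np.linalg.norm(g) < tol:
                break
            Qd = Q @ d
            alpha = -(g @ d) / (d @ Qd)
            x = x + alpha * d
            g = Q @ x - b
            beta = (g @ Qd) / (d @ Qd)
            d = -g + beta * d
        return x

    rng = np.random.default_rng(4)
    n = 8
    A = rng.standard_normal((n, n))
    Q = A @ A.T + n * np.eye(n)
    b = rng.standard_normal(n)
    x = conjugate_gradient(Q, b, np.zeros(n))
    print(np.allclose(x, np.linalg.solve(Q, b)))   # True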

In the algorithm, the first step is identical to a steepest descent step; each succeeding step moves in a direction that is a linear combination of the current gradient and the preceding direction vector. The attractive feature of the algorithm is the simple formula for updating the direction vector. The method is only slightly more complicated to implement than the method of steepest descent, but converges in a finite number of steps.

Theorem 10.1 The conjugate gradient algorithm is a conjugate direction method. If it does not terminate at x_k, then

    a) span(g_0, g_1, ..., g_k) = span(g_0, Q g_0, ..., Q^k g_0)
    b) span(d_0, d_1, ..., d_k) = span(g_0, Q g_0, ..., Q^k g_0)
    c) d_k^T Q d_i = 0 for i < k
    d) α_k = g_k^T g_k / (d_k^T Q d_k)
    e) β_k = g_{k+1}^T g_{k+1} / (g_k^T g_k)

Proof: We first prove a), b) and c) simultaneously by induction. Clearly, they are true for k = 0. Now suppose they are true for k; we show that they are true for k + 1. We have

    g_{k+1} = g_k + α_k Q d_k.

By the induction hypothesis, both g_k and Q d_k belong to span(g_0, Q g_0, ..., Q^{k+1} g_0), the first by a) and the second by b). Thus g_{k+1} ∈ span(g_0, Q g_0, ..., Q^{k+1} g_0). Furthermore, g_{k+1} ∉ span(d_0, d_1, ..., d_k) = span(g_0, Q g_0, ..., Q^k g_0), since otherwise g_{k+1} = 0, because for any conjugate direction method g_{k+1} is orthogonal to span(d_0, d_1, ..., d_k). (The induction hypothesis on c) guarantees that the method is a conjugate direction method up to x_{k+1}.) Thus we conclude that

    span(g_0, g_1, ..., g_{k+1}) = span(g_0, Q g_0, ..., Q^{k+1} g_0),

which proves a). To prove b) we write

    d_{k+1} = -g_{k+1} + β_k d_k,

and b) immediately follows from a) and the induction hypothesis on b). Next, to prove c) we have

    d_{k+1}^T Q d_i = -g_{k+1}^T Q d_i + β_k d_k^T Q d_i.

For i = k the right-hand side is zero by the definition of β_k. For i < k both terms vanish. The first term vanishes since Q d_i ∈ span(d_0, d_1, ..., d_{i+1}) by b), and the Expanding Subspace Theorem (applicable because the induction hypothesis on c) guarantees that the method is a conjugate direction method up to x_{k+1}) guarantees that g_{k+1} is orthogonal to span(d_0, d_1, ..., d_k). The second term vanishes by the induction hypothesis on c). This proves c), which also shows that the method is a conjugate direction method.

To prove d) we have

    -g_k^T d_k = g_k^T g_k - β_{k-1} g_k^T d_{k-1},

where the second term is zero by the Expanding Subspace Theorem; substituting into α_k = -g_k^T d_k / (d_k^T Q d_k) gives d).

Finally, to prove e) we note that g_{k+1}^T g_k = 0, because g_k ∈ span(d_0, d_1, ..., d_k) and g_{k+1} is orthogonal to span(d_0, d_1, ..., d_k). Since Q d_k = (g_{k+1} - g_k) / α_k, we have

    g_{k+1}^T Q d_k = (1/α_k) g_{k+1}^T (g_{k+1} - g_k) = (1/α_k) g_{k+1}^T g_{k+1},

and therefore

    β_k = g_{k+1}^T Q d_k / (d_k^T Q d_k) = g_{k+1}^T g_{k+1} / (α_k d_k^T Q d_k) = g_{k+1}^T g_{k+1} / (g_k^T g_k),

where the last equality uses d).
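Parts d) and e) can also be checked numerically. The sketch below (illustrative only) runs the conjugate gradient recursion on a random positive definite quadratic and confirms at every step that the simplified formulas for α_k and β_k agree with the defining ones.

    import numpy as np

    # Check along a CG run that
    #   alpha_k = g_k^T g_k / d_k^T Q d_k        equals  -g_k^T d_k / d_k^T Q d_k
    #   beta_k  = g_{k+1}^T g_{k+1} / g_k^T g_k  equals  g_{k+1}^T Q d_k / d_k^T Q d_k
    rng = np.random.default_rng(5)
    n = 7
    A = rng.standard_normal((n, n))
    Q = A @ A.T + n * np.eye(n)
    b = rng.standard_normal(n)

    x = rng.standard_normal(n)
    g = Q @ x - b
    d = -g
    for k in range(n - 1):
        Qd = Q @ d
        alpha_def = -(g @ d) / (d @ Qd)          # defining formula
        alpha_d = (g @ g) / (d @ Qd)             # formula d)
        x = x + alpha_def * d
        g_new = Q @ x - b
        beta_def = (g_new @ Qd) / (d @ Qd)       # defining formula
        beta_e = (g_new @ g_new) / (g @ g)       # formula e)
        print(k, np.isclose(alpha_def, alpha_d), np.isclose(beta_def, beta_e))
        d = -g_new + beta_def * d
        g = g_new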

10.4 Extension to non-quadratic problems

10.4.1 Description

Our goal now is

    min_{x ∈ R^n} f(x).

We make a quadratic approximation via the associations g_k = ∇f(x_k) and Q = ∇²f(x_k). Using these associations, re-evaluated at each step, all quantities necessary to implement the basic conjugate gradient algorithm can be evaluated. If f is quadratic, these associations are identities, so the general algorithm obtained by using them is a generalization of the conjugate gradient scheme. This is similar to the philosophy underlying Newton's method, where at each step the solution of a general problem is approximated by the solution of a purely quadratic problem through these same associations.

When applied to nonquadratic problems, conjugate gradient methods will not usually terminate within n steps. It is possible therefore simply to continue finding new directions according to the algorithm and terminate only when some termination criterion is met. Alternatively, after n steps we can restart the process from the current point and run the algorithm for another n steps.

10.4.2 Algorithm

1. Start at x_0, compute g_0 = ∇f(x_0) and set d_0 = -g_0.
2. For k = 0, 1, ..., n - 1:
   a) Set x_{k+1} = x_k + α_k d_k, where α_k = -g_k^T d_k / (d_k^T [∇²f(x_k)] d_k).
   b) Compute g_{k+1} = ∇f(x_{k+1}).
   c) Unless k = n - 1, set d_{k+1} = -g_{k+1} + β_k d_k, where β_k = g_{k+1}^T [∇²f(x_k)] d_k / (d_k^T [∇²f(x_k)] d_k).
3. Replace x_0 by x_n and go back to step 1. Repeat until convergence.

10.4.3 Properties of the CGA

An attractive feature of the algorithm is that, just as in the pure form of Newton's method, no line search is required at any stage. Also, the algorithm converges in a finite number of steps for a quadratic problem. The undesirable features are that the Hessian ∇²f(x_k) must be evaluated at each point, which is often impractical, and that the algorithm is not, in this form, globally convergent.
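As a concrete sketch of this Hessian-based scheme with restarts, here is an illustrative Python implementation; the test function f, its gradient, and its Hessian below are our own choices for the demo, not from the notes. On this smooth convex problem the gradient norm should shrink rapidly across cycles.

    import numpy as np

    # Hessian-based nonquadratic CG with a restart every n steps (steps 1-3 above),
    # on the illustrative convex function f(x) = 0.25 * sum(x_i^4) + 0.5 * x^T P x.
    rng = np.random.default_rng(6)
    n = 4
    B = rng.standard_normal((n, n))
    P = B @ B.T + n * np.eye(n)            # well-conditioned positive definite matrix

    def f(x):
        return 0.25 * np.sum(x**4) + 0.5 * x @ P @ x

    def grad(x):
        return x**3 + P @ x

    def hess(x):
        return np.diag(3 * x**2) + P

    x = 0.7 * np.ones(n)                   # starting point x_0
    for cycle in range(5):                 # step 3: restart and repeat
        g = grad(x)
        d = -g
        for k in range(n):                 # step 2: n inner CG-like steps
            H = hess(x)                    # Hessian evaluated at x_k
            alpha = -(g @ d) / (d @ H @ d) # step 2a
            x = x + alpha * d
            g_new = grad(x)                # step 2b
            if k < n - 1:                  # step 2c
                beta = (g_new @ H @ d) / (d @ H @ d)
                d = -g_new + beta * d
            g = g_new
        print("cycle", cycle, "f(x) =", f(x), "||grad|| =", np.linalg.norm(g))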

10.5 Line search methods

Two line search methods of this type are the Fletcher-Reeves method and the Polak-Ribiere method. These algorithms are quite similar to the conjugate gradient method above; the slight differences are in steps 2 a) and 2 c).

10.5.1 Algorithm for the Fletcher-Reeves method

1. Start at x_0, compute g_0 = ∇f(x_0) and set d_0 = -g_0.
2. For k = 0, 1, ..., n - 1:
   a) Set x_{k+1} = x_k + α_k d_k, where α_k = arg min_α f(x_k + α d_k).
   b) Compute g_{k+1} = ∇f(x_{k+1}).
   c) Unless k = n - 1, set d_{k+1} = -g_{k+1} + β_k d_k, where β_k = g_{k+1}^T g_{k+1} / (g_k^T g_k).
3. Replace x_0 by x_n and go back to step 1. Repeat until convergence.

First, the Fletcher-Reeves method is a line search method: evaluating α_k requires a one-dimensional minimization, and normally there is no closed-form solution. Second, the Hessian is not used in the algorithm. Also, in the quadratic case it is identical to the original conjugate direction algorithm.

10.5.2 Algorithm for the Polak-Ribiere method

The Polak-Ribiere method is quite similar to the Fletcher-Reeves method; the only difference is in the evaluation of β_k.

1. Start at x_0, compute g_0 = ∇f(x_0) and set d_0 = -g_0.
2. For k = 0, 1, ..., n - 1:
   a) Set x_{k+1} = x_k + α_k d_k, where α_k = arg min_α f(x_k + α d_k).
   b) Compute g_{k+1} = ∇f(x_{k+1}).
   c) Unless k = n - 1, set d_{k+1} = -g_{k+1} + β_k d_k, where β_k = (g_{k+1} - g_k)^T g_{k+1} / (g_k^T g_k).
3. Replace x_0 by x_n and go back to step 1. Repeat until convergence.

Again, in the quadratic case this leads to a value identical to the standard formula. Experimental evidence seems to favor the Polak-Ribiere method over other methods of this general type.
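To compare the two β_k rules in practice, here is an illustrative Python sketch (not from the notes); the 2-D Rosenbrock test function and the use of scipy's minimize_scalar as an approximate exact line search in step 2 a) are our own choices.

    import numpy as np
    from scipy.optimize import minimize_scalar   # 1-D line search for step 2a

    # Fletcher-Reeves vs. Polak-Ribiere nonlinear CG with restarts every n steps,
    # on the 2-D Rosenbrock function (illustrative test problem).
    def f(x):
        return 100.0 * (x[1] - x[0]**2) ** 2 + (1.0 - x[0]) ** 2

    def grad(x):
        return np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                         200.0 * (x[1] - x[0]**2)])

    def nonlinear_cg(x0, beta_rule, cycles=500):
        n, x = len(x0), np.array(x0, dtype=float)
        for _ in range(cycles):            # step 3: restart every n steps
            g = grad(x)
            d = -g
            for k in range(n):
                alpha = minimize_scalar(lambda a: f(x + a * d)).x   # step 2a
                x = x + alpha * d
                g_new = grad(x)                                     # step 2b
                if k < n - 1:                                       # step 2c
                    if beta_rule == "FR":
                        beta = (g_new @ g_new) / (g @ g)
                    else:                  # Polak-Ribiere
                        beta = ((g_new - g) @ g_new) / (g @ g)
                    d = -g_new + beta * d
                g = g_new
        return x

    for rule in ("FR", "PR"):
        x = nonlinear_cg([-1.2, 1.0], rule)
        print(rule, x, f(x))               # both runs should end close to the minimizer (1, 1)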

10.6 Convergence

Global convergence of the line search methods is established by noting that a pure steepest descent step is taken every n steps and serves as a spacer step. Since the other steps do not increase the objective, and in fact hopefully decrease it, global convergence is assured. Thus the restarting aspect of the algorithm is important for the global convergence analysis, since in general one cannot guarantee that the directions d_k generated by the method are descent directions.

The local convergence properties of both of the above, and most other, nonquadratic extensions of the conjugate gradient method can be inferred from the quadratic analysis. Since one complete cycle solves a quadratic problem exactly, just as Newton's method does in one step, we expect that for general nonquadratic problems

    ||x_{k+n} - x*|| ≤ c ||x_k - x*||^2

for some c and k = 0, n, 2n, 3n, .... This can indeed be proved, and it underlies the original motivation for the method. For problems with large n, however, a result of this type is in itself of little comfort, since we probably hope to terminate in fewer than n steps.

10.7 Summary

Conjugate Direction Methods
  - Q-conjugate directions
  - Minimizing quadratic functions

Conjugate Gradient Methods for nonquadratic functions
  - Line search methods:
    1. Fletcher-Reeves method
    2. Polak-Ribiere method

References

[CW87] Luenberger, David G. and Ye, Yinyu. Linear and Nonlinear Programming. Springer, 2008, vol. 6.
