Lecture 10: September 26
10-725: Optimization                                                        Fall 2012
Lecture 10: September 26
Lecturer: Barnabas Poczos/Ryan Tibshirani            Scribes: Yipei Wang, Zhiguang Huo

Note: LaTeX template courtesy of UC Berkeley EECS dept.

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Instructor.

10.1 Motivation

From previous lectures, we know that the steepest descent method is very slow, while Newton's method is fast but requires computing the inverse of the Hessian matrix, which is very expensive. Can we find a method between these two? The conjugate direction method can be regarded as being intermediate between the method of steepest descent and Newton's method.

10.2 Conjugate Direction Methods

The goal is twofold: (1) accelerate the convergence rate of steepest descent, and (2) avoid the high computational cost of Newton's method.

10.2.1 Definition [Q-conjugate directions]

Let $Q$ be a symmetric matrix. The vectors $d_1, d_2, \dots, d_k$ (with $d_i \in \mathbb{R}^n$, $d_i \neq 0$) are $Q$-orthogonal (conjugate) with respect to $Q$ if
$$d_i^T Q d_j = 0, \qquad i \neq j.$$

Note: In the applications we consider, the matrix $Q$ will be positive definite, but this is not inherent in the basic definition. If $Q = 0$, any two vectors are conjugate; if $Q = I$, conjugacy is equivalent to the usual notion of orthogonality.

Linear independence lemma

Lemma: Let $Q$ be positive definite. If the vectors $d_1, d_2, \dots, d_k$ are $Q$-conjugate, then they are linearly independent.

Proof (by contradiction): Suppose $d_k = \alpha_1 d_1 + \dots + \alpha_{k-1} d_{k-1}$. Then
$$0 < d_k^T Q d_k = d_k^T Q (\alpha_1 d_1 + \dots + \alpha_{k-1} d_{k-1}) = \alpha_1 d_k^T Q d_1 + \dots + \alpha_{k-1} d_k^T Q d_{k-1} = 0,$$
where the strict inequality uses positive definiteness of $Q$ and the final equality uses $Q$-conjugacy. This is a contradiction, so the vectors are linearly independent.
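As a quick numerical illustration of the definition (my example, not part of the original notes): the eigenvectors of a symmetric positive definite matrix are mutually orthogonal, and since $Q v_j = \lambda_j v_j$ we have $v_i^T Q v_j = \lambda_j v_i^T v_j = 0$ for $i \neq j$, so they are $Q$-conjugate. The matrix below is an arbitrary choice.

```python
import numpy as np

# Hypothetical example matrix: symmetric positive definite.
Q = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# Eigenvectors of a symmetric matrix are orthogonal, hence Q-conjugate.
eigvals, V = np.linalg.eigh(Q)
d = [V[:, i] for i in range(Q.shape[0])]

# Check d_i^T Q d_j = 0 for all i != j (up to floating-point error).
for i in range(len(d)):
    for j in range(len(d)):
        if i != j:
            assert abs(d[i] @ Q @ d[j]) < 1e-10
print("eigenvectors are Q-conjugate")
```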
The importance of Q-conjugacy

For the quadratic problem, our goal is to solve
$$\arg\min_{x \in \mathbb{R}^n} \; \tfrac{1}{2} x^T Q x - b^T x,$$
where we assume $Q$ is positive definite. The first-order derivative of the objective function equals zero at the minimizer, so the unique solution to this problem is the solution of $Qx = b$, $x \in \mathbb{R}^n$. Let $x^*$ denote this solution, and let $d_0, d_1, \dots, d_{n-1}$ be $Q$-conjugate. Since $d_0, d_1, \dots, d_{n-1}$ are linearly independent, we can write
$$x^* = \alpha_0 d_0 + \dots + \alpha_{n-1} d_{n-1}.$$
Therefore
$$d_i^T Q x^* = d_i^T Q (\alpha_0 d_0 + \dots + \alpha_{n-1} d_{n-1}) = \alpha_i d_i^T Q d_i.$$
We do not need to know $x^*$ to obtain $\alpha_i$, since $Q x^* = b$:
$$\alpha_i = \frac{d_i^T Q x^*}{d_i^T Q d_i} = \frac{d_i^T b}{d_i^T Q d_i},$$
and hence
$$x^* = \sum_{i=0}^{n-1} \alpha_i d_i = \sum_{i=0}^{n-1} \frac{d_i^T b}{d_i^T Q d_i}\, d_i.$$
We can see that there is no need to do any matrix inversion; we only need to compute inner products.
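The closed-form expansion above is easy to check numerically. The sketch below is my illustration (not from the notes): it produces $Q$-conjugate directions by $Q$-orthogonalizing the standard basis with a Gram-Schmidt-style sweep (one convenient way to obtain conjugate directions), recovers $x^*$ from the formula without inverting $Q$, and compares against a direct solve.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)   # symmetric positive definite
b = rng.standard_normal(n)

# Build Q-conjugate directions by Q-orthogonalizing the standard basis
# (a Gram-Schmidt-like sweep; one of several ways to get conjugate directions).
d = []
for i in range(n):
    v = np.eye(n)[:, i].copy()
    for dj in d:
        v -= (dj @ Q @ v) / (dj @ Q @ dj) * dj
    d.append(v)

# x* = sum_i (d_i^T b / d_i^T Q d_i) d_i  -- no matrix inversion needed.
x_star = sum((di @ b) / (di @ Q @ di) * di for di in d)

assert np.allclose(x_star, np.linalg.solve(Q, b))
print("recovered x* from the conjugate directions")
```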
Conjugate Direction Theorem

The expansion for $x^*$ can be considered the result of an iterative process of $n$ steps, where at the $i$-th step the term $\alpha_i d_i$ is added. This can be generalized further so that the starting point of the iteration can be an arbitrary $x_0$. We introduce the conjugate direction theorem here.

Theorem [Conjugate Direction Theorem]: Let $d_0, d_1, \dots, d_{n-1}$ be $Q$-conjugate and let $x_0 \in \mathbb{R}^n$ be an arbitrary starting point. Iterate
$$x_{k+1} = x_k + \alpha_k d_k, \qquad g_k = Q x_k - b, \qquad \alpha_k = -\frac{g_k^T d_k}{d_k^T Q d_k} = -\frac{(Q x_k - b)^T d_k}{d_k^T Q d_k}.$$
Then after $n$ steps, $x_n = x^*$.

Proof: Since $\{d_0, d_1, \dots, d_{n-1}\}$ are linearly independent,
$$x^* - x_0 = \alpha_0 d_0 + \dots + \alpha_{n-1} d_{n-1} \quad \text{for some } \alpha_0, \dots, \alpha_{n-1}.$$
Using the update rule $x_{k+1} = x_k + \alpha_k d_k$, we have
$$x_1 = x_0 + \alpha_0 d_0, \qquad x_2 = x_0 + \alpha_0 d_0 + \alpha_1 d_1, \qquad \dots, \qquad x_k = x_0 + \alpha_0 d_0 + \dots + \alpha_{k-1} d_{k-1},$$
and therefore
$$x_n = x_0 + \alpha_0 d_0 + \alpha_1 d_1 + \dots + \alpha_{n-1} d_{n-1} = x^*.$$
Hence it is enough to prove that these $\alpha_k$ values satisfy $\alpha_k = -\frac{g_k^T d_k}{d_k^T Q d_k}$. We already know
$$x^* - x_0 = \alpha_0 d_0 + \dots + \alpha_{n-1} d_{n-1}, \qquad x_k - x_0 = \alpha_0 d_0 + \dots + \alpha_{k-1} d_{k-1}.$$
Therefore
$$d_k^T Q (x^* - x_0) = d_k^T Q (\alpha_0 d_0 + \dots + \alpha_{n-1} d_{n-1}) = \alpha_k d_k^T Q d_k
\quad\Longrightarrow\quad \alpha_k = \frac{d_k^T Q (x^* - x_0)}{d_k^T Q d_k},$$
while
$$d_k^T Q (x_k - x_0) = d_k^T Q (\alpha_0 d_0 + \alpha_1 d_1 + \dots + \alpha_{k-1} d_{k-1}) = 0.$$
Consequently,
$$d_k^T Q (x^* - x_0) = d_k^T Q (x^* - x_k + x_k - x_0) = d_k^T Q (x^* - x_k),$$
and since $Q x^* = b$,
$$\alpha_k = \frac{d_k^T Q (x^* - x_0)}{d_k^T Q d_k} = \frac{d_k^T Q (x^* - x_k)}{d_k^T Q d_k} = -\frac{d_k^T g_k}{d_k^T Q d_k}.$$
Another motivation for Q-conjugacy

Our goal is
$$\arg\min_{x \in \mathbb{R}^n} \; \tfrac{1}{2} x^T Q x - b^T x.$$
Write $x - x_0 = \alpha_0 d_0 + \dots + \alpha_{n-1} d_{n-1}$ for some $\{\alpha_i\}_{i=0}^{n-1} \subset \mathbb{R}$. Then
$$f(x) = \tfrac{1}{2}\Big[x_0 + \sum_{j=0}^{n-1} \alpha_j d_j\Big]^T Q \Big[x_0 + \sum_{j=0}^{n-1} \alpha_j d_j\Big] - b^T \Big[x_0 + \sum_{j=0}^{n-1} \alpha_j d_j\Big],$$
and because the cross terms $d_i^T Q d_j$ ($i \neq j$) vanish by $Q$-conjugacy, this can be written as
$$f(x) = c + \sum_{j=0}^{n-1} \Big( \tfrac{1}{2} \alpha_j^2\, d_j^T Q d_j + \alpha_j\, d_j^T (Q x_0 - b) \Big),$$
where $c$ does not depend on the $\alpha_j$. The optimization problem is thus transformed into $n$ separate one-dimensional optimization problems, one for each $\alpha_j$.

Expanding Subspace Theorem

Let $B_k = \mathrm{span}(d_0, \dots, d_{k-1}) \subseteq \mathbb{R}^n$. We will show that as the method of conjugate directions progresses, each $x_k$ minimizes the objective $f(x) = \tfrac{1}{2} x^T Q x - b^T x$ both over the affine set $x_0 + B_k$ and over the line $x_{k-1} + \alpha d_{k-1}$, $\alpha \in \mathbb{R}$. That is,
$$x_k = \arg\min_{x = x_{k-1} + \alpha d_{k-1},\ \alpha \in \mathbb{R}} \tfrac{1}{2} x^T Q x - b^T x
\qquad \text{and} \qquad
x_k = \arg\min_{x \in x_0 + B_k} \tfrac{1}{2} x^T Q x - b^T x.$$

Theorem [Expanding Subspace Theorem]: Let $\{d_i\}_{i=0}^{n-1}$ be a sequence of $Q$-conjugate vectors in $\mathbb{R}^n$ and let $x_0 \in \mathbb{R}^n$ be arbitrary. Define
$$x_{k+1} = x_k + \alpha_k d_k, \qquad \alpha_k = -\frac{g_k^T d_k}{d_k^T Q d_k}.$$
Then
$$x_k = \arg\min_{x = x_{k-1} + \alpha d_{k-1},\ \alpha \in \mathbb{R}} \tfrac{1}{2} x^T Q x - b^T x
\qquad \text{and} \qquad
x_k = \arg\min_{x \in x_0 + B_k} \tfrac{1}{2} x^T Q x - b^T x.$$

Proof: It is enough to show that $x_k$ minimizes $f$ on $x_0 + B_k$, since this set contains the line $x = x_{k-1} + \alpha d_{k-1}$ (by the definition of $B_k$). Since $f$ is strictly convex, it is enough to show that $g_k = \nabla f(x_k)$ is orthogonal to $B_k$. We prove $g_k \perp B_k$ by induction.

Base case $k = 0$: $B_0$ contains no directions, so the claim holds trivially.

Induction step: assume $g_k \perp B_k$ and prove $g_{k+1} \perp B_{k+1}$. By definition, $x_{k+1} = x_k + \alpha_k d_k$ and $g_k = Q x_k - b$. Therefore
$$g_{k+1} = Q x_{k+1} - b = Q(x_k + \alpha_k d_k) - b = (Q x_k - b) + \alpha_k Q d_k = g_k + \alpha_k Q d_k.$$

First, let us prove that $g_{k+1} \perp d_k$. We have shown $g_{k+1} = g_k + \alpha_k Q d_k$, and by definition $\alpha_k = -\frac{g_k^T d_k}{d_k^T Q d_k}$. Therefore
$$d_k^T g_{k+1} = d_k^T (g_k + \alpha_k Q d_k) = d_k^T g_k - \frac{g_k^T d_k}{d_k^T Q d_k}\, d_k^T Q d_k = d_k^T g_k - d_k^T g_k = 0,$$
so $g_{k+1} \perp d_k$.

Now let us prove that $g_{k+1} \perp d_i$ for $i < k$. Since $g_{k+1} = g_k + \alpha_k Q d_k$ and $g_k \perp B_k = \mathrm{span}(d_0, \dots, d_{k-1})$ by the induction hypothesis,
$$d_i^T g_{k+1} = d_i^T (g_k + \alpha_k Q d_k) = d_i^T g_k + \alpha_k d_i^T Q d_k = 0 + 0 = 0,$$
where the second term vanishes by $Q$-conjugacy. Hence $g_{k+1} \perp d_i$ for all $i < k$.

So far we have proved that $g_{k+1} \perp B_{k+1}$, where $B_{k+1} = \mathrm{span}(d_0, \dots, d_k)$, which completes the induction.

Corollary of the Expanding Subspace Theorem

Corollary: $g_k \perp B_k = \mathrm{span}(d_0, \dots, d_{k-1})$, i.e., $g_k^T d_i = 0$ for $0 \le i \le k-1$. Moreover, since
$$B_0 \subset \dots \subset B_k \subset \dots \subset B_n = \mathbb{R}^n$$
and $x_k$ minimizes $f$ over $x_0 + B_k$, the point $x_n$ is the minimizer of $f$ over all of $\mathbb{R}^n$.
10.3 Conjugate gradient method

The conjugate gradient method:
- is a conjugate direction method;
- selects the successive direction vectors as a conjugate version of the successive gradients obtained as the method progresses;
- does not specify the conjugate directions beforehand; rather, they are determined sequentially at each step of the iteration.

10.3.1 Description and advantages

So far, given $d_0, \dots, d_{n-1}$, we already have an update rule for $\alpha_k$:
$$\alpha_k = -\frac{g_k^T d_k}{d_k^T Q d_k}.$$
The remaining question is how to choose the vectors $d_0, \dots, d_{n-1}$. The conjugate gradient method is the conjugate direction method obtained by selecting the successive direction vectors as a conjugate version of the successive gradients obtained as the method progresses. Thus, the directions are not specified beforehand, but rather are determined sequentially at each step of the iteration. There are three primary advantages to this method of direction selection.

1. Unless the solution is attained in fewer than $n$ steps, the gradient is always nonzero and linearly independent of all previous direction vectors.
2. A more important advantage of the conjugate gradient method is the especially simple formula that is used to determine the new direction vector. This simplicity makes the method only slightly more complicated than steepest descent.
3. Because the directions are based on the gradients, the process makes good uniform progress toward the solution at every step. This is in contrast to the situation for arbitrary sequences of conjugate directions, in which progress may be slight until the final few steps.

10.3.2 Algorithm

Let $x_0 \in \mathbb{R}^n$ be arbitrary and set
$$d_0 = -g_0 = b - Q x_0.$$
Then iterate
$$\alpha_k = -\frac{g_k^T d_k}{d_k^T Q d_k}, \qquad x_{k+1} = x_k + \alpha_k d_k, \qquad g_{k+1} = Q x_{k+1} - b,$$
$$d_{k+1} = -g_{k+1} + \beta_k d_k, \qquad \beta_k = \frac{g_{k+1}^T Q d_k}{d_k^T Q d_k}.$$
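A minimal NumPy sketch of the algorithm above (my code, not part of the notes); it applies the $\alpha_k$ and $\beta_k$ formulas literally rather than the simplified forms derived later, and the random SPD test problem is an arbitrary choice.

```python
import numpy as np

def conjugate_gradient(Q, b, x0, tol=1e-10):
    """Minimize (1/2) x^T Q x - b^T x for symmetric positive definite Q,
    i.e. solve Qx = b, using the conjugate gradient iteration above."""
    x = x0.astype(float).copy()
    g = Q @ x - b            # g_0
    d = -g                   # d_0 = -g_0 = b - Q x_0
    for _ in range(len(b)):
        if np.linalg.norm(g) < tol:
            break
        Qd = Q @ d
        alpha = -(g @ d) / (d @ Qd)     # alpha_k = -g_k^T d_k / d_k^T Q d_k
        x = x + alpha * d               # x_{k+1}
        g = Q @ x - b                   # g_{k+1}
        beta = (g @ Qd) / (d @ Qd)      # beta_k = g_{k+1}^T Q d_k / d_k^T Q d_k
        d = -g + beta * d               # d_{k+1}
    return x

# Example usage on a random SPD system; converges in at most n steps.
rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)
x = conjugate_gradient(Q, b, np.zeros(n))
assert np.allclose(Q @ x, b)
```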
In the algorithm, the first step is identical to a steepest descent step; each succeeding step moves in a direction that is a linear combination of the current gradient and the preceding direction vector. The attractive feature of the algorithm is the simple formula for updating the direction vector. The method is only slightly more complicated to implement than the method of steepest descent, but it converges in a finite number of steps.

Theorem 10.1 The conjugate gradient algorithm is a conjugate direction method. If it does not terminate at $x_k$, then

a) $\mathrm{span}(g_0, g_1, \dots, g_k) = \mathrm{span}(g_0, Q g_0, \dots, Q^k g_0)$
b) $\mathrm{span}(d_0, d_1, \dots, d_k) = \mathrm{span}(g_0, Q g_0, \dots, Q^k g_0)$
c) $d_k^T Q d_i = 0$ for $i < k$
d) $\alpha_k = \dfrac{g_k^T g_k}{d_k^T Q d_k}$
e) $\beta_k = \dfrac{g_{k+1}^T g_{k+1}}{g_k^T g_k}$

Proof: We first prove a), b) and c) simultaneously by induction. Clearly, they are true for $k = 0$. Now suppose they are true for $k$; we show that they are true for $k + 1$. We have
$$g_{k+1} = g_k + \alpha_k Q d_k.$$
By the induction hypothesis, both $g_k$ and $Q d_k$ belong to $\mathrm{span}(g_0, Q g_0, \dots, Q^{k+1} g_0)$, the first by a) and the second by b). Thus $g_{k+1} \in \mathrm{span}(g_0, Q g_0, \dots, Q^{k+1} g_0)$. Furthermore, $g_{k+1} \notin \mathrm{span}(d_0, d_1, \dots, d_k) = \mathrm{span}(g_0, Q g_0, \dots, Q^k g_0)$, since otherwise $g_{k+1} = 0$, because for any conjugate direction method $g_{k+1}$ is orthogonal to $\mathrm{span}(d_0, d_1, \dots, d_k)$. (The induction hypothesis on c) guarantees that the method is a conjugate direction method up to $x_{k+1}$.) Thus, finally, we conclude that
$$\mathrm{span}(g_0, g_1, \dots, g_{k+1}) = \mathrm{span}(g_0, Q g_0, \dots, Q^{k+1} g_0),$$
which proves a). To prove b) we write
$$d_{k+1} = -g_{k+1} + \beta_k d_k,$$
and b) immediately follows from a) and the induction hypothesis on b). Next, to prove c) we have
$$d_{k+1}^T Q d_i = -g_{k+1}^T Q d_i + \beta_k d_k^T Q d_i.$$
For $i = k$ the right-hand side is zero by the definition of $\beta_k$. For $i < k$ both terms vanish. The first term vanishes since $Q d_i \in \mathrm{span}(d_0, d_1, \dots, d_{i+1}) \subseteq \mathrm{span}(d_0, d_1, \dots, d_k)$ by b), the induction hypothesis on c) guarantees that the method is a conjugate direction method up to $x_{k+1}$, and the Expanding Subspace Theorem then guarantees that $g_{k+1}$ is orthogonal to $\mathrm{span}(d_0, d_1, \dots, d_k)$. The second term vanishes by the induction hypothesis on c). This proves c), which also proves that the method is a conjugate direction method.

To prove d), we use $d_k = -g_k + \beta_{k-1} d_{k-1}$, so that
$$-g_k^T d_k = g_k^T g_k - \beta_{k-1}\, g_k^T d_{k-1},$$
and the second term is zero by the Expanding Subspace Theorem. Hence $\alpha_k = -\frac{g_k^T d_k}{d_k^T Q d_k} = \frac{g_k^T g_k}{d_k^T Q d_k}$.

Finally, to prove e), note that $g_{k+1}^T g_k = 0$, because $g_k \in \mathrm{span}(d_0, d_1, \dots, d_k)$ and $g_{k+1} \perp \mathrm{span}(d_0, d_1, \dots, d_k)$. Since $Q d_k = \frac{1}{\alpha_k}(g_{k+1} - g_k)$, we have
$$g_{k+1}^T Q d_k = \frac{1}{\alpha_k}\, g_{k+1}^T (g_{k+1} - g_k) = \frac{1}{\alpha_k}\, g_{k+1}^T g_{k+1},$$
and therefore
$$\beta_k = \frac{g_{k+1}^T Q d_k}{d_k^T Q d_k} = \frac{g_{k+1}^T g_{k+1}}{\alpha_k\, d_k^T Q d_k} = \frac{g_{k+1}^T g_{k+1}}{g_k^T g_k},$$
where the last equality uses d).
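Properties d) and e) can be verified numerically on a quadratic. The check below is my illustration (not from the notes): it runs the conjugate gradient iteration on a random SPD problem and compares $\alpha_k$ and $\beta_k$ computed from their definitions with the simplified forms in d) and e).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)

x = np.zeros(n)
g = Q @ x - b
d = -g
for _ in range(n - 1):
    Qd = Q @ d
    alpha = -(g @ d) / (d @ Qd)                          # definition of alpha_k
    assert np.isclose(alpha, (g @ g) / (d @ Qd))         # property d)
    x = x + alpha * d
    g_new = Q @ x - b
    beta = (g_new @ Qd) / (d @ Qd)                       # definition of beta_k
    assert np.isclose(beta, (g_new @ g_new) / (g @ g))   # property e)
    d = -g_new + beta * d
    g = g_new
print("properties d) and e) verified numerically")
```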
10.4 Extension to non-quadratic problems

10.4.1 Description

Our goal is to solve
$$\min_{x \in \mathbb{R}^n} f(x).$$
We make a quadratic approximation by associating $g_k = \nabla f(x_k)$ and $Q = \nabla^2 f(x_k)$. Using these associations, re-evaluated at each step, all quantities necessary to implement the basic conjugate gradient algorithm can be evaluated. If $f$ is quadratic, these associations are identities, so the general algorithm obtained by using them is a generalization of the conjugate gradient scheme. This is similar to the philosophy underlying Newton's method, where at each step the solution of a general problem is approximated by the solution of a purely quadratic problem through these same associations. When applied to nonquadratic problems, conjugate gradient methods will not usually terminate within $n$ steps. It is therefore possible simply to continue finding new directions according to the algorithm and terminate only when some termination criterion is met. Alternatively, after $n$ steps we can restart the process from the current point and run the algorithm for another $n$ steps.

10.4.2 Algorithm

1. Start at $x_0$, compute $g_0 = \nabla f(x_0)$, and set $d_0 = -g_0$.
2. For $k = 0, 1, \dots, n-1$:
   a) Set $x_{k+1} = x_k + \alpha_k d_k$ where $\alpha_k = -\dfrac{g_k^T d_k}{d_k^T [\nabla^2 f(x_k)] d_k}$.
   b) Compute $g_{k+1} = \nabla f(x_{k+1})$.
   c) Unless $k = n - 1$, set $d_{k+1} = -g_{k+1} + \beta_k d_k$ where $\beta_k = \dfrac{g_{k+1}^T [\nabla^2 f(x_k)] d_k}{d_k^T [\nabla^2 f(x_k)] d_k}$.
3. Replace $x_0$ by $x_n$ and go back to step 1. Repeat until convergence.

10.4.3 Properties of CGA

An attractive feature of the algorithm is that, just as in the pure form of Newton's method, no line search is required at any stage. Also, the algorithm converges in a finite number of steps for a quadratic problem. The undesirable features are that the Hessian $\nabla^2 f(x_k)$ must be evaluated at each point, which is often impractical, and that the algorithm is not, in this form, globally convergent.
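Here is a sketch of the algorithm in Section 10.4.2, assuming the gradient and Hessian are available as callables (my illustration, not from the notes; the convex quartic test function and the names `hessian_cg`, `grad`, `hess` are mine).

```python
import numpy as np

def hessian_cg(f_grad, f_hess, x0, n_restarts=50, tol=1e-8):
    """Nonquadratic CG of Section 10.4.2: alpha_k and beta_k use the Hessian
    evaluated at the current iterate; restart every n steps."""
    x = x0.astype(float).copy()
    n = x.size
    for _ in range(n_restarts):
        g = f_grad(x)
        d = -g
        for k in range(n):
            H = f_hess(x)
            Hd = H @ d
            alpha = -(g @ d) / (d @ Hd)     # step 2a)
            x = x + alpha * d
            g_new = f_grad(x)               # step 2b)
            if np.linalg.norm(g_new) < tol:
                return x
            if k < n - 1:                   # step 2c)
                beta = (g_new @ Hd) / (d @ Hd)
                d = -g_new + beta * d
            g = g_new
    return x

# Hypothetical test function (not from the notes): f(x) = sum(x^4)/4 + x^T A x / 2.
A = np.diag([1.0, 2.0, 3.0])
grad = lambda x: x**3 + A @ x
hess = lambda x: np.diag(3 * x**2) + A
x_min = hessian_cg(grad, hess, np.array([1.0, -2.0, 1.5]))
print("final gradient norm:", np.linalg.norm(grad(x_min)))
```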
10.5 Line search methods

Two line search methods of this type are the Fletcher-Reeves method and the Polak-Ribiere method. These algorithms are quite similar to the conjugate gradient method above; the slight differences are in steps 2 a) and 2 c).

10.5.1 Algorithm: Fletcher-Reeves method

1. Start at $x_0$, compute $g_0 = \nabla f(x_0)$, and set $d_0 = -g_0$.
2. For $k = 0, 1, \dots, n-1$:
   a) Set $x_{k+1} = x_k + \alpha_k d_k$ where $\alpha_k = \arg\min_{\alpha} f(x_k + \alpha d_k)$.
   b) Compute $g_{k+1} = \nabla f(x_{k+1})$.
   c) Unless $k = n - 1$, set $d_{k+1} = -g_{k+1} + \beta_k d_k$ where $\beta_k = \dfrac{g_{k+1}^T g_{k+1}}{g_k^T g_k}$.
3. Replace $x_0$ by $x_n$ and go back to step 1. Repeat until convergence.

First, the Fletcher-Reeves method is a line search method: when evaluating $\alpha_k$ there is normally no closed-form solution. Second, the Hessian is not used in the algorithm. Also, in the quadratic case it is identical to the original conjugate gradient algorithm.

10.5.2 Algorithm: Polak-Ribiere method

The Polak-Ribiere method is quite similar to the Fletcher-Reeves method; the only difference is in how $\beta_k$ is evaluated.

1. Start at $x_0$, compute $g_0 = \nabla f(x_0)$, and set $d_0 = -g_0$.
2. For $k = 0, 1, \dots, n-1$:
   a) Set $x_{k+1} = x_k + \alpha_k d_k$ where $\alpha_k = \arg\min_{\alpha} f(x_k + \alpha d_k)$.
   b) Compute $g_{k+1} = \nabla f(x_{k+1})$.
   c) Unless $k = n - 1$, set $d_{k+1} = -g_{k+1} + \beta_k d_k$ where $\beta_k = \dfrac{(g_{k+1} - g_k)^T g_{k+1}}{g_k^T g_k}$.
3. Replace $x_0$ by $x_n$ and go back to step 1. Repeat until convergence.

Again, this leads to a value identical to the standard formula in the quadratic case. Experimental evidence seems to favor the Polak-Ribiere method over other methods of this general type.
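Both line search variants can be sketched with a single routine. The code below is my illustration (not from the notes); it uses SciPy's bounded scalar minimizer as a stand-in for the exact line search in step 2 a), and the Rosenbrock test function and the name `nonlinear_cg` are arbitrary choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nonlinear_cg(f, grad, x0, variant="PR", n_cycles=500, tol=1e-6):
    """Fletcher-Reeves ("FR") / Polak-Ribiere ("PR") conjugate gradient with a
    numerical line search standing in for the exact minimization over alpha."""
    x = x0.astype(float).copy()
    n = x.size
    for _ in range(n_cycles):
        g = grad(x)
        d = -g
        for k in range(n):
            # 2a) line search: alpha_k = argmin_alpha f(x_k + alpha d_k)
            alpha = minimize_scalar(lambda a: f(x + a * d), bounds=(0.0, 10.0),
                                    method="bounded", options={"xatol": 1e-10}).x
            x = x + alpha * d
            g_new = grad(x)                # 2b)
            if np.linalg.norm(g_new) < tol:
                return x
            if k < n - 1:                  # 2c)
                if variant == "FR":
                    beta = (g_new @ g_new) / (g @ g)
                else:                      # Polak-Ribiere
                    beta = ((g_new - g) @ g_new) / (g @ g)
                d = -g_new + beta * d
            g = g_new
    return x

# Hypothetical test problem (not from the notes): the 2-D Rosenbrock function.
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
x = nonlinear_cg(f, grad, np.array([-1.2, 1.0]), variant="PR")
print("solution:", x)  # should be close to [1, 1]
```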
10.6 Convergence

Global convergence of the line search methods is established by noting that a pure steepest descent step is taken every $n$ steps and serves as a spacer step. Since the other steps do not increase the objective (and in fact hopefully decrease it), global convergence is assured. Thus the restarting aspect of the algorithm is important for the global convergence analysis, since in general one cannot guarantee that the directions $d_k$ generated by the method are descent directions.

The local convergence properties of both of the above, and of most other nonquadratic extensions of the conjugate gradient method, can be inferred from the quadratic analysis. Since one complete cycle solves a quadratic problem exactly, just as Newton's method does in one step, we expect that for general nonquadratic problems
$$\|x_{k+n} - x^*\| \le c\, \|x_k - x^*\|^2$$
for some $c$ and $k = 0, n, 2n, 3n, \dots$. This can indeed be proved, and it of course underlies the original motivation for the method. For problems with large $n$, however, a result of this type is in itself of little comfort, since we probably hope to terminate in fewer than $n$ steps.

10.7 Summary

Conjugate direction methods:
- Q-conjugate directions
- minimizing quadratic functions

Conjugate gradient methods for nonquadratic functions:
- line search methods
  1. Fletcher-Reeves method
  2. Polak-Ribiere method

References

Luenberger, David G. and Ye, Yinyu. Linear and Nonlinear Programming. Springer, 2008.