The Conjugate Gradient Algorithm

Outline: Optimization over a Subspace; Conjugate Direction Methods; the Conjugate Gradient Algorithm; the Non-Quadratic Conjugate Gradient Algorithm.

Optimization over a Subspace. Consider the problem

  $\min f(x)$  subject to  $x \in x_0 + S$,

where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and $S$ is the subspace $S := \mathrm{Span}\{v_1, \dots, v_k\}$. If $V \in \mathbb{R}^{n \times k}$ has columns $v_1, \dots, v_k$, then this problem is equivalent to

  $\min f(x_0 + Vz)$  subject to  $z \in \mathbb{R}^k$.

Subspace Optimality Condition. Consider the reduced problem

  $\min f(x_0 + Vz)$  subject to  $z \in \mathbb{R}^k$,

and set $\hat f(z) = f(x_0 + Vz)$. If $\bar z$ solves this problem, then

  $V^T \nabla f(x_0 + V\bar z) = \nabla \hat f(\bar z) = 0$.

Setting $\bar x = x_0 + V\bar z$, we note that $\bar x$ solves the original problem if and only if $\bar z$ solves the problem above. Then $V^T \nabla f(\bar x) = 0$, or equivalently, $v_i^T \nabla f(\bar x) = 0$ for $i = 1, 2, \dots, k$, so $\nabla f(\bar x) \perp \mathrm{Span}\{v_1, \dots, v_k\}$.

Subspace Optimality Theorem. Consider the problem

  $P_S$ :  $\min f(x)$  subject to  $x \in x_0 + S$,

where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and $S$ is the subspace $S := \mathrm{Span}\{v_1, \dots, v_k\}$. If $\bar x$ solves $P_S$, then $\nabla f(\bar x) \perp S$. If it is further assumed that $f$ is convex, then $\bar x$ solves $P_S$ if and only if $\nabla f(\bar x) \perp S$.
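As a quick numerical illustration of the theorem, the sketch below minimizes a convex quadratic over an affine set $x_0 + \mathrm{Span}\{v_1, \dots, v_k\}$ by solving the reduced problem in $z$ and then checks that $V^T \nabla f(\bar x) = 0$. The particular quadratic, subspace, and use of NumPy are choices made here for illustration, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3

# A convex quadratic f(x) = 0.5 x^T Q x - b^T x with Q symmetric positive definite.
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)
b = rng.standard_normal(n)
grad = lambda x: Q @ x - b

x0 = rng.standard_normal(n)
V = rng.standard_normal((n, k))          # columns span the subspace S

# Reduced problem: minimize fhat(z) = f(x0 + V z).  For this quadratic,
# grad fhat(z) = V^T (Q(x0 + V z) - b) = 0 is a k x k linear system in z.
z_bar = np.linalg.solve(V.T @ Q @ V, V.T @ (b - Q @ x0))
x_bar = x0 + V @ z_bar

# Subspace optimality: the gradient at x_bar is orthogonal to S.
print(np.linalg.norm(V.T @ grad(x_bar)))   # ~ 1e-14
```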

Conjugate Direction Algorithm. Consider the problem

  $P$ :  minimize $f(x)$  subject to  $x \in \mathbb{R}^n$,

where $f : \mathbb{R}^n \to \mathbb{R}$ is $C^2$ and given by

  $f(x) := \tfrac{1}{2} x^T Q x - b^T x$

with $Q$ symmetric and positive definite.

Definition [Conjugacy]. Let $Q \in \mathbb{R}^{n \times n}$ be symmetric and positive definite. We say that the vectors $x, y \in \mathbb{R}^n \setminus \{0\}$ are Q-conjugate (or Q-orthogonal) if $x^T Q y = 0$.

Proposition [Conjugacy implies Linear Independence]. If $Q \in \mathbb{R}^{n \times n}$ is positive definite and the nonzero vectors $d_0, d_1, \dots, d_k$ are (pairwise) Q-conjugate, then these vectors are linearly independent.
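The proposition is easy to check numerically. The sketch below builds Q-conjugate directions from an arbitrary basis by a Gram-Schmidt-style procedure in the Q-inner product (this construction is an assumption made here for illustration; the slides do not prescribe how the directions are obtained) and verifies pairwise conjugacy and linear independence.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)          # symmetric positive definite

# Q-conjugate directions via Gram-Schmidt in the inner product <u, v>_Q = u^T Q v.
basis = rng.standard_normal((n, n))  # columns: an arbitrary basis of R^n
D = []
for v in basis.T:
    d = v.copy()
    for dj in D:
        d -= (dj @ Q @ v) / (dj @ Q @ dj) * dj
    D.append(d)
D = np.array(D)                      # rows are the Q-conjugate directions

conj = D @ Q @ D.T                   # entry (i, j) is d_i^T Q d_j
off = conj - np.diag(np.diag(conj))
print(np.max(np.abs(off)))           # ~ 1e-12: pairwise Q-conjugate
print(np.linalg.matrix_rank(D))      # n: the directions are linearly independent
```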

Conjugate Direction Algorithm. Let $\{d_i\}_{i=0}^{n-1}$ be a set of nonzero Q-conjugate vectors. For any $x_0 \in \mathbb{R}^n$ the sequence $\{x_k\}$ generated according to

  $x_{k+1} := x_k + \alpha_k d_k$,  $k \ge 0$,
  $\alpha_k := \arg\min\{ f(x_k + \alpha d_k) : \alpha \in \mathbb{R} \}$,

converges to the unique solution $\bar x$ of $P$ after $n$ steps, that is, $x_n = \bar x$. We have already shown that $\alpha_k = -\nabla f(x_k)^T d_k / d_k^T Q d_k$.
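A minimal sketch of the finite-termination claim, assuming the Q-conjugate directions are taken to be the eigenvectors of Q (one valid choice among many; any Q-conjugate set works): after n exact line-search steps the iterate solves Qx = b.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)
b = rng.standard_normal(n)

# Eigenvectors of symmetric Q are Q-conjugate: v_i^T Q v_j = lambda_j v_i^T v_j = 0 for i != j.
_, V = np.linalg.eigh(Q)
directions = V.T                      # rows d_0, ..., d_{n-1}

x = rng.standard_normal(n)            # x_0
for d in directions:
    g = Q @ x - b                     # gradient of f(x) = 0.5 x^T Q x - b^T x
    alpha = -(g @ d) / (d @ Q @ d)    # exact line search along d
    x = x + alpha * d

print(np.linalg.norm(x - np.linalg.solve(Q, b)))   # ~ 1e-13: x_n solves Qx = b
```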

Expanding Subspace Theorem. Let $\{d_i\}_{i=0}^{n-1}$ be a sequence of nonzero Q-conjugate vectors in $\mathbb{R}^n$. Then for any $x_0 \in \mathbb{R}^n$ the sequence $\{x_k\}$ generated according to

  $x_{k+1} = x_k + \alpha_k d_k$,  $\alpha_k = -\dfrac{g_k^T d_k}{d_k^T Q d_k}$,

has the property that $f(x) = \tfrac{1}{2} x^T Q x - b^T x$ attains its minimum value on the affine set $x_0 + \mathrm{Span}\{d_0, \dots, d_k\}$ at the point $x_{k+1}$.

Expanding Subspace Theorem: Proof. Let $\bar x$ solve $\min\{ f(x) : x \in x_0 + \mathrm{Span}\{d_0, \dots, d_k\} \}$. The Subspace Optimality Theorem tells us that

  $\nabla f(\bar x)^T d_i = 0$,  $i = 0, \dots, k$.

Since $\bar x \in x_0 + \mathrm{Span}\{d_0, \dots, d_k\}$, there exist $\beta_i \in \mathbb{R}$ such that

  $\bar x = x_0 + \beta_0 d_0 + \beta_1 d_1 + \dots + \beta_k d_k$.

Therefore, since $\nabla f(x) = Qx - b$ and the $d_i$ are Q-conjugate,

  $0 = \nabla f(\bar x)^T d_i$
    $= (Q(x_0 + \beta_0 d_0 + \beta_1 d_1 + \dots + \beta_k d_k) - b)^T d_i$
    $= (Q x_0 - b)^T d_i + \beta_0 d_0^T Q d_i + \beta_1 d_1^T Q d_i + \dots + \beta_k d_k^T Q d_i$
    $= \nabla f(x_0)^T d_i + \beta_i d_i^T Q d_i$.

Expanding Subspace Theorem: Proof (continued). Hence

  $0 = \nabla f(x_0)^T d_i + \beta_i d_i^T Q d_i$,  $i = 0, 1, \dots, k$,

so

  $\beta_i = -\nabla f(x_0)^T d_i / d_i^T Q d_i$,  $i = 0, \dots, k$.

Similarly, the iteration

  $x_{i+1} = x_i + \alpha_i d_i$,  $\alpha_i = -\dfrac{\nabla f(x_i)^T d_i}{d_i^T Q d_i}$,

gives $x_{k+1} = x_0 + \alpha_0 d_0 + \alpha_1 d_1 + \dots + \alpha_k d_k$. So we need to show that

  $\beta_i = -\dfrac{\nabla f(x_0)^T d_i}{d_i^T Q d_i} = -\dfrac{\nabla f(x_i)^T d_i}{d_i^T Q d_i} = \alpha_i$,  $i = 0, \dots, k$.

Expanding Subspace Theorem: Proof (conclusion). By Q-conjugacy,

  $\nabla f(x_i)^T d_i = (Q(x_0 + \alpha_0 d_0 + \dots + \alpha_{i-1} d_{i-1}) - b)^T d_i$
    $= (Q x_0 - b)^T d_i + \alpha_0 d_0^T Q d_i + \dots + \alpha_{i-1} d_{i-1}^T Q d_i$
    $= \nabla f(x_0)^T d_i$,

so $\beta_i = \alpha_i$ for each $i$, which proves the theorem.

The Conjugate Gradient Algorithm.

Initialization: $x_0 \in \mathbb{R}^n$, $d_0 = -g_0$, where $g_0 = \nabla f(x_0) = Qx_0 - b$.

For $k = 0, 1, 2, \dots$:
  $\alpha_k := -g_k^T d_k / d_k^T Q d_k$
  $x_{k+1} := x_k + \alpha_k d_k$
  $g_{k+1} := Q x_{k+1} - b$   (STOP if $g_{k+1} = 0$)
  $\beta_k := g_{k+1}^T Q d_k / d_k^T Q d_k$
  $d_{k+1} := -g_{k+1} + \beta_k d_k$
  $k := k + 1$.
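The following is a direct Python/NumPy transcription of the iteration above (a sketch for illustration; the function name, tolerances, and test problem are choices made here). For an n-by-n positive definite Q it drives the gradient to zero, up to round-off, in at most n steps.

```python
import numpy as np

def conjugate_gradient(Q, b, x0, tol=1e-10, max_iter=None):
    """CG for f(x) = 0.5 x^T Q x - b^T x, i.e. for solving Qx = b with Q SPD."""
    x = x0.astype(float).copy()
    g = Q @ x - b                       # g_0 = grad f(x_0)
    d = -g                              # d_0 = -g_0
    max_iter = max_iter or len(b)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:    # STOP if g_k = 0 (numerically)
            break
        Qd = Q @ d
        alpha = -(g @ d) / (d @ Qd)     # exact line search along d_k
        x = x + alpha * d
        g = Q @ x - b                   # g_{k+1}
        beta = (g @ Qd) / (d @ Qd)      # beta_k = g_{k+1}^T Q d_k / d_k^T Q d_k
        d = -g + beta * d               # d_{k+1}
    return x

# Example: CG terminates (to round-off) in at most n steps.
rng = np.random.default_rng(3)
n = 8
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)
b = rng.standard_normal(n)
x = conjugate_gradient(Q, b, np.zeros(n))
print(np.linalg.norm(Q @ x - b))        # ~ 1e-12
```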

The Conjugate Gradient Theorem. The C-G algorithm is a conjugate direction method. If it does not terminate at $x_k$ (that is, $g_k \ne 0$), then

  1. $\mathrm{Span}[g_0, g_1, \dots, g_k] = \mathrm{Span}[g_0, Qg_0, \dots, Q^k g_0]$
  2. $\mathrm{Span}[d_0, d_1, \dots, d_k] = \mathrm{Span}[g_0, Qg_0, \dots, Q^k g_0]$
  3. $d_k^T Q d_i = 0$ for $i \le k-1$
  4. $\alpha_k = g_k^T g_k / d_k^T Q d_k$
  5. $\beta_k = g_{k+1}^T g_{k+1} / g_k^T g_k$.

The Conjugate Gradient Theorem: Proof. Prove (1)-(3) by induction. (1)-(3) are true for $k = 0$. Suppose they are true up to $k$ and show they are true for $k+1$.

(1) $\mathrm{Span}[g_0, g_1, \dots, g_k] = \mathrm{Span}[g_0, Qg_0, \dots, Q^k g_0]$.

Since $g_{k+1} = g_k + \alpha_k Q d_k$, we have $g_{k+1} \in \mathrm{Span}[g_0, \dots, Q^{k+1} g_0]$ (induction hypothesis). Also $g_{k+1} \notin \mathrm{Span}[d_0, \dots, d_k]$, for otherwise $g_{k+1} = 0$ (by the Subspace Optimality Theorem), since the method is a conjugate direction method up to step $k$ (induction hypothesis). So $g_{k+1} \notin \mathrm{Span}[g_0, \dots, Q^k g_0]$, and

  $\mathrm{Span}[g_0, g_1, \dots, g_{k+1}] = \mathrm{Span}[g_0, \dots, Q^{k+1} g_0]$,

proving (1).

(2) $\mathrm{Span}[d_0, d_1, \dots, d_k] = \mathrm{Span}[g_0, Qg_0, \dots, Q^k g_0]$.

To prove (2), write $d_{k+1} = -g_{k+1} + \beta_k d_k$, so that (2) follows from (1) and the induction hypothesis on (2).

(3) $d_k^T Q d_i = 0$ for $i \le k-1$.

To see (3), observe that

  $d_{k+1}^T Q d_i = -g_{k+1}^T Q d_i + \beta_k d_k^T Q d_i$.

For $i = k$ the right-hand side is zero by the definition of $\beta_k$. For $i < k$ both terms vanish. The term $g_{k+1}^T Q d_i = 0$ by the Expanding Subspace Theorem, since $Q d_i \in \mathrm{Span}[d_0, \dots, d_k]$ by (1) and (2). The term $d_k^T Q d_i$ vanishes by the induction hypothesis on (3).

(4) $\alpha_k = g_k^T g_k / d_k^T Q d_k$.

Since $d_k = -g_k + \beta_{k-1} d_{k-1}$,

  $g_k^T d_k = -g_k^T g_k + \beta_{k-1} g_k^T d_{k-1}$,

where $g_k^T d_{k-1} = 0$ by the Expanding Subspace Theorem. So

  $\alpha_k = -g_k^T d_k / d_k^T Q d_k = g_k^T g_k / d_k^T Q d_k$.

(5) $\beta_k = g_{k+1}^T g_{k+1} / g_k^T g_k$.

First, $g_{k+1}^T g_k = 0$ by the Expanding Subspace Theorem, because $g_k \in \mathrm{Span}[d_0, \dots, d_k]$. Hence

  $g_{k+1}^T Q d_k = \dfrac{1}{\alpha_k} g_{k+1}^T Q(x_{k+1} - x_k) = \dfrac{1}{\alpha_k} g_{k+1}^T [g_{k+1} - g_k] = \dfrac{1}{\alpha_k} g_{k+1}^T g_{k+1}$.

Therefore,

  $\beta_k = \dfrac{g_{k+1}^T Q d_k}{d_k^T Q d_k} = \dfrac{1}{\alpha_k} \cdot \dfrac{g_{k+1}^T g_{k+1}}{d_k^T Q d_k} = \dfrac{g_{k+1}^T g_{k+1}}{g_k^T g_k}$,

where the last equality uses (4).

Comments on the CG Algorithm. The C-G method described above is a descent method, since the values $f(x_0), f(x_1), \dots, f(x_n)$ form a decreasing sequence. Moreover, note that $\nabla f(x_k)^T d_k = -g_k^T g_k < 0$ and $\alpha_k > 0$. Thus, the C-G method behaves very much like the descent methods discussed previously.

Comments on the CG Algorithm. Due to the occurrence of round-off error, the C-G algorithm is best implemented as an iterative method. That is, at the end of $n$ steps $f$ may not attain its global minimum at $x_n$, and the intervening directions $d_k$ may not be Q-conjugate. But it is also possible for the CG algorithm to terminate early. Consequently, at each step one should check the decrease $f(x_k) - f(x_{k+1})$ and the size of the step $x_{k+1} - x_k$. If either is sufficiently small, then accept $x_{k+1}$ as the point at which $f$ attains its global minimum value; otherwise, continue to iterate regardless of the iteration count (up to a maximum acceptable number of iterations). Since CG is a descent method, continued progress is assured.
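A sketch of the termination logic described above, wrapped around a generic CG-style update (the tolerances and the `cg_step` helper are placeholders assumed for illustration, not part of the lecture notes).

```python
import numpy as np

def iterate_cg(cg_step, x0, tol_f=1e-12, tol_x=1e-12, max_iter=10_000):
    """Run a CG-style update until the decrease in f or the step size is small."""
    x, fx = x0, np.inf
    for _ in range(max_iter):
        x_new, fx_new = cg_step(x)       # one CG update: returns (x_{k+1}, f(x_{k+1}))
        if abs(fx - fx_new) <= tol_f or np.linalg.norm(x_new - x) <= tol_x:
            return x_new                 # accept the current point
        x, fx = x_new, fx_new
    return x                             # give up after max_iter iterations
```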

The Non-Quadratic CG Algorithm.

Initialization: $x_0 \in \mathbb{R}^n$, $g_0 = \nabla f(x_0)$, $d_0 = -g_0$, $0 < c < \beta < 1$.

Having $x_k$, obtain $x_{k+1}$ as follows. Check the restart criteria. If a restart condition is satisfied, then reset $x_0 := x_k$, $g_0 := \nabla f(x_0)$, $d_0 := -g_0$; otherwise, set

  $\alpha_k \in \{ \lambda > 0 : \nabla f(x_k + \lambda d_k)^T d_k \ge \beta\, \nabla f(x_k)^T d_k$ and $f(x_k + \lambda d_k) \le f(x_k) + c\lambda\, \nabla f(x_k)^T d_k \}$
  $x_{k+1} := x_k + \alpha_k d_k$
  $g_{k+1} := \nabla f(x_{k+1})$
  $\beta_k := g_{k+1}^T g_{k+1} / g_k^T g_k$  (Fletcher-Reeves)  or  $\max\{0,\ g_{k+1}^T (g_{k+1} - g_k) / g_k^T g_k\}$  (Polak-Ribiere)
  $d_{k+1} := -g_{k+1} + \beta_k d_k$
  $k := k + 1$.

The Non-Quadratic CG Algorithm: Restart Conditions

  1. $k = n$;
  2. $|g_{k+1}^T g_k| \ge 0.2\, g_k^T g_k$;
  3. the descent test $-2\, g_k^T g_k \le g_k^T d_k \le -0.2\, g_k^T g_k$ fails.
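A compact Python sketch of the non-quadratic CG iteration with both $\beta_k$ update rules and a simple bisection line search for the two conditions defining $\alpha_k$ (the line-search implementation, tolerances, and test function are assumptions made for illustration; of the restart conditions, only the periodic restart $k = n$ is implemented).

```python
import numpy as np

def wolfe_step(f, grad, x, d, c=1e-4, beta=0.9, max_iter=50):
    """Bisection search for a step satisfying the two conditions defining alpha_k."""
    f0, slope0 = f(x), grad(x) @ d       # slope0 = grad f(x_k)^T d_k < 0 for a descent direction
    lo, hi, t = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        if f(x + t * d) > f0 + c * t * slope0:          # sufficient decrease fails: shrink
            hi = t
            t = 0.5 * (lo + hi)
        elif grad(x + t * d) @ d < beta * slope0:       # curvature condition fails: grow
            lo = t
            t = 2.0 * t if hi == np.inf else 0.5 * (lo + hi)
        else:
            return t
    return t

def nonlinear_cg(f, grad, x0, variant="PR", tol=1e-8, max_iter=5000):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    n = x.size
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        alpha = wolfe_step(f, grad, x, d)
        x_new = x + alpha * d
        g_new = grad(x_new)
        if variant == "FR":
            beta_k = (g_new @ g_new) / (g @ g)                    # Fletcher-Reeves
        else:
            beta_k = max(0.0, g_new @ (g_new - g) / (g @ g))      # Polak-Ribiere with max{0, .}
        restart = (k + 1) % n == 0                                # restart condition 1: k = n
        d = -g_new if restart else -g_new + beta_k * d
        x, g = x_new, g_new
    return x

# Example on the Rosenbrock function.
rosen = lambda z: (1 - z[0]) ** 2 + 100 * (z[1] - z[0] ** 2) ** 2
rosen_grad = lambda z: np.array([-2 * (1 - z[0]) - 400 * z[0] * (z[1] - z[0] ** 2),
                                 200 * (z[1] - z[0] ** 2)])
print(nonlinear_cg(rosen, rosen_grad, np.array([-1.2, 1.0])))     # approx [1, 1]
```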

The Non-Quadratic CG Algorithm. The Polak-Ribiere update for $\beta_k$ has a demonstrated experimental superiority. One way to see why this might be true is to observe that

  $g_{k+1}^T (g_{k+1} - g_k) \approx \alpha_k\, g_{k+1}^T \nabla^2 f(x_k) d_k$,

thereby yielding a better second-order approximation. Indeed, the formula for $\beta_k$ in the quadratic case is precisely

  $\beta_k = \dfrac{\alpha_k\, g_{k+1}^T \nabla^2 f(x_k) d_k}{g_k^T g_k}$.
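A small numerical check of the approximation $g_{k+1}^T(g_{k+1} - g_k) \approx \alpha_k\, g_{k+1}^T \nabla^2 f(x_k) d_k$ on a non-quadratic function (the function, point, direction, and step length are arbitrary choices made here for illustration).

```python
import numpy as np

# f(x) = sum(exp(x_i) + x_i^4): smooth, non-quadratic, with simple derivatives.
grad = lambda x: np.exp(x) + 4 * x ** 3
hess = lambda x: np.diag(np.exp(x) + 12 * x ** 2)

rng = np.random.default_rng(4)
x_k = rng.standard_normal(4)
d_k = -grad(x_k)                       # a descent direction
alpha_k = 1e-3                         # a small step so the first-order expansion of the gradient is accurate

g_k = grad(x_k)
g_k1 = grad(x_k + alpha_k * d_k)       # g_{k+1}

lhs = g_k1 @ (g_k1 - g_k)
rhs = alpha_k * (g_k1 @ hess(x_k) @ d_k)
print(lhs, rhs)                        # nearly equal, since g_{k+1} - g_k ≈ alpha_k * hess(x_k) d_k
```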
