Chapter 10 Conjugate Direction Methods

An Introduction to Optimization, Spring
Wei-Ta Chu, 2012/4/13

Introduction

Conjugate direction methods can be viewed as being intermediate between the method of steepest descent and Newton's method. They solve quadratics of $n$ variables in $n$ steps. The usual implementation, the conjugate gradient algorithm, requires no Hessian matrix evaluations. No matrix inversion and no storage of an $n \times n$ matrix are required. The conjugate direction methods typically perform better than the method of steepest descent, but not as well as Newton's method.

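For a quadratic function of $n$ variables, $f(x) = \tfrac{1}{2} x^{\top} Q x - x^{\top} b$, $x \in \mathbb{R}^n$, $Q = Q^{\top} > 0$, the best direction of search is in the $Q$-conjugate direction. Basically, two directions $d^{(1)}$ and $d^{(2)}$ in $\mathbb{R}^n$ are said to be $Q$-conjugate if $d^{(1)\top} Q d^{(2)} = 0$.

Definition 10.1: Let $Q$ be a real symmetric $n \times n$ matrix. The directions $d^{(0)}, d^{(1)}, d^{(2)}, \ldots, d^{(m)}$ are $Q$-conjugate if for all $i \neq j$, we have $d^{(i)\top} Q d^{(j)} = 0$.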

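Lemma 10.1: Let $Q$ be a symmetric positive definite $n \times n$ matrix. If the directions $d^{(0)}, d^{(1)}, \ldots, d^{(k)} \in \mathbb{R}^n$, $k \le n-1$, are nonzero and $Q$-conjugate, then they are linearly independent.

Proof: Let $\alpha_0, \ldots, \alpha_k$ be scalars such that
$$\alpha_0 d^{(0)} + \alpha_1 d^{(1)} + \cdots + \alpha_k d^{(k)} = 0.$$
Premultiplying this equality by $d^{(j)\top} Q$, $0 \le j \le k$, yields
$$\alpha_j\, d^{(j)\top} Q d^{(j)} = 0,$$
because all other terms $d^{(j)\top} Q d^{(i)} = 0$, $i \neq j$, by $Q$-conjugacy. But $Q = Q^{\top} > 0$ and $d^{(j)} \neq 0$; hence $\alpha_j = 0$, $j = 0, 1, \ldots, k$. Therefore, $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$, $k \le n-1$, are linearly independent.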

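Example

Let
$$Q = \begin{bmatrix} 3 & 0 & 1 \\ 0 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix}.$$
Note that $Q = Q^{\top} > 0$. The matrix $Q$ is positive definite because all its leading principal minors are positive:
$$\Delta_1 = 3 > 0, \qquad \Delta_2 = \det \begin{bmatrix} 3 & 0 \\ 0 & 4 \end{bmatrix} = 12 > 0, \qquad \Delta_3 = \det Q = 20 > 0.$$
Our goal is to construct a set of $Q$-conjugate vectors $d^{(0)}, d^{(1)}, d^{(2)}$.

Let $d^{(0)} = [1, 0, 0]^{\top}$, $d^{(1)} = [d_1^{(1)}, d_2^{(1)}, d_3^{(1)}]^{\top}$, $d^{(2)} = [d_1^{(2)}, d_2^{(2)}, d_3^{(2)}]^{\top}$. We require that $d^{(0)\top} Q d^{(1)} = 0$. We have
$$d^{(0)\top} Q d^{(1)} = 3 d_1^{(1)} + d_3^{(1)}.$$
Let $d_1^{(1)} = 1$, $d_2^{(1)} = 0$, $d_3^{(1)} = -3$. Then $d^{(1)} = [1, 0, -3]^{\top}$, and thus $d^{(0)\top} Q d^{(1)} = 0$.

To find the third vector $d^{(2)}$, which would be $Q$-conjugate with $d^{(0)}$ and $d^{(1)}$, we require that $d^{(0)\top} Q d^{(2)} = 0$ and $d^{(1)\top} Q d^{(2)} = 0$. We have
$$d^{(0)\top} Q d^{(2)} = 3 d_1^{(2)} + d_3^{(2)} = 0, \qquad d^{(1)\top} Q d^{(2)} = -6 d_2^{(2)} - 8 d_3^{(2)} = 0.$$
If we take $d^{(2)} = [1, 4, -3]^{\top}$, then the resulting set of vectors is mutually conjugate.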
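The pairwise conditions in this example can be checked mechanically. The following is a minimal sketch, assuming the example's matrix is $Q = [[3,0,1],[0,4,2],[1,2,3]]$ and the three directions are $[1,0,0]$, $[1,0,-3]$, $[1,4,-3]$; the helper names `matvec`, `q_inner`, and `is_q_conjugate` are made up for illustration.

```python
# Illustrative check of Q-conjugacy (Definition 10.1).
# Q and the directions below are assumed example values.

def matvec(Q, v):
    """Return the matrix-vector product Q v for a list-of-rows matrix."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def q_inner(u, Q, v):
    """Return the scalar u^T Q v."""
    return sum(a * b for a, b in zip(u, matvec(Q, v)))

def is_q_conjugate(directions, Q, tol=1e-12):
    """True if d_i^T Q d_j = 0 (up to tol) for every pair i != j."""
    n = len(directions)
    return all(
        abs(q_inner(directions[i], Q, directions[j])) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )

Q = [[3, 0, 1], [0, 4, 2], [1, 2, 3]]
d0, d1, d2 = [1, 0, 0], [1, 0, -3], [1, 4, -3]
print(is_q_conjugate([d0, d1, d2], Q))
```

Because $Q$ is symmetric, checking each unordered pair once is enough. The standard basis vectors, by contrast, are orthogonal but not $Q$-conjugate for this $Q$.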

The Conjugate Direction Algorithm

This method of finding $Q$-conjugate vectors is inefficient. A systematic procedure for finding $Q$-conjugate vectors can be devised using the idea underlying the Gram-Schmidt process of transforming a given basis of $\mathbb{R}^n$ into an orthonormal basis of $\mathbb{R}^n$.
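One way such a Gram-Schmidt-style procedure might look is sketched below. It is an assumption about the construction the slide alludes to, not a transcription of it: each new basis vector $p$ has its $Q$-components along the earlier directions subtracted, $d = p - \sum_i \frac{p^{\top} Q d^{(i)}}{d^{(i)\top} Q d^{(i)}} d^{(i)}$. The function name `conjugate_gram_schmidt` and the test matrix are illustrative, and exact rational arithmetic is used so the conjugacy conditions hold exactly.

```python
from fractions import Fraction

def matvec(Q, v):
    """Return the matrix-vector product Q v for a list-of-rows matrix."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def q_inner(u, Q, v):
    """Return the scalar u^T Q v."""
    return sum(a * b for a, b in zip(u, matvec(Q, v)))

def conjugate_gram_schmidt(basis, Q):
    """Turn linearly independent vectors into Q-conjugate directions.

    Q is assumed symmetric positive definite, so d^T Q d > 0 for d != 0.
    """
    directions = []
    for p in basis:
        d = [Fraction(x) for x in p]
        for prev in directions:
            # Remove the Q-component of p along an earlier direction.
            coeff = q_inner(p, Q, prev) / q_inner(prev, Q, prev)
            d = [di - coeff * pi for di, pi in zip(d, prev)]
        directions.append(d)
    return directions

# Assumed example matrix; the standard basis is the input basis.
Q = [[3, 0, 1], [0, 4, 2], [1, 2, 3]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
for d in conjugate_gram_schmidt(identity, Q):
    print([str(c) for c in d])
```

Different input bases give different conjugate sets, so the output need not match directions chosen by hand.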

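We consider minimizing the quadratic function of $n$ variables
$$f(x) = \tfrac{1}{2} x^{\top} Q x - x^{\top} b,$$
where $Q = Q^{\top} > 0$, $x \in \mathbb{R}^n$. Note that because $Q > 0$, the function $f$ has a global minimizer that can be found by solving $Qx = b$.

Basic Conjugate Direction Algorithm. Given a starting point $x^{(0)}$ and $Q$-conjugate directions $d^{(0)}, d^{(1)}, \ldots, d^{(n-1)}$; for $k \ge 0$,
$$g^{(k)} = \nabla f(x^{(k)}) = Q x^{(k)} - b,$$
$$\alpha_k = -\frac{g^{(k)\top} d^{(k)}}{d^{(k)\top} Q d^{(k)}},$$
$$x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}.$$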
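The three update formulas of the basic conjugate direction algorithm translate almost line by line into code. The sketch below assumes $g^{(k)} = Qx^{(k)} - b$ and $\alpha_k = -g^{(k)\top} d^{(k)} / (d^{(k)\top} Q d^{(k)})$; the function name `conjugate_direction`, the right-hand side `b`, and the starting point are hypothetical values chosen for illustration.

```python
from fractions import Fraction

def matvec(Q, v):
    """Return the matrix-vector product Q v for a list-of-rows matrix."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def dot(u, v):
    """Return the Euclidean inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

def conjugate_direction(Q, b, x0, directions):
    """Minimize 0.5 x^T Q x - x^T b along given Q-conjugate directions."""
    x = [Fraction(v) for v in x0]
    for d in directions:
        g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]   # g = Qx - b
        alpha = -dot(g, d) / dot(d, matvec(Q, d))          # exact step size
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x

# Assumed data: a 3x3 positive definite Q with hand-picked conjugate
# directions; b and x0 are arbitrary (hypothetical) choices.
Q = [[3, 0, 1], [0, 4, 2], [1, 2, 3]]
b = [1, 2, 2]
x0 = [1, 1, 1]
directions = [[1, 0, 0], [1, 0, -3], [1, 4, -3]]
x = conjugate_direction(Q, b, x0, directions)
print([str(c) for c in x])
print(matvec(Q, x) == b)
```

Using `Fraction` keeps the arithmetic exact, so after $n$ steps the final iterate satisfies $Qx = b$ without rounding error.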

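Theorem 10.1: For any starting point $x^{(0)}$, the basic conjugate direction algorithm converges to the unique $x^*$ (that solves $Qx = b$) in $n$ steps; that is, $x^{(n)} = x^*$.

Proof: Consider $x^* - x^{(0)} \in \mathbb{R}^n$. Because the $d^{(i)}$ are linearly independent, there exist constants $\beta_i$, $i = 0, \ldots, n-1$, such that
$$x^* - x^{(0)} = \beta_0 d^{(0)} + \cdots + \beta_{n-1} d^{(n-1)}.$$
Now premultiply both sides of this equation by $d^{(k)\top} Q$, $0 \le k < n$, to obtain
$$d^{(k)\top} Q (x^* - x^{(0)}) = \beta_k\, d^{(k)\top} Q d^{(k)},$$
where the terms $d^{(k)\top} Q d^{(i)} = 0$, $k \neq i$, by the $Q$-conjugate property. Hence,
$$\beta_k = \frac{d^{(k)\top} Q (x^* - x^{(0)})}{d^{(k)\top} Q d^{(k)}}.$$

Now, we can write
$$x^{(k)} = x^{(0)} + \alpha_0 d^{(0)} + \cdots + \alpha_{k-1} d^{(k-1)}.$$
Therefore,
$$x^{(k)} - x^{(0)} = \alpha_0 d^{(0)} + \cdots + \alpha_{k-1} d^{(k-1)}.$$
So writing
$$x^* - x^{(0)} = (x^* - x^{(k)}) + (x^{(k)} - x^{(0)})$$
and premultiplying the above by $d^{(k)\top} Q$, we obtain
$$d^{(k)\top} Q (x^* - x^{(0)}) = d^{(k)\top} Q (x^* - x^{(k)}) = -d^{(k)\top} g^{(k)},$$
because $g^{(k)} = Q x^{(k)} - b$ and $Q x^* = b$. Thus,
$$\beta_k = -\frac{d^{(k)\top} g^{(k)}}{d^{(k)\top} Q d^{(k)}} = \alpha_k$$
and $x^* = x^{(n)}$, which completes the proof.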

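Example

Find the minimizer of
$$f(x_1, x_2) = \frac{1}{2} x^{\top} \begin{bmatrix} 4 & 2 \\ 2 & 2 \end{bmatrix} x - x^{\top} \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \qquad x \in \mathbb{R}^2,$$
using the conjugate direction method with the initial point $x^{(0)} = [0, 0]^{\top}$, and $Q$-conjugate directions $d^{(0)} = [1, 0]^{\top}$ and $d^{(1)} = [-\tfrac{3}{8}, \tfrac{3}{4}]^{\top}$. We have
$$g^{(0)} = Q x^{(0)} - b = -b = [1, -1]^{\top},$$
and hence
$$\alpha_0 = -\frac{g^{(0)\top} d^{(0)}}{d^{(0)\top} Q d^{(0)}} = -\frac{1}{4}.$$
Thus,
$$x^{(1)} = x^{(0)} + \alpha_0 d^{(0)} = \left[-\tfrac{1}{4}, 0\right]^{\top}.$$

To find $x^{(2)}$, we compute
$$g^{(1)} = Q x^{(1)} - b = \left[0, -\tfrac{3}{2}\right]^{\top}$$
and
$$\alpha_1 = -\frac{g^{(1)\top} d^{(1)}}{d^{(1)\top} Q d^{(1)}} = \frac{9/8}{9/16} = 2.$$
Therefore,
$$x^{(2)} = x^{(1)} + \alpha_1 d^{(1)} = \left[-1, \tfrac{3}{2}\right]^{\top}.$$
Because $f$ is a quadratic function in two variables, $x^{(2)} = x^*$.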
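As a sanity check on this example, assuming its data are $Q = \begin{bmatrix} 4 & 2 \\ 2 & 2 \end{bmatrix}$, $b = [-1, 1]^{\top}$, $d^{(0)} = [1, 0]^{\top}$, $d^{(1)} = [-\tfrac{3}{8}, \tfrac{3}{4}]^{\top}$ and that the final iterate is $x^{(2)} = [-1, \tfrac{3}{2}]^{\top}$, one can verify directly that the two directions are $Q$-conjugate and that $x^{(2)}$ solves $Qx = b$:

```latex
Q d^{(1)} = \begin{bmatrix} 4 & 2 \\ 2 & 2 \end{bmatrix}
            \begin{bmatrix} -3/8 \\ 3/4 \end{bmatrix}
          = \begin{bmatrix} 0 \\ 3/4 \end{bmatrix},
\qquad
d^{(0)\top} Q d^{(1)} = \begin{bmatrix} 1 & 0 \end{bmatrix}
                        \begin{bmatrix} 0 \\ 3/4 \end{bmatrix} = 0,
\\[6pt]
Q x^{(2)} = \begin{bmatrix} 4 & 2 \\ 2 & 2 \end{bmatrix}
            \begin{bmatrix} -1 \\ 3/2 \end{bmatrix}
          = \begin{bmatrix} -4 + 3 \\ -2 + 3 \end{bmatrix}
          = \begin{bmatrix} -1 \\ 1 \end{bmatrix} = b .
```

The second line is exactly the condition $g^{(2)} = Q x^{(2)} - b = 0$, so the method has terminated at the minimizer after $n = 2$ steps.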

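For a quadratic function of $n$ variables, the conjugate direction method reaches the solution after $n$ steps. Suppose that we start at $x^{(0)}$ and search in the direction $d^{(0)}$ to obtain
$$x^{(1)} = x^{(0)} - \frac{g^{(0)\top} d^{(0)}}{d^{(0)\top} Q d^{(0)}}\, d^{(0)}.$$
We claim that
$$g^{(1)\top} d^{(0)} = 0.$$
Indeed, with $\alpha_0 = -g^{(0)\top} d^{(0)} / (d^{(0)\top} Q d^{(0)})$,
$$g^{(1)\top} d^{(0)} = (Q x^{(1)} - b)^{\top} d^{(0)} = (Q x^{(0)} - b)^{\top} d^{(0)} + \alpha_0\, d^{(0)\top} Q d^{(0)} = g^{(0)\top} d^{(0)} - g^{(0)\top} d^{(0)} = 0.$$

The equation $g^{(1)\top} d^{(0)} = 0$ implies that $\alpha_0$ has the property that $\alpha_0 = \arg\min \phi_0(\alpha)$, where $\phi_0(\alpha) = f(x^{(0)} + \alpha d^{(0)})$. To see this, apply the chain rule to get
$$\frac{d\phi_0}{d\alpha}(\alpha) = \nabla f(x^{(0)} + \alpha d^{(0)})^{\top} d^{(0)}.$$
Evaluating the above at $\alpha = \alpha_0$, we get
$$\frac{d\phi_0}{d\alpha}(\alpha_0) = g^{(1)\top} d^{(0)} = 0.$$
Because $\phi_0$ is a quadratic function of $\alpha$, and the coefficient of the $\alpha^2$ term in $\phi_0$ is $\tfrac{1}{2} d^{(0)\top} Q d^{(0)} > 0$, the above implies that
$$\alpha_0 = \arg\min_{\alpha \in \mathbb{R}} \phi_0(\alpha).$$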

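Using a similar argument, we can show that for all $k$,
$$g^{(k+1)\top} d^{(k)} = 0,$$
and hence
$$\alpha_k = \arg\min_{\alpha} f(x^{(k)} + \alpha d^{(k)}).$$

Lemma 10.2: In the conjugate direction algorithm,
$$g^{(k+1)\top} d^{(i)} = 0$$
for all $k$, $0 \le k \le n-1$, and $0 \le i \le k$.

Proof: Note that
$$Q(x^{(k+1)} - x^{(k)}) = Q x^{(k+1)} - b - (Q x^{(k)} - b) = g^{(k+1)} - g^{(k)},$$
because $g^{(k)} = Q x^{(k)} - b$. Thus,
$$g^{(k+1)} = g^{(k)} + \alpha_k Q d^{(k)}.$$

We prove the lemma by induction. The result is true for $k = 0$ because $g^{(1)\top} d^{(0)} = 0$. We now show that if the result is true for $k-1$ (i.e., $g^{(k)\top} d^{(i)} = 0$, $i \le k-1$), then it is true for $k$, i.e., $g^{(k+1)\top} d^{(i)} = 0$, $i \le k$. Fix $k > 0$ and $0 \le i < k$. By the induction hypothesis, $g^{(k)\top} d^{(i)} = 0$. Because $g^{(k+1)} = g^{(k)} + \alpha_k Q d^{(k)}$ and $d^{(k)\top} Q d^{(i)} = 0$ by the $Q$-conjugacy, we have
$$g^{(k+1)\top} d^{(i)} = g^{(k)\top} d^{(i)} + \alpha_k\, d^{(k)\top} Q d^{(i)} = 0.$$
It remains to be shown that
$$g^{(k+1)\top} d^{(k)} = 0.$$

Indeed,
$$g^{(k+1)\top} d^{(k)} = (Q x^{(k+1)} - b)^{\top} d^{(k)} = \left(x^{(k)} - \frac{g^{(k)\top} d^{(k)}}{d^{(k)\top} Q d^{(k)}}\, d^{(k)}\right)^{\top} Q d^{(k)} - b^{\top} d^{(k)} = (Q x^{(k)} - b)^{\top} d^{(k)} - g^{(k)\top} d^{(k)} = 0,$$
because $Q x^{(k)} - b = g^{(k)}$. Therefore, by induction, for all $0 \le k \le n-1$ and $0 \le i \le k$,
$$g^{(k+1)\top} d^{(i)} = 0.$$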
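The orthogonality property of Lemma 10.2, $g^{(k+1)\top} d^{(i)} = 0$ for $0 \le i \le k$, can be observed numerically. The sketch below runs the conjugate direction iteration in exact rational arithmetic and records these inner products; the matrix, right-hand side, starting point, and the name `gradient_direction_products` are hypothetical choices for illustration.

```python
from fractions import Fraction

def matvec(Q, v):
    """Return the matrix-vector product Q v for a list-of-rows matrix."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def dot(u, v):
    """Return the Euclidean inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

def gradient_direction_products(Q, b, x0, directions):
    """Collect g^(k+1)^T d^(i), i = 0..k, for every step k."""
    x = [Fraction(v) for v in x0]
    products = []
    for k, d in enumerate(directions):
        g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]
        alpha = -dot(g, d) / dot(d, matvec(Q, d))
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_next = [qx - bi for qx, bi in zip(matvec(Q, x), b)]
        products.append([dot(g_next, directions[i]) for i in range(k + 1)])
    return products

# Assumed data: Q-conjugate directions for this Q; b and x0 are arbitrary.
Q = [[3, 0, 1], [0, 4, 2], [1, 2, 3]]
b = [1, 1, 1]
x0 = [2, -1, 5]
directions = [[1, 0, 0], [1, 0, -3], [1, 4, -3]]
for row in gradient_direction_products(Q, b, x0, directions):
    print([str(p) for p in row])
```

Each printed row corresponds to one value of $k$ and should consist only of zeros when the directions really are $Q$-conjugate.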

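By Lemma 10.2 we see that $g^{(k+1)}$ is orthogonal to any vector from the subspace spanned by $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$. We now show that not only does $f(x^{(k+1)})$ satisfy
$$f(x^{(k+1)}) = \min_{\alpha} f(x^{(k)} + \alpha d^{(k)}),$$
but also
$$f(x^{(k+1)}) = \min_{a_0, \ldots, a_k} f\!\left(x^{(0)} + \sum_{i=0}^{k} a_i d^{(i)}\right).$$
In other words, if we write
$$V_k = x^{(0)} + \operatorname{span}[d^{(0)}, d^{(1)}, \ldots, d^{(k)}],$$
then we can express $f(x^{(k+1)}) = \min_{x \in V_k} f(x)$. As $k$ increases, the subspace $\operatorname{span}[d^{(0)}, d^{(1)}, \ldots, d^{(k)}]$ expands, and will eventually fill the whole of $\mathbb{R}^n$ (provided that $d^{(0)}, d^{(1)}, \ldots$ are linearly independent).

Therefore, for some sufficiently large $k$, $x^*$ will lie in $V_k$. For this reason, the above result is sometimes called the expanding subspace theorem. To prove the expanding subspace theorem, define the matrix
$$D^{(k)} = [d^{(0)}, \ldots, d^{(k)}].$$
Note that $x^{(0)} + \mathcal{R}(D^{(k)}) = V_k$. Also,
$$x^{(k+1)} = x^{(0)} + \sum_{i=0}^{k} \alpha_i d^{(i)} = x^{(0)} + D^{(k)} \alpha,$$
where $\alpha = [\alpha_0, \ldots, \alpha_k]^{\top}$. Hence,
$$x^{(k+1)} \in x^{(0)} + \mathcal{R}(D^{(k)}) = V_k.$$

Now, consider any vector $x \in V_k$. There exists a vector $a$ such that $x = x^{(0)} + D^{(k)} a$. Let $\phi_k(a) = f(x^{(0)} + D^{(k)} a)$. Note that $\phi_k$ is a quadratic function and has a unique minimizer that satisfies the FONC (first-order necessary condition). By the chain rule,
$$D\phi_k(a) = \nabla f(x^{(0)} + D^{(k)} a)^{\top} D^{(k)}.$$
Therefore,
$$D\phi_k(\alpha) = \nabla f(x^{(0)} + D^{(k)} \alpha)^{\top} D^{(k)} = \nabla f(x^{(k+1)})^{\top} D^{(k)} = g^{(k+1)\top} D^{(k)}.$$
By Lemma 10.2, $g^{(k+1)\top} D^{(k)} = 0^{\top}$. Therefore, $\alpha$ satisfies the FONC for the quadratic function $\phi_k$, and hence $\alpha$ is the minimizer of $\phi_k$; that is,
$$f(x^{(k+1)}) = \min_{a} f(x^{(0)} + D^{(k)} a) = \min_{x \in V_k} f(x).$$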
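The expanding subspace theorem says that $x^{(k+1)}$ minimizes $f$ over the whole affine set $V_k = x^{(0)} + \operatorname{span}[d^{(0)}, \ldots, d^{(k)}]$, not only along the last search line. The sketch below illustrates this (it is not a proof) by comparing $f(x^{(k+1)})$ with $f$ evaluated on a finite grid of points in $V_k$; all data and the names `iterates` and `beats_grid_on_Vk` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def matvec(Q, v):
    """Return the matrix-vector product Q v for a list-of-rows matrix."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def dot(u, v):
    """Return the Euclidean inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

def f(Q, b, x):
    """Quadratic objective 0.5 x^T Q x - x^T b."""
    return Fraction(1, 2) * dot(x, matvec(Q, x)) - dot(x, b)

def iterates(Q, b, x0, directions):
    """Return [x^(0), x^(1), ..., x^(n)] of the conjugate direction method."""
    xs = [[Fraction(v) for v in x0]]
    for d in directions:
        x = xs[-1]
        g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]
        alpha = -dot(g, d) / dot(d, matvec(Q, d))
        xs.append([xi + alpha * di for xi, di in zip(x, d)])
    return xs

def beats_grid_on_Vk(Q, b, x0, directions, k, grid):
    """Check f(x^(k+1)) <= f(x^(0) + sum_i a_i d^(i)) for all a in grid^(k+1)."""
    xs = iterates(Q, b, x0, directions)
    best = f(Q, b, xs[k + 1])
    for a in product(grid, repeat=k + 1):
        x = list(xs[0])
        for ai, d in zip(a, directions):
            x = [xi + ai * di for xi, di in zip(x, d)]
        if f(Q, b, x) < best:
            return False
    return True

# Assumed data: Q-conjugate directions for this Q; b and x0 are arbitrary.
Q = [[3, 0, 1], [0, 4, 2], [1, 2, 3]]
b = [1, 1, 1]
x0 = [2, -1, 5]
directions = [[1, 0, 0], [1, 0, -3], [1, 4, -3]]
grid = [Fraction(i, 2) for i in range(-4, 5)]
print([beats_grid_on_Vk(Q, b, x0, directions, k, grid) for k in range(3)])
```

A grid can only sample $V_k$, so a `True` result is consistent with the theorem rather than a verification of it; a single `False` would indicate that the directions are not $Q$-conjugate or that the step sizes are wrong.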

The Conjugate Gradient Algorithm

The conjugate direction algorithm is very effective. However, to use the algorithm, we need to specify the $Q$-conjugate directions. Fortunately, there is a way to generate $Q$-conjugate directions as we perform iterations. The conjugate gradient algorithm does not use prespecified conjugate directions, but instead computes the directions as the algorithm proceeds. At each stage of the algorithm, the direction is calculated as a linear combination of the previous direction and the current gradient, in such a way that all the directions are mutually $Q$-conjugate.

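For a quadratic function of $n$ variables, we can locate the function minimizer by performing $n$ searches along mutually conjugate directions. We consider the quadratic function
$$f(x) = \tfrac{1}{2} x^{\top} Q x - x^{\top} b, \qquad x \in \mathbb{R}^n,$$
where $Q = Q^{\top} > 0$. Our first search direction from an initial point $x^{(0)}$ is in the direction of steepest descent; that is,
$$d^{(0)} = -g^{(0)}.$$
Thus, $x^{(1)} = x^{(0)} + \alpha_0 d^{(0)}$, where
$$\alpha_0 = \arg\min_{\alpha \ge 0} f(x^{(0)} + \alpha d^{(0)}) = -\frac{g^{(0)\top} d^{(0)}}{d^{(0)\top} Q d^{(0)}}.$$

We then search in a direction $d^{(1)}$ that is $Q$-conjugate to $d^{(0)}$. We choose $d^{(1)}$ as a linear combination of $g^{(1)}$ and $d^{(0)}$. In general, at the $(k+1)$th step, we choose $d^{(k+1)}$ as a linear combination of $g^{(k+1)}$ and $d^{(k)}$. Specifically, we choose
$$d^{(k+1)} = -g^{(k+1)} + \beta_k d^{(k)}, \qquad k = 0, 1, 2, \ldots$$
The coefficients $\beta_k$, $k = 0, 1, \ldots$, are chosen in such a way that $d^{(k+1)}$ is $Q$-conjugate to $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$. This is accomplished by choosing $\beta_k$ to be
$$\beta_k = \frac{g^{(k+1)\top} Q d^{(k)}}{d^{(k)\top} Q d^{(k)}}.$$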

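The algorithm

1. Set $k := 0$; select the initial point $x^{(0)}$.
2. $g^{(0)} = \nabla f(x^{(0)})$. If $g^{(0)} = 0$, stop; else, set $d^{(0)} = -g^{(0)}$.
3. $\alpha_k = -\dfrac{g^{(k)\top} d^{(k)}}{d^{(k)\top} Q d^{(k)}}$.
4. $x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}$.
5. $g^{(k+1)} = \nabla f(x^{(k+1)})$. If $g^{(k+1)} = 0$, stop.
6. $\beta_k = \dfrac{g^{(k+1)\top} Q d^{(k)}}{d^{(k)\top} Q d^{(k)}}$.
7. $d^{(k+1)} = -g^{(k+1)} + \beta_k d^{(k)}$.
8. Set $k := k + 1$; go to step 3.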
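The steps of the conjugate gradient algorithm can be sketched as follows for the quadratic $f(x) = \tfrac{1}{2} x^{\top} Q x - x^{\top} b$, assuming $\nabla f(x) = Qx - b$, the step size $\alpha_k = -g^{(k)\top} d^{(k)} / (d^{(k)\top} Q d^{(k)})$, and $\beta_k = g^{(k+1)\top} Q d^{(k)} / (d^{(k)\top} Q d^{(k)})$. The function name `conjugate_gradient`, the tolerance `tol`, and the small test problem are hypothetical; a tolerance replaces the exact test $g = 0$ because floating-point arithmetic is used.

```python
def matvec(Q, v):
    """Return the matrix-vector product Q v for a list-of-rows matrix."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def dot(u, v):
    """Return the Euclidean inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(Q, b, x0, tol=1e-12):
    """Minimize 0.5 x^T Q x - x^T b, with Q symmetric positive definite."""
    x = [float(v) for v in x0]
    g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]      # step 2: g = Qx - b
    d = [-gi for gi in g]                                 # step 2: d = -g
    for _ in range(len(b)):                               # at most n steps
        if dot(g, g) ** 0.5 <= tol:                       # "if g = 0, stop"
            break
        Qd = matvec(Q, d)
        dQd = dot(d, Qd)
        alpha = -dot(g, d) / dQd                          # step 3
        x = [xi + alpha * di for xi, di in zip(x, d)]     # step 4
        g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]  # step 5
        beta = dot(g, Qd) / dQd                           # step 6
        d = [-gi + beta * di for gi, di in zip(g, d)]     # step 7
    return x

# Hypothetical 2x2 test problem; the minimizer solves Qx = b.
x = conjugate_gradient([[4, 2], [2, 2]], [-1, 1], [0, 0])
print(x)
```

No conjugate directions are supplied here: each $d^{(k+1)}$ is built from the new gradient and the previous direction. The product $Q d^{(k)}$ is computed once per iteration and reused for both $\alpha_k$ and $\beta_k$.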

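Proposition 10.1: In the conjugate gradient algorithm, the directions $d^{(0)}, d^{(1)}, \ldots, d^{(n-1)}$ are $Q$-conjugate.

Example

Consider the quadratic function
$$f(x_1, x_2, x_3) = \tfrac{3}{2} x_1^2 + 2 x_2^2 + \tfrac{3}{2} x_3^2 + x_1 x_3 + 2 x_2 x_3 - 3 x_1 - x_3.$$
We find the minimizer using the conjugate gradient algorithm, using the starting point $x^{(0)} = [0, 0, 0]^{\top}$. We can represent $f$ as
$$f(x) = \tfrac{1}{2} x^{\top} Q x - x^{\top} b,$$
where
$$Q = \begin{bmatrix} 3 & 0 & 1 \\ 0 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix}, \qquad b = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}.$$

We have
$$g(x) = \nabla f(x) = Qx - b = [3x_1 + x_3 - 3,\; 4x_2 + 2x_3,\; x_1 + 2x_2 + 3x_3 - 1]^{\top}.$$
Hence,
$$g^{(0)} = [-3, 0, -1]^{\top}, \qquad d^{(0)} = -g^{(0)} = [3, 0, 1]^{\top},$$
$$\alpha_0 = -\frac{g^{(0)\top} d^{(0)}}{d^{(0)\top} Q d^{(0)}} = \frac{10}{36} = 0.2778, \qquad x^{(1)} = x^{(0)} + \alpha_0 d^{(0)} = [0.8333, 0, 0.2778]^{\top}.$$

The next stage yields
$$g^{(1)} = [-0.2222, 0.5556, 0.6667]^{\top}, \qquad \beta_0 = \frac{g^{(1)\top} Q d^{(0)}}{d^{(0)\top} Q d^{(0)}} = 0.08025,$$
$$d^{(1)} = -g^{(1)} + \beta_0 d^{(0)} = [0.4630, -0.5556, -0.5864]^{\top}.$$
Hence,
$$\alpha_1 = 0.2187, \qquad x^{(2)} = x^{(1)} + \alpha_1 d^{(1)} = [0.9346, -0.1215, 0.1495]^{\top}.$$
For the third iteration,
$$g^{(2)} = [-0.04673, -0.1869, 0.1402]^{\top}, \qquad \beta_1 = 0.07075, \qquad d^{(2)} = [0.07948, 0.1476, -0.1817]^{\top}.$$
Hence,
$$\alpha_2 = 0.8231, \qquad x^{(3)} = x^{(2)} + \alpha_2 d^{(2)} = [1.000, 0.000, 0.000]^{\top}.$$
Note that $g^{(3)} = 0$, as expected, because $f$ is a quadratic function of three variables. Hence, $x^* = x^{(3)}$.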
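The iterations of this example can be reproduced exactly with rational arithmetic. The sketch below assumes the example's data are $Q = [[3,0,1],[0,4,2],[1,2,3]]$, $b = [3,0,1]^{\top}$, and $x^{(0)} = 0$; the function name `cg_trace` is made up for illustration. It records $\alpha_k$, $\beta_k$, and $x^{(k+1)}$ at every step.

```python
from fractions import Fraction

def matvec(Q, v):
    """Return the matrix-vector product Q v for a list-of-rows matrix."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def dot(u, v):
    """Return the Euclidean inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

def cg_trace(Q, b, x0):
    """Return a list of (alpha_k, beta_k, x^(k+1)) for each CG step."""
    x = [Fraction(v) for v in x0]
    g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]
    d = [-gi for gi in g]
    steps = []
    for _ in range(len(b)):
        if all(gi == 0 for gi in g):       # exact arithmetic: g = 0 is testable
            break
        Qd = matvec(Q, d)
        dQd = dot(d, Qd)
        alpha = -dot(g, d) / dQd
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]
        beta = dot(g, Qd) / dQd
        d = [-gi + beta * di for gi, di in zip(g, d)]
        steps.append((alpha, beta, x))
    return steps

# Assumed example data.
Q = [[3, 0, 1], [0, 4, 2], [1, 2, 3]]
b = [3, 0, 1]
steps = cg_trace(Q, b, [0, 0, 0])
for alpha, beta, x in steps:
    print(float(alpha), float(beta), [float(v) for v in x])
```

In exact arithmetic the first step size is $\alpha_0 = 5/18$ and the first coefficient is $\beta_0 = 13/162$, which round to the four-digit decimals $0.2778$ and $0.08025$. The third iterate is exactly $[1, 0, 0]^{\top}$.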

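Proof of Proposition 10.1

We use induction. We first show that $d^{(0)\top} Q d^{(1)} = 0$. We have
$$d^{(0)\top} Q d^{(1)} = d^{(0)\top} Q (-g^{(1)} + \beta_0 d^{(0)}).$$
Substituting for
$$\beta_0 = \frac{g^{(1)\top} Q d^{(0)}}{d^{(0)\top} Q d^{(0)}},$$
we see that $d^{(0)\top} Q d^{(1)} = 0$.

Assume that $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$, $k < n-1$, are $Q$-conjugate directions. From Lemma 10.2 we have
$$g^{(k+1)\top} d^{(j)} = 0, \qquad j = 0, 1, \ldots, k.$$
Thus, $g^{(k+1)}$ is orthogonal to each of the directions $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$. We now show that
$$g^{(k+1)\top} g^{(j)} = 0, \qquad j = 0, 1, \ldots, k.$$

Fix $j \in \{0, \ldots, k\}$. For $j = 0$ the claim is immediate because $g^{(0)} = -d^{(0)}$. For $j \ge 1$ we have
$$d^{(j)} = -g^{(j)} + \beta_{j-1} d^{(j-1)}.$$
Substituting this equation into the previous one yields
$$g^{(k+1)\top} d^{(j)} = 0 = -g^{(k+1)\top} g^{(j)} + \beta_{j-1}\, g^{(k+1)\top} d^{(j-1)}.$$
Because $g^{(k+1)\top} d^{(j-1)} = 0$, it follows that $g^{(k+1)\top} g^{(j)} = 0$.

We are now ready to show that
$$d^{(k+1)\top} Q d^{(j)} = 0, \qquad j = 0, \ldots, k.$$
We have
$$d^{(k+1)\top} Q d^{(j)} = (-g^{(k+1)} + \beta_k d^{(k)})^{\top} Q d^{(j)}.$$
If $j < k$, then $d^{(k)\top} Q d^{(j)} = 0$, by virtue of the induction hypothesis. Hence, we have
$$d^{(k+1)\top} Q d^{(j)} = -g^{(k+1)\top} Q d^{(j)}.$$

But $g^{(j+1)} = g^{(j)} + \alpha_j Q d^{(j)}$. Because $g^{(k+1)\top} g^{(i)} = 0$, $i = 0, \ldots, k$,
$$d^{(k+1)\top} Q d^{(j)} = -g^{(k+1)\top} \frac{g^{(j+1)} - g^{(j)}}{\alpha_j} = 0.$$
Thus,
$$d^{(k+1)\top} Q d^{(j)} = 0, \qquad j = 0, \ldots, k-1.$$
It remains to show that $d^{(k+1)\top} Q d^{(k)} = 0$. We have
$$d^{(k+1)\top} Q d^{(k)} = (-g^{(k+1)} + \beta_k d^{(k)})^{\top} Q d^{(k)}.$$
Using the expression for $\beta_k$, we get $d^{(k+1)\top} Q d^{(k)} = 0$, which completes the proof.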

The Conjugate Gradient Algorithm for Nonquadratic Problems

The algorithm can be extended to general nonlinear functions by interpreting $f(x) = \tfrac{1}{2} x^{\top} Q x - x^{\top} b$ as a second-order Taylor series approximation of the objective function. For a quadratic, the matrix $Q$, the Hessian of the quadratic, is constant. However, for a general nonlinear function the Hessian is a matrix that has to be reevaluated at each iteration of the algorithm. Observe that $Q$ appears only in the computation of the scalars $\alpha_k$ and $\beta_k$. Because
$$\alpha_k = \arg\min_{\alpha \ge 0} f(x^{(k)} + \alpha d^{(k)})$$
can be replaced by a numerical line search procedure, we need only concern ourselves with the formula for $\beta_k$.

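Hestenes-Stiefel Formula. Recall that
$$\beta_k = \frac{g^{(k+1)\top} Q d^{(k)}}{d^{(k)\top} Q d^{(k)}}.$$
The formula is obtained by replacing the term $Q d^{(k)}$ by the term $(g^{(k+1)} - g^{(k)}) / \alpha_k$. The two terms are equal in the quadratic case. To see this, start from $x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}$. Premultiplying both sides by $Q$, subtracting $b$ from both sides, and recognizing that $g^{(k)} = Q x^{(k)} - b$, we get $g^{(k+1)} = g^{(k)} + \alpha_k Q d^{(k)}$, which we can rewrite as $Q d^{(k)} = (g^{(k+1)} - g^{(k)}) / \alpha_k$. Therefore, the Hestenes-Stiefel formula is
$$\beta_k = \frac{g^{(k+1)\top} [g^{(k+1)} - g^{(k)}]}{d^{(k)\top} [g^{(k+1)} - g^{(k)}]}.$$

Polak-Ribière Formula. Starting from the Hestenes-Stiefel formula, we multiply out the denominator to get
$$\beta_k = \frac{g^{(k+1)\top} [g^{(k+1)} - g^{(k)}]}{d^{(k)\top} g^{(k+1)} - d^{(k)\top} g^{(k)}}.$$
By Lemma 10.2, $d^{(k)\top} g^{(k+1)} = 0$. Also, since $d^{(k)} = -g^{(k)} + \beta_{k-1} d^{(k-1)}$, premultiplying this by $g^{(k)\top}$, we get
$$g^{(k)\top} d^{(k)} = -g^{(k)\top} g^{(k)} + \beta_{k-1}\, g^{(k)\top} d^{(k-1)} = -g^{(k)\top} g^{(k)},$$
where once again we used Lemma 10.2. Hence, we get the Polak-Ribière formula
$$\beta_k = \frac{g^{(k+1)\top} [g^{(k+1)} - g^{(k)}]}{g^{(k)\top} g^{(k)}}.$$

Fletcher-Reeves Formula. Starting with the Polak-Ribière formula, we multiply out the numerator to get
$$\beta_k = \frac{g^{(k+1)\top} g^{(k+1)} - g^{(k+1)\top} g^{(k)}}{g^{(k)\top} g^{(k)}}.$$
We now use $g^{(k+1)\top} g^{(k)} = 0$, which we get by using the equation
$$g^{(k+1)\top} d^{(k)} = -g^{(k+1)\top} g^{(k)} + \beta_{k-1}\, g^{(k+1)\top} d^{(k-1)}$$
and applying Lemma 10.2. This leads to the Fletcher-Reeves formula
$$\beta_k = \frac{g^{(k+1)\top} g^{(k+1)}}{g^{(k)\top} g^{(k)}}.$$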

With these formulas the Hessian matrix is no longer needed: all we need are the objective function and gradient values at each iteration. For the quadratic case the three expressions for $\beta_k$ are exactly equal. A very important issue in minimization problems of nonquadratic functions is the line search. If the line search is known to be inaccurate, the Hestenes-Stiefel formula for $\beta_k$ is recommended.
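The claim that the three expressions for $\beta_k$ coincide in the quadratic case can be checked on a small problem. The sketch below assumes the standard forms of the formulas: the original $\beta_k = g^{(k+1)\top} Q d^{(k)} / (d^{(k)\top} Q d^{(k)})$, Hestenes-Stiefel $g^{(k+1)\top} y / (d^{(k)\top} y)$, Polak-Ribière $g^{(k+1)\top} y / (g^{(k)\top} g^{(k)})$, and Fletcher-Reeves $g^{(k+1)\top} g^{(k+1)} / (g^{(k)\top} g^{(k)})$, with $y = g^{(k+1)} - g^{(k)}$. The test problem and the name `beta_formulas` are hypothetical, and exact rational arithmetic with an exact line search is used.

```python
from fractions import Fraction

def matvec(Q, v):
    """Return the matrix-vector product Q v for a list-of-rows matrix."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def dot(u, v):
    """Return the Euclidean inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

def beta_formulas(Q, b, x0):
    """For each CG step, return (beta_Q, beta_HS, beta_PR, beta_FR)."""
    x = [Fraction(v) for v in x0]
    g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]
    d = [-gi for gi in g]
    rows = []
    for _ in range(len(b)):
        if all(gi == 0 for gi in g):
            break
        Qd = matvec(Q, d)
        alpha = -dot(g, d) / dot(d, Qd)                  # exact line search
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [qx - bi for qx, bi in zip(matvec(Q, x), b)]
        y = [gn - go for gn, go in zip(g_new, g)]        # g^(k+1) - g^(k)
        beta_q = dot(g_new, Qd) / dot(d, Qd)             # uses the Hessian Q
        beta_hs = dot(g_new, y) / dot(d, y)              # Hestenes-Stiefel
        beta_pr = dot(g_new, y) / dot(g, g)              # Polak-Ribiere
        beta_fr = dot(g_new, g_new) / dot(g, g)          # Fletcher-Reeves
        rows.append((beta_q, beta_hs, beta_pr, beta_fr))
        d = [-gn + beta_q * di for gn, di in zip(g_new, d)]
        g = g_new
    return rows

# Hypothetical quadratic test problem.
Q = [[3, 0, 1], [0, 4, 2], [1, 2, 3]]
b = [3, 0, 1]
for row in beta_formulas(Q, b, [0, 0, 0]):
    print([str(v) for v in row])
```

Only the first of the four expressions touches $Q$; the other three use gradients alone, which is what makes them usable when the objective is not quadratic. For a nonquadratic $f$ or an inexact line search the three values generally differ.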

Homework 2

Exercises 8.3, 8.15
Exercise 9.1
Exercise 10.9
Hand over your homework at the class of Apr


More information

The Gram Schmidt Process

The Gram Schmidt Process u 2 u The Gram Schmidt Process Now we will present a procedure, based on orthogonal projection, that converts any linearly independent set of vectors into an orthogonal set. Let us begin with the simple

More information

The Gram Schmidt Process

The Gram Schmidt Process The Gram Schmidt Process Now we will present a procedure, based on orthogonal projection, that converts any linearly independent set of vectors into an orthogonal set. Let us begin with the simple case

More information

Lecture 8 Optimization

Lecture 8 Optimization 4/9/015 Lecture 8 Optimization EE 4386/5301 Computational Methods in EE Spring 015 Optimization 1 Outline Introduction 1D Optimization Parabolic interpolation Golden section search Newton s method Multidimensional

More information

NOTES (1) FOR MATH 375, FALL 2012

NOTES (1) FOR MATH 375, FALL 2012 NOTES 1) FOR MATH 375, FALL 2012 1 Vector Spaces 11 Axioms Linear algebra grows out of the problem of solving simultaneous systems of linear equations such as 3x + 2y = 5, 111) x 3y = 9, or 2x + 3y z =

More information

Combining Conjugate Direction Methods with Stochastic Approximation of Gradients

Combining Conjugate Direction Methods with Stochastic Approximation of Gradients Combining Conjugate Direction Methods with Stochastic Approximation of Gradients Nicol N Schraudolph Thore Graepel Institute of Computational Sciences Eidgenössische Technische Hochschule (ETH) CH-8092

More information

17 Solution of Nonlinear Systems

17 Solution of Nonlinear Systems 17 Solution of Nonlinear Systems We now discuss the solution of systems of nonlinear equations. An important ingredient will be the multivariate Taylor theorem. Theorem 17.1 Let D = {x 1, x 2,..., x m

More information

Mathematical Methods wk 1: Vectors

Mathematical Methods wk 1: Vectors Mathematical Methods wk : Vectors John Magorrian, magog@thphysoxacuk These are work-in-progress notes for the second-year course on mathematical methods The most up-to-date version is available from http://www-thphysphysicsoxacuk/people/johnmagorrian/mm

More information

Mathematical Methods wk 1: Vectors

Mathematical Methods wk 1: Vectors Mathematical Methods wk : Vectors John Magorrian, magog@thphysoxacuk These are work-in-progress notes for the second-year course on mathematical methods The most up-to-date version is available from http://www-thphysphysicsoxacuk/people/johnmagorrian/mm

More information

Homework 5. (due Wednesday 8 th Nov midnight)

Homework 5. (due Wednesday 8 th Nov midnight) Homework (due Wednesday 8 th Nov midnight) Use this definition for Column Space of a Matrix Column Space of a matrix A is the set ColA of all linear combinations of the columns of A. In other words, if

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

Upon successful completion of MATH 220, the student will be able to:

Upon successful completion of MATH 220, the student will be able to: MATH 220 Matrices Upon successful completion of MATH 220, the student will be able to: 1. Identify a system of linear equations (or linear system) and describe its solution set 2. Write down the coefficient

More information

M 2 F g(i,j) = )))))) Mx(i)Mx(j) If G(x) is constant, F(x) is a quadratic function and can be expressed as. F(x) = (1/2)x T Gx + c T x + " (1)

M 2 F g(i,j) = )))))) Mx(i)Mx(j) If G(x) is constant, F(x) is a quadratic function and can be expressed as. F(x) = (1/2)x T Gx + c T x +  (1) Gradient Techniques for Unconstrained Optimization Gradient techniques can be used to minimize a function F(x) with respect to the n by 1 vector x, when the gradient is available or easily estimated. These

More information

Math 1180, Notes, 14 1 C. v 1 v n v 2. C A ; w n. A and w = v i w i : v w = i=1

Math 1180, Notes, 14 1 C. v 1 v n v 2. C A ; w n. A and w = v i w i : v w = i=1 Math 8, 9 Notes, 4 Orthogonality We now start using the dot product a lot. v v = v v n then by Recall that if w w ; w n and w = v w = nx v i w i : Using this denition, we dene the \norm", or length, of

More information

We showed that adding a vector to a basis produces a linearly dependent set of vectors; more is true.

We showed that adding a vector to a basis produces a linearly dependent set of vectors; more is true. Dimension We showed that adding a vector to a basis produces a linearly dependent set of vectors; more is true. Lemma If a vector space V has a basis B containing n vectors, then any set containing more

More information

Constrained optimization. Unconstrained optimization. One-dimensional. Multi-dimensional. Newton with equality constraints. Active-set method.

Constrained optimization. Unconstrained optimization. One-dimensional. Multi-dimensional. Newton with equality constraints. Active-set method. Optimization Unconstrained optimization One-dimensional Multi-dimensional Newton s method Basic Newton Gauss- Newton Quasi- Newton Descent methods Gradient descent Conjugate gradient Constrained optimization

More information

Linear Algebra 2 Spectral Notes

Linear Algebra 2 Spectral Notes Linear Algebra 2 Spectral Notes In what follows, V is an inner product vector space over F, where F = R or C. We will use results seen so far; in particular that every linear operator T L(V ) has a complex

More information

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define HILBERT SPACES AND THE RADON-NIKODYM THEOREM STEVEN P. LALLEY 1. DEFINITIONS Definition 1. A real inner product space is a real vector space V together with a symmetric, bilinear, positive-definite mapping,

More information

Section 6.2, 6.3 Orthogonal Sets, Orthogonal Projections

Section 6.2, 6.3 Orthogonal Sets, Orthogonal Projections Section 6. 6. Orthogonal Sets Orthogonal Projections Main Ideas in these sections: Orthogonal set = A set of mutually orthogonal vectors. OG LI. Orthogonal Projection of y onto u or onto an OG set {u u

More information

1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:

1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by: Newton s Method Suppose we want to solve: (P:) min f (x) At x = x, f (x) can be approximated by: n x R. f (x) h(x) := f ( x)+ f ( x) T (x x)+ (x x) t H ( x)(x x), 2 which is the quadratic Taylor expansion

More information

which are not all zero. The proof in the case where some vector other than combination of the other vectors in S is similar.

which are not all zero. The proof in the case where some vector other than combination of the other vectors in S is similar. It follows that S is linearly dependent since the equation is satisfied by which are not all zero. The proof in the case where some vector other than combination of the other vectors in S is similar. is

More information

1. General Vector Spaces

1. General Vector Spaces 1.1. Vector space axioms. 1. General Vector Spaces Definition 1.1. Let V be a nonempty set of objects on which the operations of addition and scalar multiplication are defined. By addition we mean a rule

More information

Mathematical optimization

Mathematical optimization Optimization Mathematical optimization Determine the best solutions to certain mathematically defined problems that are under constrained determine optimality criteria determine the convergence of the

More information

CLASS NOTES Computational Methods for Engineering Applications I Spring 2015

CLASS NOTES Computational Methods for Engineering Applications I Spring 2015 CLASS NOTES Computational Methods for Engineering Applications I Spring 2015 Petros Koumoutsakos Gerardo Tauriello (Last update: July 27, 2015) IMPORTANT DISCLAIMERS 1. REFERENCES: Much of the material

More information

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth

More information

Lecture 1: Basic Concepts

Lecture 1: Basic Concepts ENGG 5781: Matrix Analysis and Computations Lecture 1: Basic Concepts 2018-19 First Term Instructor: Wing-Kin Ma This note is not a supplementary material for the main slides. I will write notes such as

More information

Vectors. Vectors and the scalar multiplication and vector addition operations:

Vectors. Vectors and the scalar multiplication and vector addition operations: Vectors Vectors and the scalar multiplication and vector addition operations: x 1 x 1 y 1 2x 1 + 3y 1 x x n 1 = 2 x R n, 2 2 y + 3 2 2x = 2 + 3y 2............ x n x n y n 2x n + 3y n I ll use the two terms

More information

Functional Analysis HW #5

Functional Analysis HW #5 Functional Analysis HW #5 Sangchul Lee October 29, 2015 Contents 1 Solutions........................................ 1 1 Solutions Exercise 3.4. Show that C([0, 1]) is not a Hilbert space, that is, there

More information

Optimization: Nonlinear Optimization without Constraints. Nonlinear Optimization without Constraints 1 / 23

Optimization: Nonlinear Optimization without Constraints. Nonlinear Optimization without Constraints 1 / 23 Optimization: Nonlinear Optimization without Constraints Nonlinear Optimization without Constraints 1 / 23 Nonlinear optimization without constraints Unconstrained minimization min x f(x) where f(x) is

More information