Conjugate Gradient Tutorial


1 Conjugate Gradient Tutorial
Prof. Chung-Kuan Cheng
Computer Science and Engineering Department
University of California, San Diego
December 1, 2015

2 Overview
1 Introduction
    Overview
    Formulation
2 Steepest Descent: Descent in One Vector Direction
    Steepest Descent Formula
    Steepest Descent Properties
    Steepest Descent Convergence
    Preconditioning
3 Conjugate Gradient: Descent with Multiple Vectors
    Multiple Vector Optimization
    Global Procedure in Matrix Form V_k
    Conjugate Gradient: Wish List
    Conjugate Gradient Descent: Formula
    Validation of the Properties
4 Summary
5 References

3 Introduction: Overview
Conjugate gradient is an extension of steepest gradient descent. For steepest gradient descent, we step in one direction per iteration. Through the iterations, we find that the new directions may contain components of the old directions, so the process walks in zig-zag patterns. For conjugate gradient, we consider multiple directions simultaneously; hence, we avoid repeating the old directions. In 1952, Hestenes and Stiefel independently introduced the conjugate gradient formula to simplify the multiple-direction search.

4 Introduction: Overview
Steepest Gradient Descent: We derive the method and properties of the steepest descent method. We view steepest descent as a one-direction-per-iteration approach. The method suffers from slow zig-zag winding in a narrow valley of equal-potential terrain.
Preconditioning: From the properties of the steepest descent method, we find that preconditioning improves the convergence rate.
Conjugate Gradient in Global View: We view the conjugate gradient method from the aspect of gradient descent. However, this descent method considers multiple directions simultaneously.
Conjugate Gradient Formula: We state the formula of the conjugate gradient method.
Conjugate Gradient Method Properties: We show that the global view of the conjugate gradient method can be used to optimize each step independently of the other steps. Therefore, the process can repeat recursively and converges after n iterations, where n is the number of variables. Finally, we show and prove the property that validates the formula.

5 Introduction: Formulation
The original problem is to solve a system of simultaneous linear equations, Ax = b, where matrix A is symmetric and positive definite. Computing the inverse, x = A^{-1} b, can be too expensive, e.g. when n is huge. To avoid a direct solver, we formulate the problem with a quadratic convex objective function.
Formulation: minimize f(x) = (1/2) x^T A x - b^T x, with A ∈ S^n_{++}.
Solution: x = A^{-1} b. To avoid direct solvers, use gradient descent iteratively to find the answer.
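As a quick sanity check (not part of the original slides), the following numpy sketch verifies that the minimizer of the quadratic objective indeed solves Ax = b; the random A and b are made up purely for illustration.

import numpy as np

# Illustrative check: for SPD A, the minimizer of f(x) = 1/2 x^T A x - b^T x
# is x* = A^{-1} b, where the gradient A x - b vanishes.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)            # symmetric positive definite by construction
b = rng.standard_normal(5)

x_star = np.linalg.solve(A, b)          # direct solution A^{-1} b

def f(x):
    return 0.5 * x @ A @ x - b @ x

print(np.linalg.norm(A @ x_star - b))   # ~0: the gradient vanishes at x*
print(f(x_star) <= f(x_star + 0.1 * rng.standard_normal(5)))   # True: x* is a minimizer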

6 Steepest Descent Formula
Given initial k = 0, x_k = x_0. We descend along one direction per iteration, following the negative gradient of the objective function.
Derive the residual r_k = -∇f(x_k) = b - A x_k.
Set x_{k+1} = x_k + α_k r_k, where the step size α_k is derived analytically:
α_k = argmin_{s ≥ 0} f(x_k + s r_k). From ∂f(x_k + α r_k)/∂α = 0, we have α_k = (r_k^T r_k) / (r_k^T A r_k).
Therefore, we have x_{k+1} = x_k + [(r_k^T r_k) / (r_k^T A r_k)] r_k.
Repeat the above steps with k = k + 1 until the norm of r_k is within tolerance.
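A minimal Python sketch of this iteration, assuming numpy (the function name steepest_descent is ours, not from the slides):

import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000):
    # Follows the update on this slide:
    # r_k = b - A x_k, α_k = (r_k^T r_k)/(r_k^T A r_k), x_{k+1} = x_k + α_k r_k.
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        r = b - A @ x                      # residual = negative gradient
        if np.linalg.norm(r) < tol:        # stop once the residual is within tolerance
            break
        alpha = (r @ r) / (r @ (A @ r))    # exact line search along r
        x = x + alpha * r
    return x

For example, with the A and b from the previous sketch, steepest_descent(A, b, np.zeros(5)) agrees with np.linalg.solve(A, b) up to the tolerance.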

7 Steepest Descent Properties
Formula: x_{k+1} = x_k + α_k r_k = x_k + [(r_k^T r_k) / (r_k^T A r_k)] r_k
Objective function: f(x_k) - f(x_k + α_k r_k) = (r_k^T r_k)^2 / (2 r_k^T A r_k)
Residual: r_{k+1} = (I - α_k A) r_k = (I - [(r_k^T r_k) / (r_k^T A r_k)] A) r_k
Proof: r_{k+1} = b - A x_{k+1} = b - A(x_k + α_k r_k) = r_k - α_k A r_k = (I - α_k A) r_k
Property of the next direction: r_{k+1} ⊥ r_k
Proof: r_k^T r_{k+1} = r_k^T (I - [(r_k^T r_k) / (r_k^T A r_k)] A) r_k = r_k^T r_k - r_k^T r_k = 0.
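A small numerical illustration of the orthogonality property r_{k+1} ⊥ r_k (the random A and b below are ours, purely for illustration):

import numpy as np

# Run a few steepest-descent steps and check that consecutive residuals
# are (numerically) orthogonal.
rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)
b = rng.standard_normal(6)

x = np.zeros(6)
r = b - A @ x
for _ in range(5):
    alpha = (r @ r) / (r @ (A @ r))
    x = x + alpha * r
    r_next = b - A @ x
    cosine = (r @ r_next) / (np.linalg.norm(r) * np.linalg.norm(r_next))
    print(cosine)              # ~0 at every step: r_{k+1} ⊥ r_k
    r = r_next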

8 Steepest Descent: Convergence
We denote x_k = x* + e_k, where x* is the optimal solution and e_k is the error that we try to reduce. We try to decrease the residual so that e_k is reduced. Since
r_k = b - A x_k = b - A x* - A e_k = -A e_k
and A is nonsingular, as r_k → 0 we have e_k → 0.

9 Gradient Descent: Preconditioning
We want to reduce the residual r_k = -A e_k. Let e_k = Σ_{i=1}^{n} ξ_i v_i, where v_i are the eigenvectors of A, i = 1, 2, ..., n. Then we have r_k = -A e_k = -Σ_{i=1}^{n} λ_i ξ_i v_i, where λ_i are the eigenvalues of A. Thus, the next residual becomes
r_{k+1} = (I - [(r_k^T r_k) / (r_k^T A r_k)] A) r_k
        = -Σ_{i=1}^{n} λ_i ξ_i v_i + [(Σ_{i=1}^{n} λ_i^2 ξ_i^2) / (Σ_{i=1}^{n} λ_i^3 ξ_i^2)] Σ_{i=1}^{n} λ_i^2 ξ_i v_i.
Suppose that all eigenvalues are equal, i.e. λ_i = λ for all i. We have
r_{k+1} = -λ Σ_{i=1}^{n} ξ_i v_i + [(λ^2 Σ_{i=1}^{n} ξ_i^2) / (λ^3 Σ_{i=1}^{n} ξ_i^2)] λ^2 Σ_{i=1}^{n} ξ_i v_i = 0.
Therefore, the convergence accelerates if we can precondition matrix A.
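To make the equal-eigenvalue case concrete, here is a small illustration of our own (not from the slides): when A = λI, a single steepest-descent step already drives the residual to zero.

import numpy as np

# When all eigenvalues are equal (A = λ I), the step size is exactly 1/λ
# and one steepest-descent step reaches the exact solution.
lam, n = 3.0, 6
A = lam * np.eye(n)
b = np.arange(1.0, n + 1)
x = np.zeros(n)

r = b - A @ x
alpha = (r @ r) / (r @ (A @ r))    # equals 1/λ here
x = x + alpha * r
print(np.linalg.norm(b - A @ x))   # 0.0: converged in a single step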

10 Gradient Descent: Preconditioning
∇f(x) = Ax - b = 0 ⇒ Ax = b
Preconditioning: transform Ax = b into another system with more favorable properties for iterative solution.
With a preconditioner M, solve M^{-1} A x = M^{-1} b (e.g. incomplete LU, scaling).
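As a sketch of the simplest choice, diagonal (Jacobi) scaling with M = diag(A); the badly scaled example matrix below is ours, for illustration only.

import numpy as np

# Diagonal (Jacobi) preconditioning: M = diag(A), so M^{-1} is cheap to apply.
# Preconditioning shrinks the spread of the spectrum, which governs convergence.
n = 50
A = np.diag(np.linspace(1.0, 1000.0, n))   # badly scaled SPD diagonal part
A = A + 0.1 * np.ones((n, n))              # plus a small symmetric coupling
M_inv = np.diag(1.0 / np.diag(A))

print(np.linalg.cond(A))          # large condition number: slow convergence
print(np.linalg.cond(M_inv @ A))  # much smaller after preconditioning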

11 Conjugate Gradient: Descent with Multiple Vectors
For conjugate gradient, we consider multiple vectors V_k = [v_0, v_1, ..., v_k] at stage k.
Let x_{k+1} = x_k + V_k y, where y = [y_0, y_1, ..., y_k]^T is a vector of parameters. We can write V_k y = Σ_{i=0}^{k} y_i v_i.
To minimize f(x_{k+1}), the solution is y = (V_k^T A V_k)^{-1} V_k^T r_k. Therefore,
x_{k+1} = x_k + V_k y = x_k + V_k (V_k^T A V_k)^{-1} V_k^T r_k.
Proof: To minimize f(x_{k+1}), we want ∇_y f(x_{k+1}) = 0. We have
∇_y f(x_{k+1}) = ∇_y { (1/2)(x_k + V_k y)^T A (x_k + V_k y) - b^T (x_k + V_k y) }
             = V_k^T A V_k y + V_k^T A x_k - V_k^T b = V_k^T A V_k y - V_k^T r_k = 0
⇒ y = (V_k^T A V_k)^{-1} V_k^T r_k.

12 Conjugate Gradient: Multiple Vector Optimization
For the descent on multiple directions, we have the following properties.
Function: Since y = (V_k^T A V_k)^{-1} V_k^T r_k, we have
f(x_{k+1}) = f(x_k) + (1/2) y^T V_k^T A V_k y + y^T V_k^T (A x_k - b)
           = f(x_k) - (1/2) r_k^T V_k (V_k^T A V_k)^{-1} V_k^T r_k.
Residual: r_{k+1} = b - A x_{k+1} = b - A (x_k + V_k (V_k^T A V_k)^{-1} V_k^T r_k) = (I - A V_k (V_k^T A V_k)^{-1} V_k^T) r_k.
Property A: r_{k+1} ⊥ V_k. The proof is independent of the choice of V_k.
Proof: V_k^T r_{k+1} = V_k^T (I - A V_k (V_k^T A V_k)^{-1} V_k^T) r_k = (V_k^T - V_k^T) r_k = 0.

13 Global Procedure in Matrix Form V_k
Through the iterations, we want to grow the matrix V_k = [v_0, v_1, ..., v_k] into V_{k+1} by adding a new vector v_{k+1} as the last column for iteration k + 1.
Initial: k = 0, v_0 = r_0 = b - A x_0.
Repeat (a runnable sketch of this loop is given after this slide):
  Update x_{k+1} = x_k + V_k (V_k^T A V_k)^{-1} V_k^T r_k and r_{k+1} = b - A x_{k+1}.
  Exit if the norm of r_{k+1} < tolerance.
  Derive v_{k+1} as a function of r_{k+1} and V_k (to be described in the CG formula).
  Construct V_{k+1} by appending v_{k+1} as the last column of V_k. Set k = k + 1.
Property B (independent of the choice of v_k): According to the procedure, we have V_k^T r_k = [0, ..., 0, v_k^T r_k]^T.
Proof: From Property A, we have V_{k-1}^T r_k = 0; thus V_k^T r_k = [0, ..., 0, v_k^T r_k]^T.
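The sketch below implements this global procedure directly, assuming numpy. One detail is our own simplification: since the choice of v_{k+1} is only specified later by the CG formula, here we simply append v_{k+1} = r_{k+1}, which by Property A is orthogonal to all previous columns and keeps V_k full rank.

import numpy as np

def multi_vector_descent(A, b, x0, tol=1e-10):
    # Global procedure: x_{k+1} = x_k + V_k (V_k^T A V_k)^{-1} V_k^T r_k,
    # growing V_k by one column per iteration.
    x = np.array(x0, dtype=float)
    r = b - A @ x
    V = r.reshape(-1, 1)                           # V_0 = [v_0] with v_0 = r_0
    for _ in range(len(b)):
        y = np.linalg.solve(V.T @ A @ V, V.T @ r)  # y = (V^T A V)^{-1} V^T r_k
        x = x + V @ y                              # x_{k+1} = x_k + V_k y
        r = b - A @ x                              # r_{k+1} = b - A x_{k+1}
        if np.linalg.norm(r) < tol:                # exit once the residual is small
            break
        V = np.column_stack([V, r])                # append v_{k+1} (here: r_{k+1})
    return x

In exact arithmetic this terminates after at most n iterations; the CG formula on the next slides avoids storing V_k and solving a growing (k+1)-by-(k+1) system at every step.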

14 Conjugate Gradient: Wish List
We hope that V^T A V = D = diag(d_i) is a diagonal matrix. In this case, we say that the vectors v_i in V are mutually conjugate with respect to matrix A.
If V^T A V = D = diag(d_i), we have d_i = v_i^T A v_i. Therefore, we have
x_{k+1} = x_k + V_k (V_k^T A V_k)^{-1} V_k^T r_k = x_k + V_k D^{-1} [0, ..., 0, v_k^T r_k]^T = x_k + α_k v_k (by Property B),
where α_k = (v_k^T r_k) / (v_k^T A v_k).
Hopefully, for the new matrix V_{k+1}, the conjugacy property remains true. Then we can repeat the steps with k = k + 1.
When k = n - 1, we have r_n^T V_{n-1} = 0 (Property A). The last residual r_n = 0, since matrix V_{n-1} has full rank. Thus, we have the solution x_n = x*.

15 Conjugate Gradient Descent Formula
Given x_0, we set the initial k = 0, v_0 = r_0 = b - A x_0.
x_{k+1} = x_k + α_k v_k, where α_k = (v_k^T r_k) / (v_k^T A v_k) (= (r_k^T r_k) / (v_k^T A v_k)).
r_{k+1} = b - A x_{k+1} = b - A x_k - α_k A v_k = r_k - α_k A v_k.
v_{k+1} = r_{k+1} + β_{k+1} v_k, where β_{k+1} = (1/α_k) (r_{k+1}^T r_{k+1}) / (v_k^T A v_k) = (r_{k+1}^T r_{k+1}) / (r_k^T r_k).
Repeat the iteration with k = k + 1 until the residual is smaller than the tolerance.
Lemma: v_k^T r_k = r_k^T r_k.
Proof: From Property A, we have v_k^T r_k = (r_k + β_k v_{k-1})^T r_k = r_k^T r_k.
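Putting the formula together as a Python sketch, assuming numpy (the function name conjugate_gradient is ours):

import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    # Plain CG for SPD A, following the formula on this slide.
    x = np.zeros_like(b, dtype=float) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x                         # r_0
    v = r.copy()                          # v_0 = r_0
    rs_old = r @ r
    for _ in range(len(b)):               # at most n iterations in exact arithmetic
        Av = A @ v
        alpha = rs_old / (v @ Av)         # α_k = (r_k^T r_k)/(v_k^T A v_k)
        x = x + alpha * v                 # x_{k+1} = x_k + α_k v_k
        r = r - alpha * Av                # r_{k+1} = r_k - α_k A v_k
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        beta = rs_new / rs_old            # β_{k+1} = (r_{k+1}^T r_{k+1})/(r_k^T r_k)
        v = r + beta * v                  # v_{k+1} = r_{k+1} + β_{k+1} v_k
        rs_old = rs_new
    return x

Note that each iteration needs only one matrix-vector product A v_k and a fixed amount of extra storage, which is what the recursion for v_{k+1} buys over the global procedure.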

16 Validation of the Properties
Theorem: The solution x_{k+1} of the conjugate gradient formula is consistent with the global procedure, i.e. the vectors v_i produced by the formula are mutually conjugate. The consistency is based on the following three equalities.
Property A: r_i^T v_j = 0, i > j.
Residuals: r_i^T r_j = 0, i > j.
Conjugates: v_i^T A v_j = 0, i > j.
Proof: We prove the three equalities by induction. For the base case, index i = 1, we have
Property A: r_1^T v_0 = 0
Residuals: r_1^T r_0 = 0 (since r_0 = v_0)
Conjugates: v_1^T A v_0 = (r_1 + β_1 v_0)^T A v_0 = r_1^T A v_0 + β_1 v_0^T A v_0
          = r_1^T [(r_0 - r_1)/α_0] + (1/α_0) [(r_1^T r_1)/(v_0^T A v_0)] v_0^T A v_0 = 0 (using r_1^T v_0 = 0 and r_0 = v_0).

17 Validation of the Wish List
Proof by induction (continued): Suppose that the statement is true up to index i = k. By the assumption of the three equalities, the conjugate gradient formula is consistent with the global procedure up to x_{k+1} = x_k + α_k v_k. For index i = k + 1, we have
Property A: r_{k+1}^T V_k = 0
Residuals: r_{k+1}^T r_j = r_{k+1}^T (v_j - β_j v_{j-1}) = 0, j ≤ k
Conjugates:
Case j = k: v_{k+1}^T A v_k = (r_{k+1} + β_{k+1} v_k)^T A v_k = r_{k+1}^T A v_k + β_{k+1} v_k^T A v_k
          = r_{k+1}^T [(r_k - r_{k+1})/α_k] + (1/α_k) [(r_{k+1}^T r_{k+1})/(v_k^T A v_k)] v_k^T A v_k = 0 (since r_{k+1}^T r_k = 0).
Case j < k: v_{k+1}^T A v_j = (r_{k+1} + β_{k+1} v_k)^T A v_j = r_{k+1}^T A v_j = r_{k+1}^T [(r_j - r_{j+1})/α_j] = 0.
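A numerical spot-check of the three equalities, with a random SPD matrix of our own choosing (illustrative only):

import numpy as np

# Run the CG recursion, store all residuals r_i and directions v_i, and check
# that r_i^T v_j, r_i^T r_j and v_i^T A v_j all vanish (up to roundoff) for i > j.
rng = np.random.default_rng(1)
n = 8
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)

x = np.zeros(n)
r = b - A @ x
v = r.copy()
rs, vs = [r.copy()], [v.copy()]
for _ in range(n):
    alpha = (r @ r) / (v @ (A @ v))
    x = x + alpha * v
    r_new = r - alpha * (A @ v)
    beta = (r_new @ r_new) / (r @ r)
    v = r_new + beta * v
    r = r_new
    rs.append(r.copy())
    vs.append(v.copy())
    if np.linalg.norm(r) < 1e-12:
        break

m = len(rs)
print(max(abs(rs[i] @ vs[j]) for i in range(1, m) for j in range(i)))       # Property A
print(max(abs(rs[i] @ rs[j]) for i in range(1, m) for j in range(i)))       # Residuals
print(max(abs(vs[i] @ (A @ vs[j])) for i in range(1, m) for j in range(i))) # Conjugates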

18 Summary
We view the conjugate gradient method as an extension from the one-direction descent of the steepest gradient method to a multiple-direction descent. From the global procedure of the multiple vector search, we can derive the basic properties of the optimization. The optimization result shows that the inversion of V^T A V is one main cause of the zig-zag winding of the steepest descent approach. The formula of the conjugate gradient method transforms the product V^T A V into a diagonal matrix and thus simplifies the optimization procedure. Consequently, we can achieve the desired properties and the convergence of the solution.
Acknowledgement: The notes were scribed by YT Jerry Peng for class CSE291, Fall 2015.

19 References
J. R. Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, CMU Technical Report.
S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press.
G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press.
W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes: The Art of Scientific Computing, Cambridge University Press.
