Conjugate Gradient Tutorial
1 Conjugate Gradient Tutorial
Prof. Chung-Kuan Cheng
Computer Science and Engineering Department, University of California, San Diego
December 1, 2015
CSE291: Topics on Scientific Computation
2 Overview
1. Introduction: Overview; Formulation
2. Steepest Descent (Descent in One Vector Direction): Steepest Descent Formula; Steepest Descent Properties; Steepest Descent Convergence; Preconditioning
3. Conjugate Gradient (Descent with Multiple Vectors): Multiple Vector Optimization; Global Procedure in Matrix Form $V_k$; Conjugate Gradient: Wish List; Conjugate Gradient Descent: Formula; Validation of the Properties
4. Summary
5. References
3 Introduction: Overview
Conjugate gradient is an extension of steepest gradient descent. In steepest descent, we step in one direction per iteration. Through the iterations, we find that the new directions may contain components of the old directions, so the process walks in zig-zag patterns. In conjugate gradient, we consider multiple directions simultaneously, and hence avoid repeating the old directions. In 1952, Hestenes and Stiefel independently introduced the conjugate gradient formula to simplify the multiple-direction search.
4 Introduction: Overview
Steepest Gradient Descent: We derive the method and properties of steepest descent. We view steepest descent as a one-direction-per-iteration approach. The method suffers from slow zig-zag winding in a narrow valley of an equal-potential terrain.
Preconditioning: From the properties of steepest descent, we find that preconditioning improves the convergence rate.
Conjugate Gradient in Global View: We view the conjugate gradient method from the perspective of gradient descent; however, the descent considers multiple directions simultaneously.
Conjugate Gradient Formula: We state the formula of the conjugate gradient method.
Conjugate Gradient Method Properties: We show that the global view of the conjugate gradient method can be used to optimize each step independently of the other steps. Therefore, the process can repeat recursively and converges after n iterations, where n is the number of variables. Finally, we show and prove the property that validates the formula.
5 Introduction: Formulation
The original problem is to solve a system of simultaneous linear equations $Ax = b$, where matrix $A$ is symmetric and positive definite. Computing the inverse, $x = A^{-1}b$, can be expensive, e.g. when $n$ is huge. To avoid a direct solver, we formulate the problem with a quadratic convex objective function.
Formulation: minimize $f(x) = \frac{1}{2}x^T A x - b^T x$, with $A \in S^n_{++}$.
Solution: $x^* = A^{-1}b$. To avoid direct solvers, we use gradient descent iteratively to find the answer.
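To make the formulation concrete, the following NumPy sketch (my illustration; the matrix, size, and random seed are arbitrary and not from the slides) builds a small SPD system and checks that the solution of $Ax = b$ is where the gradient of the quadratic vanishes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # symmetric positive definite by construction
b = rng.standard_normal(n)

def f(x):
    """Quadratic objective f(x) = 1/2 x^T A x - b^T x."""
    return 0.5 * x @ A @ x - b @ x

x_star = np.linalg.solve(A, b)                 # direct solution of Ax = b
print(np.linalg.norm(A @ x_star - b))          # gradient Ax* - b is ~0 at the minimizer
print(f(x_star) < f(x_star + 0.01 * rng.standard_normal(n)))  # True: x* minimizes f
```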
6 Steepest Descent Formula
Given the initial guess, set $k = 0$, $x_k = x_0$. We descend along one direction per iteration, following the gradient of the objective function.
Derive the residual $r_k = -\nabla f(x_k) = b - Ax_k$.
Set $x_{k+1} = x_k + \alpha_k r_k$, where the step size $\alpha_k$ is derived analytically.
Step size: $\alpha_k = \arg\min_{s \ge 0} f(x_k + s r_k)$. From $\partial f(x_k + \alpha r_k)/\partial \alpha = 0$, we have $\alpha_k = \frac{r_k^T r_k}{r_k^T A r_k}$.
Therefore, we have $x_{k+1} = x_k + \frac{r_k^T r_k}{r_k^T A r_k} r_k$.
Repeat the above steps with $k = k+1$ until the norm of $r_k$ is within tolerance.
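The steps above translate directly into code. A minimal steepest descent loop (an illustrative sketch; the function name, stopping test, and iteration cap are my own choices, not from the slides):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000):
    """Minimize 1/2 x^T A x - b^T x for SPD A by steepest descent."""
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        r = b - A @ x                       # residual = negative gradient
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ (A @ r))     # exact line search step size
        x = x + alpha * r                   # step along the gradient direction
    return x, k
```

Called as `x, iters = steepest_descent(A, b, np.zeros(len(b)))`, it typically needs many iterations when the eigenvalues of $A$ are spread out, which motivates the preconditioning discussion below.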
7 Steepest Descent Properties
Formula: $x_{k+1} = x_k + \alpha_k r_k = x_k + \frac{r_k^T r_k}{r_k^T A r_k} r_k$.
Objective function decrease: $f(x_k) - f(x_k + \alpha_k r_k) = \frac{(r_k^T r_k)^2}{2\, r_k^T A r_k}$.
Residual: $r_{k+1} = (I - \alpha_k A) r_k = \left(I - \frac{r_k^T r_k}{r_k^T A r_k} A\right) r_k$.
Proof: $r_{k+1} = b - Ax_{k+1} = b - A(x_k + \alpha_k r_k) = r_k - \alpha_k A r_k = (I - \alpha_k A) r_k$.
Property of the next direction: $r_{k+1} \perp r_k$.
Proof: $r_k^T r_{k+1} = r_k^T \left(I - \frac{r_k^T r_k}{r_k^T A r_k} A\right) r_k = r_k^T r_k - \frac{r_k^T r_k}{r_k^T A r_k}\, r_k^T A r_k = 0$.
8 Steepest Descent: Convergence
We write $x_k = x^* + e_k$, where $x^*$ is the optimal solution and $e_k$ is the error that we try to reduce. We try to decrease the residual so that $e_k$ is reduced: as $r_k \to 0$, $e_k \to 0$, because
$r_k = b - Ax_k = b - Ax^* - Ae_k = -Ae_k$.
9 Gradient Descent: Preconditioning
We want to reduce the residual $r_k = -Ae_k$. Let $e_k = \sum_{i=1}^n \xi_i v_i$, where $v_i$, $i = 1,2,\dots,n$, are the eigenvectors of $A$. Then we have $r_k = -Ae_k = -\sum_{i=1}^n \lambda_i \xi_i v_i$, where $\lambda_i$ are the eigenvalues of $A$. Thus, the next residual becomes
$r_{k+1} = \left(I - \frac{r_k^T r_k}{r_k^T A r_k} A\right) r_k = -\sum_{i=1}^n \lambda_i \xi_i v_i + \frac{\sum_{i=1}^n \lambda_i^2 \xi_i^2}{\sum_{i=1}^n \lambda_i^3 \xi_i^2} \sum_{i=1}^n \lambda_i^2 \xi_i v_i$.
Suppose that all eigenvalues are equal, i.e. $\lambda_i = \lambda$ for all $i$. We have
$r_{k+1} = -\lambda \sum_{i=1}^n \xi_i v_i + \frac{\lambda^2 \sum_{i=1}^n \xi_i^2}{\lambda^3 \sum_{i=1}^n \xi_i^2}\, \lambda^2 \sum_{i=1}^n \xi_i v_i = 0$.
Therefore, the convergence accelerates if we can precondition matrix $A$.
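The equal-eigenvalue case is easy to verify numerically: with $A = \lambda I$, the very first steepest descent step lands on the exact solution. A tiny check (my own illustration, not from the slides):

```python
import numpy as np

lam, n = 3.0, 4
A = lam * np.eye(n)                    # all eigenvalues equal to lam
b = np.array([1.0, -2.0, 0.5, 4.0])
x0 = np.zeros(n)

r0 = b - A @ x0
alpha0 = (r0 @ r0) / (r0 @ (A @ r0))   # equals 1/lam
x1 = x0 + alpha0 * r0
print(np.linalg.norm(b - A @ x1))      # ~0: one step reaches the solution
```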
10 Gradient Descent: Preconditioning
$\nabla f(x) = Ax - b = 0 \Rightarrow Ax = b$.
Preconditioning: transform $Ax = b$ into another system with properties more favorable for iterative solution. With a preconditioner $M$, solve $M^{-1}Ax = M^{-1}b$ (e.g. incomplete LU, scaling).
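As a concrete (if simplistic) illustration of preconditioning, not prescribed by the slides: a diagonal (Jacobi) preconditioner $M = \mathrm{diag}(A)$ already shrinks the condition number of a badly scaled SPD matrix, which by the previous argument speeds up the descent:

```python
import numpy as np

# A badly scaled SPD matrix: strongly dominant diagonal with very different magnitudes.
A = np.diag([1.0, 10.0, 100.0, 1000.0]) + 0.1 * np.ones((4, 4))
M_inv = np.diag(1.0 / np.diag(A))       # Jacobi preconditioner M = diag(A)

print(np.linalg.cond(A))                # large condition number
print(np.linalg.cond(M_inv @ A))        # much closer to 1 after preconditioning
```

In practice, a preconditioned CG iteration applies $M^{-1}$ in a way that preserves symmetry; the plain product $M^{-1}A$ above is shown only for the condition-number comparison.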
11 Conjugate Gradient: Descent with Multiple Vectors
For conjugate gradient, we consider multiple vectors $V_k = [v_0, v_1, \dots, v_k]$ at stage $k$. Let $x_{k+1} = x_k + V_k y$, where $y = [y_0, y_1, \dots, y_k]^T$ is a vector of parameters, so that $V_k y = \sum_{i=0}^{k} y_i v_i$.
To minimize $f(x_{k+1})$, the solution is $y = (V_k^T A V_k)^{-1} V_k^T r_k$. Therefore,
$x_{k+1} = x_k + V_k y = x_k + V_k (V_k^T A V_k)^{-1} V_k^T r_k$.
Proof: To minimize $f(x_{k+1})$, we want $\nabla_y f(x_{k+1}) = 0$. We have
$\nabla_y f(x_{k+1}) = \nabla_y \left\{ \frac{1}{2}(x_k + V_k y)^T A (x_k + V_k y) - b^T (x_k + V_k y) \right\} = V_k^T A V_k y + V_k^T A x_k - V_k^T b = V_k^T A V_k y - V_k^T r_k = 0$,
so $y = (V_k^T A V_k)^{-1} V_k^T r_k$.
12 Conjugate Gradient: Multiple Vector Optimization
For the descent on multiple directions, we have the following properties.
Function: Since $y = (V_k^T A V_k)^{-1} V_k^T r_k$, we have
$f(x_{k+1}) = f(x_k) + \frac{1}{2} y^T V_k^T A V_k y + y^T V_k^T (A x_k - b) = f(x_k) - \frac{1}{2} r_k^T V_k (V_k^T A V_k)^{-1} V_k^T r_k$.
Residual: $r_{k+1} = b - A x_{k+1} = b - A\left(x_k + V_k (V_k^T A V_k)^{-1} V_k^T r_k\right) = \left(I - A V_k (V_k^T A V_k)^{-1} V_k^T\right) r_k$.
Property A: $r_{k+1} \perp V_k$. The proof is independent of the choice of $V_k$.
Proof: $V_k^T r_{k+1} = V_k^T \left(I - A V_k (V_k^T A V_k)^{-1} V_k^T\right) r_k = (V_k^T - V_k^T) r_k = 0$.
13 Global Procedure in Matrix Form $V_k$
Through the iterations, we grow the matrix $V_k = [v_0, v_1, \dots, v_k]$ into $V_{k+1}$ by appending a new vector $v_{k+1}$ as the last column for iteration $k+1$.
Initialize $k = 0$, $v_0 = r_0 = b - Ax_0$.
Repeat:
Update $x_{k+1} = x_k + V_k (V_k^T A V_k)^{-1} V_k^T r_k$ and $r_{k+1} = b - Ax_{k+1}$.
Exit if the norm of $r_{k+1}$ is below the tolerance.
Derive $v_{k+1}$ as a function of $r_{k+1}$ and $V_k$ (to be described in the CG formula).
Construct $V_{k+1}$ by appending $v_{k+1}$ as the last column of $V_k$. Set $k = k+1$.
Property B (independent of the choice of $v_k$): According to the procedure, we have $V_k^T r_k = [0, \dots, 0, v_k^T r_k]^T$.
Proof: From Property A, we have $V_{k-1}^T r_k = 0$, thus $V_k^T r_k = [0, \dots, 0, v_k^T r_k]^T$.
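The global procedure can be prototyped directly, at the cost of storing all of $V_k$ and solving a growing $(k+1)\times(k+1)$ system each iteration; this cost is exactly what the CG formula removes. A rough sketch of mine (it simply appends $v_{k+1} = r_{k+1}$, a choice the slides leave open until the CG formula, and it is meant only for small, well-conditioned examples):

```python
import numpy as np

def multi_direction_descent(A, b, x0, tol=1e-10):
    """Global procedure: keep every search vector in V and solve
    (V^T A V) y = V^T r at each iteration."""
    x = np.array(x0, dtype=float)
    r = b - A @ x
    V = r.reshape(-1, 1)                       # v_0 = r_0
    while np.linalg.norm(r) > tol:
        y = np.linalg.solve(V.T @ A @ V, V.T @ r)
        x = x + V @ y                          # x_{k+1} = x_k + V_k y
        r = b - A @ x                          # r_{k+1} = b - A x_{k+1}
        V = np.hstack([V, r.reshape(-1, 1)])   # append the new direction
    return x
```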
14 Conjugate Gradient: Wish List
We hope that $V^T A V = D = \mathrm{diag}(d_i)$ is a diagonal matrix. In this case, we say that the vectors $v_i$ in $V$ are mutually conjugate with respect to matrix $A$.
If $V^T A V = D = \mathrm{diag}(d_i)$, we have $d_i = v_i^T A v_i$. Therefore, using Property B, we have
$x_{k+1} = x_k + V_k (V_k^T A V_k)^{-1} V_k^T r_k = x_k + V_k D^{-1} [0, \dots, 0, v_k^T r_k]^T = x_k + \alpha_k v_k$, where $\alpha_k = \frac{v_k^T r_k}{v_k^T A v_k}$.
Hopefully, for the new matrix $V_{k+1}$, the conjugacy property remains true. Then we can repeat the steps with $k = k+1$. When $k = n-1$, we have $r_n^T V_{n-1} = 0$ (Property A). The last residual $r_n = 0$, since matrix $V_{n-1}$ has full rank. Thus, we have the solution $x_n = x^*$.
15 Conjugate Gradient Descent Formula
Given $x_0$, initialize $k = 0$, $v_0 = r_0 = b - Ax_0$.
$x_{k+1} = x_k + \alpha_k v_k$, where $\alpha_k = \frac{v_k^T r_k}{v_k^T A v_k} \left(= \frac{r_k^T r_k}{v_k^T A v_k}\right)$.
$r_{k+1} = b - Ax_{k+1} = b - Ax_k - \alpha_k A v_k = r_k - \alpha_k A v_k$.
$v_{k+1} = r_{k+1} + \beta_{k+1} v_k$, where $\beta_{k+1} = \frac{1}{\alpha_k} \frac{r_{k+1}^T r_{k+1}}{v_k^T A v_k} = \frac{r_{k+1}^T r_{k+1}}{r_k^T r_k}$.
Repeat the iteration with $k = k+1$ until the residual is smaller than the tolerance.
Lemma: $v_k^T r_k = r_k^T r_k$. Proof: From Property A, we have $v_k^T r_k = (r_k + \beta_k v_{k-1})^T r_k = r_k^T r_k$.
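The recurrences above are the entire algorithm. A direct transcription into Python (the variable names and stopping test are my own; in exact arithmetic it terminates in at most $n$ iterations):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """Conjugate gradient for SPD A, following the recurrences above."""
    x = np.array(x0, dtype=float)
    r = b - A @ x                      # r_0
    v = r.copy()                       # v_0 = r_0
    rs_old = r @ r
    for _ in range(max_iter or len(b)):
        if np.sqrt(rs_old) < tol:
            break
        Av = A @ v
        alpha = rs_old / (v @ Av)      # alpha_k = r_k^T r_k / v_k^T A v_k
        x = x + alpha * v
        r = r - alpha * Av             # r_{k+1} = r_k - alpha_k A v_k
        rs_new = r @ r
        v = r + (rs_new / rs_old) * v  # v_{k+1} = r_{k+1} + beta_{k+1} v_k
        rs_old = rs_new
    return x
```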
16 Validation of the Properties
Theorem: The solution $x_{k+1}$ of the conjugate gradient formula is consistent with the global procedure, i.e. the vectors $v_i$ produced by the formula are mutually conjugate. The consistency is based on the following three equalities.
Property A: $r_i^T v_j = 0$, $i > j$.
Residuals: $r_i^T r_j = 0$, $i > j$.
Conjugacy: $v_i^T A v_j = 0$, $i > j$.
Proof: We prove the three equalities by induction. For the base case, index $i = 1$, we have
Property A: $r_1^T v_0 = 0$.
Residuals: $r_1^T r_0 = 0$ (since $r_0 = v_0$).
Conjugacy: $v_1^T A v_0 = (r_1 + \beta_1 v_0)^T A v_0 = r_1^T A v_0 + \beta_1 v_0^T A v_0 = r_1^T \frac{r_0 - r_1}{\alpha_0} + \frac{1}{\alpha_0} \frac{r_1^T r_1}{v_0^T A v_0}\, v_0^T A v_0 = 0$ (using $r_1^T v_0 = 0$ and $r_0 = v_0$).
17 Validation of the Wish List
Proof by induction (continued): Suppose that the statement is true up to index $i = k$. By the assumption of the three equalities, the conjugate gradient formula is consistent with the global procedure up to $x_{k+1} = x_k + \alpha_k v_k$. For index $i = k+1$, we have
Property A: $r_{k+1}^T V_k = 0$.
Residuals: $r_{k+1}^T r_j = r_{k+1}^T (v_j - \beta_j v_{j-1}) = 0$, $j \le k$ (for $j = 0$, use $r_0 = v_0$).
Conjugacy, case $j = k$: $v_{k+1}^T A v_k = (r_{k+1} + \beta_{k+1} v_k)^T A v_k = r_{k+1}^T A v_k + \beta_{k+1} v_k^T A v_k = r_{k+1}^T \frac{r_k - r_{k+1}}{\alpha_k} + \frac{1}{\alpha_k} \frac{r_{k+1}^T r_{k+1}}{v_k^T A v_k}\, v_k^T A v_k = 0$ (using $r_{k+1}^T r_k = 0$).
Conjugacy, case $j < k$: $v_{k+1}^T A v_j = (r_{k+1} + \beta_{k+1} v_k)^T A v_j = r_{k+1}^T A v_j = r_{k+1}^T \frac{r_j - r_{j+1}}{\alpha_j} = 0$.
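The three equalities can also be checked numerically on a small example by storing every residual and direction produced by the CG recurrences (an illustrative check of mine, not part of the slides; the 1e-8 tolerance only allows for round-off):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # small SPD test matrix
b = rng.standard_normal(n)

x = np.zeros(n)
r = b - A @ x
v = r.copy()
R, V = [r.copy()], [v.copy()]          # record every r_k and v_k
for _ in range(n - 1):
    alpha = (r @ r) / (v @ (A @ v))
    x = x + alpha * v
    r_new = r - alpha * (A @ v)
    v = r_new + ((r_new @ r_new) / (r @ r)) * v
    r = r_new
    R.append(r.copy())
    V.append(v.copy())

# Check r_i^T v_j = 0, r_i^T r_j = 0, and v_i^T A v_j = 0 for all i > j.
for i in range(1, n):
    for j in range(i):
        assert abs(R[i] @ V[j]) < 1e-8
        assert abs(R[i] @ R[j]) < 1e-8
        assert abs(V[i] @ A @ V[j]) < 1e-8
print("Property A, residual orthogonality, and conjugacy all hold")
```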
18 Summary
We view the conjugate gradient method as an extension from the one-direction descent of the steepest gradient method to a multiple-direction descent. From the global procedure of the multiple-vector search, we can derive the basic properties of the optimization. The optimization result shows that the inversion of $V^T A V$ is the main burden in removing the zig-zag winding of the steepest descent approach. The conjugate gradient formula transforms the product $V^T A V$ into a diagonal matrix and thus simplifies the optimization procedure. Consequently, we achieve the desired properties and the convergence of the solution.
Acknowledgement: The note is scribed by YT Jerry Peng for class CSE291, Fall 2015.
19 References
J. R. Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, CMU Technical Report.
S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press.
G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press.
W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes: The Art of Scientific Computing, Cambridge University Press.