the method of steepest descent



MATH 3511, Spring 2018
rozman/courses/m3511_18s/
Last modified: February 6, 2018

Abstract

Steepest Descent is an iterative method for solving sparse systems of linear equations. The idea of quadratic forms is introduced and used to derive the method. The presentation follows Secs. 1-4 of the article "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain" by J. R. Shewchuk.

1 Introduction

Steepest Descent (SD) is an elegant iterative method for solving large systems of linear equations. SD is effective for systems of the form

$$Ax = b, \tag{1}$$

where $x$ is an unknown vector, $b$ is a known vector, and $A$ is a known, square, symmetric, positive-definite matrix. These systems arise in many important settings, such as finite difference and finite element methods for solving partial differential equations, structural analysis, circuit analysis, and math homework.

Iterative methods like SD are suited for use with sparse matrices. If $A$ is dense, the best course of action is to use Gaussian elimination: factor $A$ and solve the equation by backsubstitution. The time spent factoring a dense $A$ is roughly equivalent to the time spent solving the system iteratively; and once $A$ is factored, the system can be backsolved quickly for multiple values of $b$. Compare this dense matrix with a sparse matrix of larger size that fills the same amount of memory. The triangular factors of a sparse $A$ usually have many more nonzero elements than $A$ itself. Factoring may be impossible due to limited memory, and will be time-consuming as well; even the backsolving step may be slower than iterative solution. On the other hand, most iterative methods are memory-efficient and run quickly with sparse matrices.

We use capital letters to denote matrices, lower case letters to denote vectors, and Greek letters to denote scalars. $A$ is an $n \times n$ matrix, and $x$ and $b$ are vectors, that is, $n \times 1$ matrices. Written out fully, Eq. 1 is

$$\begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}. \tag{2}$$

The inner product of two vectors is written $x^t y$, and represents the scalar sum

$$x^t y = \sum_{i=1}^{n} x_i y_i. \tag{3}$$

Note that $x^t y = y^t x$. We say that the vectors $x$ and $y$ are orthogonal if $x^t y = 0$.

A matrix $A$ is positive-definite if, for every nonzero vector $x$,

$$x^t A x > 0. \tag{4}$$

We will get a feeling for what positive-definiteness is about when we see how it affects the shape of quadratic forms.

2 The quadratic form

A quadratic form is a scalar quadratic function of a vector with the form

$$f(x) = \frac{1}{2} x^t A x - b^t x + c, \tag{5}$$

where $A$ is a matrix, $x$ and $b$ are vectors, and $c$ is a scalar constant. We'll show shortly that if $A$ is symmetric and positive-definite, $f(x)$ is minimized by the solution to $Ax = b$.
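As a quick numerical illustration of the claim that the quadratic form is minimized by the solution of $Ax = b$, the sketch below evaluates $f$ at the solution and at a few neighbouring points. The $2 \times 2$ matrix, the right-hand side, and the helper names (`dot`, `matvec`, `quadratic_form`) are assumptions chosen for this example; they do not come from the notes.

```python
# Illustrative symmetric positive-definite system (an assumption for this sketch).
A = [[3.0, 2.0],
     [2.0, 6.0]]
b = [2.0, -8.0]
c = 0.0


def dot(u, v):
    """Inner product x^t y of two vectors stored as lists (Eq. 3)."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    """Matrix-vector product M v for a matrix stored as a list of rows."""
    return [dot(row, v) for row in M]


def quadratic_form(x):
    """f(x) = 1/2 x^t A x - b^t x + c (Eq. 5)."""
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x) + c


# x_star solves A x = b: 3*2 + 2*(-2) = 2 and 2*2 + 6*(-2) = -8.
x_star = [2.0, -2.0]
f_star = quadratic_form(x_star)

# Probe the eight neighbouring grid points around x_star.
offsets = [(dx, dy)
           for dx in (-1.0, 0.0, 1.0)
           for dy in (-1.0, 0.0, 1.0)
           if (dx, dy) != (0.0, 0.0)]
f_neighbors = [quadratic_form([x_star[0] + dx, x_star[1] + dy])
               for dx, dy in offsets]

# Positive-definiteness (Eq. 4) checked on the same nonzero offsets.
d_t_A_d = [dot([dx, dy], matvec(A, [dx, dy])) for dx, dy in offsets]

print("f(x_star) =", f_star)
print("smallest neighbouring value =", min(f_neighbors))
```

Every neighbouring value should exceed `f_star`, because for symmetric $A$ one has $f(x + d) = f(x) + \frac{1}{2} d^t A d$ at the solution, and the second term is positive by Eq. 4. Checking a handful of offsets is only a spot check, not a proof of positive-definiteness.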

The gradient of a quadratic form is defined to be

$$\nabla f(x) = \begin{bmatrix} \dfrac{\partial f(x)}{\partial x_1} \\ \dfrac{\partial f(x)}{\partial x_2} \\ \vdots \\ \dfrac{\partial f(x)}{\partial x_n} \end{bmatrix}. \tag{6}$$

The gradient is a vector that, for a given point $x$, points in the direction of greatest increase of $f(x)$. At the bottom of the paraboloid bowl, the gradient is zero. One can minimize $f(x)$ by setting $\nabla f(x)$ equal to zero. For symmetric $A$, differentiating Eq. 5 gives

$$\nabla f(x) = Ax - b, \tag{7}$$

so the gradient vanishes exactly where $Ax = b$.

3 The method of steepest descent

In the method of Steepest Descent, we start at an arbitrary point $x_0$ and slide down to the bottom of the paraboloid. We take a series of steps $x_1, x_2, \ldots$, until we are satisfied that we are close enough to the solution $x$. When we take a step, we choose the direction in which $f$ decreases most quickly, which is the direction opposite $\nabla f(x_i)$. According to Eq. 7, this direction is $-\nabla f(x_i) = b - Ax_i$.

Let's introduce the following definitions. The error $e_i = x_i - x$ is a vector that indicates how far we are from the solution. The residual

$$r_i = b - Ax_i \tag{8}$$

indicates how far we are from the correct value of $b$. It is easy to see that

$$r_i = -Ae_i, \tag{9}$$

and you should think of the residual as being the error transformed by $A$ into the same space as $b$. More importantly,

$$r_i = -\nabla f(x_i), \tag{10}$$

and you should also think of the residual as the direction of steepest descent. Whenever you read "residual", think "direction of steepest descent".

Suppose we start at $x_0$. Our first step is along the direction of steepest descent. In other words, we will choose a point

$$x_1 = x_0 + \alpha r_0. \tag{11}$$
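The two readings of the residual given above (the error mapped through $A$, and the negative gradient) can be spot-checked numerically. The following is a minimal sketch under stated assumptions: the $2 \times 2$ system, the trial point `x_i`, and the helper names are invented for the illustration, and the gradient is approximated by central differences rather than taken from Eq. 7.

```python
# Illustrative system and its known solution (assumptions for this sketch).
A = [[3.0, 2.0],
     [2.0, 6.0]]
b = [2.0, -8.0]
x_exact = [2.0, -2.0]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def f(x):
    """Quadratic form of Eq. 5 with c = 0."""
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x)


def residual(x):
    """r = b - A x (Eq. 8)."""
    return [bi - yi for bi, yi in zip(b, matvec(A, x))]


def numerical_gradient(x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * h))
    return grad


x_i = [-2.0, -2.0]                                  # arbitrary trial point
r_i = residual(x_i)                                 # Eq. 8
e_i = [xi - xe for xi, xe in zip(x_i, x_exact)]     # error e_i = x_i - x
minus_A_e = [-v for v in matvec(A, e_i)]            # right-hand side of Eq. 9
grad = numerical_gradient(x_i)                      # compare with Eq. 10

print("r_i       =", r_i)
print("-A e_i    =", minus_A_e)
print("-gradient =", [-g for g in grad])
```

The three printed vectors should agree, the last one only up to finite-difference roundoff. In practice the error $e_i$ is unknown (it requires the exact solution), which is why the method works with the residual instead.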

The question is, how big a step should we take? The step size $\alpha$ minimizes $f$ along the search line when the derivative $\frac{d}{d\alpha} f(x_1)$ is equal to zero. By the chain rule,

$$\frac{d}{d\alpha} f(x_1) = \nabla f(x_1)^t \, \frac{dx_1}{d\alpha} = \nabla f(x_1)^t \, r_0. \tag{12}$$

To determine $\alpha$, note that $\nabla f(x_1) = -r_1$, and we have:

\begin{align}
r_1^t r_0 &= 0, \tag{13} \\
(b - Ax_1)^t r_0 &= 0, \tag{14} \\
\bigl(b - A(x_0 + \alpha r_0)\bigr)^t r_0 &= 0, \tag{15} \\
(b - Ax_0)^t r_0 - \alpha (Ar_0)^t r_0 &= 0, \tag{16} \\
(b - Ax_0)^t r_0 &= \alpha (Ar_0)^t r_0, \tag{17} \\
r_0^t r_0 &= \alpha \, r_0^t A r_0, \tag{18} \\
\alpha &= \frac{r_0^t r_0}{r_0^t A r_0}. \tag{19}
\end{align}

The last two steps use the symmetry of $A$, so that $(Ar_0)^t r_0 = r_0^t A r_0$.

Putting it all together, the method of Steepest Descent is:

\begin{align}
r_i &= b - Ax_i, \tag{20} \\
\alpha_i &= \frac{r_i^t r_i}{r_i^t A r_i}, \tag{21} \\
x_{i+1} &= x_i + \alpha_i r_i. \tag{22}
\end{align}

The algorithm, as written above, requires two matrix-vector multiplications per iteration. The computational cost of Steepest Descent is dominated by matrix-vector products; fortunately, one can be eliminated. By premultiplying both sides of Eq. 22 by $-A$ and adding $b$, we have

$$r_{i+1} = r_i - \alpha_i A r_i. \tag{23}$$

Although Eq. 20 is still needed to compute $r_0$, Eq. 23 can be used for every iteration thereafter. The product $Ar_i$, which occurs in both Eqs. 21 and 23, need only be computed once. The disadvantage of using this recurrence is that the sequence defined by Eq. 23 is generated without any feedback from the value of $x_i$, so that accumulation of floating point roundoff error may cause $x_i$ to converge to some point near $x$ rather than to $x$ itself. This effect can be avoided by periodically using Eq. 20 to recompute the correct residual.

4 Canned algorithm

When Steepest Descent reaches the minimum point, the residual becomes zero, and if Eq. 21 is evaluated an iteration later, a division by zero will result. It seems, then, that one must stop immediately when the residual is zero. Usually, however, one wishes to stop before convergence is complete. Because the error term is not available, it is customary to stop when the norm of the residual falls below a specified value; often, this value is some small fraction of the initial residual.

Given the inputs $A$, $b$, a starting value $x$, a maximum number of iterations $i_{max}$, and an error tolerance $\varepsilon$:

    i = 0
    r = b - Ax
    δ = r^t r
    δ_0 = δ
    while i < i_max and δ > ε δ_0
        q = Ar
        α = δ / (r^t q)
        x = x + αr
        if mod(i, 50) == 0
            r = b - Ax
        else
            r = r - αq
        end
        δ = r^t r
        i = i + 1
    end
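The listing above translates almost line for line into code. What follows is a minimal pure-Python sketch, not a tuned implementation: the function name `steepest_descent`, the default values of `i_max`, `eps` and `recompute_every`, the dense list-of-rows storage, and the $2 \times 2$ test system are all assumptions made for the illustration. As in the listing, the tolerance is applied to the squared residual norm $\delta = r^t r$.

```python
def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in M]


def steepest_descent(A, b, x, i_max=1000, eps=1e-12, recompute_every=50):
    """Solve A x = b for symmetric positive-definite A.

    Returns the approximate solution and the number of iterations used.
    """
    x = list(x)                                   # do not modify the caller's list
    i = 0
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]   # Eq. 20
    delta = dot(r, r)
    delta_0 = delta
    while i < i_max and delta > eps * delta_0:
        q = matvec(A, r)                          # the only product with A in most iterations
        alpha = delta / dot(r, q)                 # Eq. 21
        x = [xi + alpha * ri for xi, ri in zip(x, r)]  # Eq. 22
        if i % recompute_every == 0:
            # Periodically recompute the residual from x to limit roundoff drift.
            r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        else:
            r = [ri - alpha * qi for ri, qi in zip(r, q)]  # Eq. 23
        delta = dot(r, r)
        i += 1
    return x, i


# Illustrative system (an assumption); its exact solution is [2, -2].
A = [[3.0, 2.0],
     [2.0, 6.0]]
b = [2.0, -8.0]
x_sol, n_iter = steepest_descent(A, b, [-2.0, -2.0])
print("approximate solution:", x_sol, "after", n_iter, "iterations")
```

The loop condition is tested before Eq. 21 is evaluated, so a residual of exactly zero ends the iteration instead of causing a division by zero. For a genuinely sparse problem the dense `matvec` would be replaced by a sparse matrix-vector product; the rest of the loop would stay the same.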
