Solutions and Notes to Selected Problems In: Numerical Optimzation by Jorge Nocedal and Stephen J. Wright.
|
|
- Eunice Lucinda Harper
- 5 years ago
- Views:
Transcription
1 Solutions and Notes to Selected Problems In: Numerical Optimzation by Jorge Nocedal and Stephen J. Wright. John L. Weatherwax July 7,
2 Chapter 5 (Conjugate Gradient Methods) Notes on the Text Notes on the linear conjugate gradient method The objective function we seek to minimize is given by φ(x) = 1 2 xt Ax b T x. (1) This will be minimized by taking a step from the current best guess at the minimum, x k, in the conjugate directions, p k, to get a new best guess as x k+1 = x k + αp k. (2) Note that to find the value of α to use in the step given in Equation 2, we can select the value of α that minimizes φ(α) φ(x k + αp k ). To find this value we take the derivative of this expression with respect to α, set the resulting expression equal to zero, and solve for α. Since φ(x k + αp k ) can be written as φ(α) 1 2 (x k + αp k ) T A(x k + αp k ) b T (x k + αp k ) = 1 2 xt k Ax k + αx T k Ap k α2 p T k Ap k b T x k αb T p k = φ(x k ) α2 p T k Ap k + α(p T k Ax k p T k b) T = φ(x k ) α2 p T k Ap k + αr T k p k. Setting the derivative of this expression equal to zero gives p T k Ap kα + r T k p k = 0. From which when we solve for α we get the following for the conjugate direction stepsize: α = rt k p k p T k Ap, (3) k this is equation 5.6 in the book. Notationally we can add a subscript k to the variable α as in α k to denote that this is the step size taking in moving from x k to x k+1. Since the residual r is defined as r = Ax b and to get the the k +1st minimization estimate x k+1 from x k we use Equation 2. If we premultiply this expression by A and subtract b from both sides we get Ax k+1 b = Ax k b + α k Ap k, or in terms of the residuals we get the residual update equation: which is the books equation 5.9. r k+1 = r k + α k Ap k, (4)
3 The initial induction step in the expanding subspace minimization theorem I found it a bit hard to verify the truth of the initial inductive statement used in the proof of the expanding subspace minimization theorem presented in the book. In that proof the initial step requires that one verify r T 1 p 0 = 0. This can be shown as follows. The first residual r 1 is given by r 1 = φ(x 0 + α 0 p 0 ) = A(x 0 + α 0 p 0 ) b. Thus the inner product of r 1 with p 0 gives r T 1 p 0 = (Ax 0 + α 0 Ap 0 b) T p 0 = x T 0 Ap 0 + α 0 p T 0 Ap 0 b T p 0. If we put in the value of α 0 = rt 0 p 0 p T 0 Ap 0 the above expression becomes as we were to show. = x T 0 Ap 0 rt 0 p 0 p T 0 Ap 0 p T 0 Ap 0 b T p 0 = (x T 0 A bt )p 0 r T 0 p 0 = r T 0 p r T 0 p 0 = 0, Notes on the basic properties of the conjugate gradient method We want to pick a value for β k such that the new expression for p k given by p k = r k + β k p k 1, to be A conjugate to the old p k 1. To enforce this conjugacy, multiply this expression by p T k 1A on the left of the expression above where we get p T k 1Ap k = p T k 1Ar k + β k p T k 1Ap k 1. If we take the left-hand-side of this expression equal to zero and solve for β k we get that β k = pt k 1Ar k p T k 1 Ap. (5) k 1 In this case the new value of p k will be A conjugate to the old value p k 1. Notes on the preliminary version of the conjugate gradient method The stepsize in the conjugate direction is given by Equation 3. If we use the conjugate update equation p k = r k + β k 1 p k 1, (6)
4 and the residual prior-conjugate orthogonality condition given by r T k p j = 0 for 0 j < k, (7) in Equation 6 we get when we multiply by r T k on the left we have r T k p k = r T k r k + β k 1 r T k p k 1 = r T k r k, Thus using this fact in Equation 3 we have an alternative expression for α k given by α k = rt k r k p T k Ap. (8) k In the preliminary conjugate gradient algorithm the stepsize, β k+1, in the conjugate direction is given by Equation 5. Using the residual update equation r k+1 = r k + α k Ap k, to replace Ap k in the expression for β k+1 to get β k+1 = 1 α k (rk+1 T (r k+1 r k )) 1 α k p T k (r k+1 r k ) since r T k+1 r k = 0, by residual-residual orthogonality = rt k+1 r k+1 p T k (r k+1 r k ), r T k r i = 0 for i = 0, 1,..., k 1. (9) Using the conjugate update equation p k = r k +β k+1 p k 1 in the denominator above, we get a new denominator given by ( r k + β k p k 1 ) T (r k+1 r k ). Next using residual prior-conjugate orthogonality Equation 7 or r T k p i = 0 for i = 0, 1,..., k 1, we have as we wanted to show. β k+1 = rt k+1 r k+1 r T k r k, (10) Notes on the rate of convergence of the conjugate gradient method To study the convergence of the conjugate gradient method we first argue that the minimization problem we originally posed: that of minimizing φ(x) = 1 2 xt Ax b T x over x is equivalent to the problem of minimizing a norm squared expression, namely x x 2 A. If we take the minimum of φ(x) to be denoted as x such that x solves Ax = b we then have the minimum of φ(x) at this point is given by φ(x ) = 1 2 x T Ax b T x = 1 2 x T b b T x = 1 2 bt x.
5 To show that these two minimization problems are equivalent consider the expression 1 2 x x 2 A. We have 1 2 x x 2 A = 1 2 (x x ) T A(x x ) = 1 2 xt Ax x T Ax x T Ax. Since Ax = b we have that the above becomes 1 2 x x 2 A = 1 2 xt Ax x T b x T Ax = φ(x) x T Ax = φ(x) 1 2 x T Ax + x T Ax verifying the books equation = φ(x) 1 2 x T Ax + x T b = φ(x) φ(x ), To study convergence of the conjugate gradient method we will decompose the difference between our initial guess at the solution denoted as x 0 and the true solution denoted by x or x 0 x in terms of the eigenvectors v i of A as x 0 x = n i=1 ξ i v i. When we do this we have that the difference between the k + 1th iteration and x is given by n x k+1 x = (1 + λ i Pk (λ i))ξ i v i. i=1 Then in terms of the A norm this distance is given by ( n ) x k+1 x T ( A = (1 + λ i Pk (λ n i)ξ i v i A i=1 ( n ) ( = (1 + λ i Pk (λ i)ξ i vi T n A i=1 n n = i=1 j=1 ) (1 + λ i Pk (λ i)ξ i v i i=1 ) (1 + λ i Pk (λ i)ξ i v i i=1 ξ i (1 + λ i P k (λ i))ξ j (1 + λ j P k (λ j))v T i Av j. As v j are orthonormal eigenvectors of A we have Av j = λ j v j and vi Tv j = 0 so that the above becomes n ξi 2 (1 + λ i Pk (λ i )) 2 λ i, i=1 as claimed by the book s equation Notes on the Polak-Ribiere Method In this subsection of these notes we derive an alternative expression for β k+1 that is used in the conjugate-direction update Equation 6 and that is valid for nonlinear optimization
6 problems. Since our state update equation is given by x k+1 = x k + α k p k, the gradient of f(x) at the new point x k+1 can be computed to second order using Taylor s theorem as f k+1 = f k + α k Ḡ k p k, where Ḡk is the average Hessian over the line segment [x k, x k+1 ]. To be sure that the new conjugate search direction p k+1 derived from the standard conjugate direction update equation: p k+1 = f k+1 + β k+1 p k, is conjugate with respect to the average Hessian Ḡk means that or using the expression for p k+1 this becomes So this later expression requires that p T k+1ḡkp k = 0, f T k+1ḡkp k + β k+1 p T k Ḡkp k = 0. β k+1 = ft k+1ḡkp k p T k Ḡkp k. Recognizing that Ḡkp k = f k+1 f k α k so β k+1 above becomes or the Hestenes-Stiefel formula and the books equation β k+1 = ft k+1 ( f k+1 f k ) ( f k+1 f k ) T p k, (11) Problem Solutions Problem 5.1 (the conjugate gradient algorithm on Hilbert matrices) See the MATLAB code prob 1.m where we call the MATLAB routine cgsolve.m to solve for the solution to Ax = b when A is the n n Hilbert matrix (generated in MATLAB using the built-in function hilb.m) and b is a vector of all ones and starting with an initial guess at the minimum of x 0 = 0. We do this for the values of n suggested in the text and then plot the number of CG iterations needed to drive the residual to The result of this calculation is presented in Figure 1. The definition of convergence here is taken to be when Ax b b < We see that the number of iterations grows relatively quickly with the dimension of the matrix A.
7 70 number of iterations required to reduced the error to 1.e size of Hilbert matrix (n) Figure 1: The number of iterations needed for convergence of the CG algorithm when applied to the (classically ill-conditioned) Hilbert matrix. Problem 5.2 (if p i are conjugate w.r.t. A then p i are linearly independent) The books equation 5.4 is the statement that p T i Ap j = 0 for all i j. To show that the vectors p i are linearly independent we begin by assuming that they are not and show that this leads to a contradiction. That p i are not linearly independent means that we can find constants α i (not all zero) such that αi p i = 0. If we premultiply the above summation by the matrix A we get αi Ap i = 0. Next premultiply the above by p T j to get α j p T j Ap j = 0, since p T j Ap i = 0 for all i j. Since A is positive definite the term p T j Ap j > 0 we can divide by it and conclude that α j = 0. Since the above is true for each value of j we have that each value of α j is zero. This is a contradiction to the assumption that p i are not linearly independent thus they must be linearly independent. Problem 5.3 (verification of the conjugate direction stepsize α k ) See the discussion around Equation 3 in these notes where the requested expression is derived.
8 Problem 5.4 (strongly convex when we step along the conjugate directions p k ) I think this problem is supposed to state that f(x) is strongly convex (as which means that f(x) has a functional form given by f(x) = 1 2 xt Ax b T x. In such a case when we take x of the form x = x 0 + σ 0 p 0 + σ 1 p σ k 1 p k 1, we can write f as a strongly convex function in the variables σ = (σ 1, σ 2,...,σ k 1 ) T. Problem 5.5 (the conjugate directions p i span the Krylov subspace) We want to show that 5.16 and 5.17 hold for k = 1. The books equation 5.16 is span{r 0, r 1 } = span{r 0, Ar 0 }. To show this equivalence we need to show that r 1 span{r 0, Ar 0 }. Now r 1 can be written as r 1 = Ax 1 b with x 1 given by = A(x 0 + α 0 p 0 ) b or = Ax 0 b + α 0 Ap 0 or since r 0 = Ax 0 b = r 0 + α 0 Ap 0 or since p 0 = r 0 = r 0 α 0 Ar 0, showing that r 1 span{r 0, Ar 0 }, and thus span{r 0, r 1 } span{r 0, Ar 0 }. Showing the other direction, that is span{r 0, Ar 0 } span{r 0, r 1 } is the same as noting that we can perform the manipulations above in the other direction. The books equation 5.17 when k = 1 is span{p 0, p 1 } = span{r 0, Ar 0 }, since p 0 = r 0 to show span{p 0, p 1 } span{r 0, Ar 0 } we need to show that p 1 span{r 0, Ar 0 }. Since p 1 = r 1 + β 1 p 0 = (Ax 1 b) β 1 r 0 = (A(x 0 + α 0 p 0 ) b) β 1 r 0 ) = r 0 + α 0 Ar 0 β 1 r 0. Again showing the other direction is the same as noting that we can perform the manipulations above in the other direction. Problem 5.6 (an alternative form for the conjugate direction stepsize β k+1 ) See the discussion around Equation 10 of these notes where the requested expression is derived.
9 Problem 5.7 (the eigensystem of a polynomial expression of a matrix) Given the eigenvalues λ i and eigenvectors v i of a matrix A then any polynomial expression of A say P(A) has the same eigenvectors with corresponding eigenvalues P(λ i ) as can be seen by simply evaluating P(A)v i. This problem is a special case of that result. Problem 5.9 (deriving the preconditioned CG algorithm) For this problem we are to derive the preconditioned CG algorithm from the normal CG algorithm. This means that we transform the original problem, that of seeing a solution for the minimum φ(x) of φ(x) = 1 2 xt Ax b T x, by transforming the original x variable into a hat variable ˆx = Cx. In this new space the x minimization problem above is equivalent to seeking the minimum solution to the following ˆφ(ˆx) = 1 2 (C 1ˆx) T A(C 1ˆx) b T (C 1ˆx) = 1 2ˆxT (C T AC 1 )ˆx (C T b) T. This later problem we will solve with the CG method where in the standard CG algorithm we take a matrix A and the vector b given by  = C T AC 1 ˆb = C T b. Now given ˆx 0 as an initial guess at the minimum of the hat problem (note that specifying this is equivalent to specifying a initial guess x 0 for the minimum of φ(x)) and following the standard CG algorithm but using the hated variables, we start to derive the preconditioned CG by setting ˆr 0 = C T AC 1ˆx 0 C T b ˆp 0 = ˆr 0 k = 0. With these initial variables set, we then loop while ˆr k 0 and perform the following steps (following algorithm 5.2) ˆα k = ˆx k+1 ˆr k+1 ˆr T k ˆr k ˆp T k C T AC 1ˆp k = ˆx k + ˆα kˆp k = ˆr k + ˆα k C T AC 1ˆp k ˆβ k+1 = ˆr k+1ˆr k+1 ˆr T k ˆr k ˆp k+1 = ˆr k+1 + ˆβ k+1ˆp k k = k + 1.
10 After performing these iterations our output will be ˆx or the minimum of the quadratic ˆφ(ˆx) = 1 2ˆxT (C T AC 1 )ˆx (C T b)ˆx, but we really want to output x = C 1ˆx. To derive an expression that works on x lets write the above algorithm in terms of the unhatted variables x and r. Given an initial guess x 0 at the minimum of φ(x) then ˆx 0 = Cx 0 is the initial guess at the minimum of ˆφ(ˆx). Note that ˆr 0 = C T AC 1ˆx 0 C T b, or C T ˆr 0 = Ax 0 b = r 0, is the residual of the original problem. Thus it looks like the residuals transform between hatted an unhatted variables as ˆr k = C T r k. (12) Next note that ˆp 0 = ˆr 0 = C T r 0 = C T p 0, it looks like the conjugate directions transform between hatted an unhatted variables in the same way, namely ˆp k = C T p k. (13) Using these two simplifications our preconditioned conjugate gradient algorithm becomes in terms of the unhatted variables (recall the unknown variable transforms as ˆx k = Cx k ) ˆα k = = rk TC 1 C T r k p T k C 1 C T AC 1 C T p k rk T (C T C) 1 r k (14) p T k (CT C) 1 A(C T C) 1 p k x k+1 r k+1 = x k + ˆα k C 1ˆp k = x k + ˆα k C 1 C T p k = x k + ˆα k (C T C) 1 p k (15) = r k + ˆα k AC 1 C T p k = r k + ˆα k A(C T C) 1 p k (16) ˆβ k+1 = rt k+1 C 1 C T r k+1 r T k C 1 C T r k = rt k+1 (CT C) 1 r k+1 r T k (CT C) 1 r k (17) p k+1 = r k+1 + ˆβ k+1 p k (18) k = k + 1. In the above expressions on each line, we first made the hat to unhat substitution and then on the subsequent line simplified the resulting expression. Next to simplify these expressions further we introduce two new variables. The first variable, y k, is defined by y k = (C T C) 1 r k,
11 or the solution y k to the linear system My k = r k, where M = C T C. The second variable z k is defined similarly as z k = (C T C) 1 p k, or the solution to the linear system Mz k = p k. To use these variables, as a first step, in Equation 18 above we multiply by (C T C) 1 on both sides and use the definitions of z k and y k to get z k+1 = y k+1 + ˆβ k+1 z k. our algorithm then becomes Given x 0, our initial guess at the minimum of φ(x) form r 0 = Ax 0 b and solve My 0 = r 0 for y 0. Next compute z 0 given by z 0 = M 1 p 0 = M 1 ( r 0 ) = M 1 (My 0 ) = y 0. (19) Next we set k = 0 and iterate the following while r k 0 ˆα k = rt k y k zk TAz k = x k + ˆα k z k x k+1 r k+1 = r k + ˆα k Az k solve My k+1 = r k+1 for y k+1 ˆβ k+1 = rt k+1 y k+1 r T k y k z k+1 = y k+1 + ˆβ k+1 z k k = k + 1. Note that this is the same algorithm presented in the book but the book denotes the variable z k by the notation p k and I think that there is an error in the book s initialization of this routine in that the book states p 0 = r 0 while I think this expression should be p 0 = y 0 (in their notation) see Equation 19. Problem 5.10 (deriving the modified residual conjugacy condition) In the transformed hat problem to minimize the quadratic ˆφ(ˆx) given by see Problem 5.9 on page 9 above, if we define ˆφ(ˆx) = 1 2ˆxT (C T AC 1 )ˆx (C Tˆb)T ˆx,  C T AC 1 ˆb C T b. So that the transformed hat problem has a residual ˆr given by ˆr = ˆx ˆb = C T AC 1 Cx C T b = C T (Ax b) = C T r.
12 Thus the orthogonality property of successive residuals i.e. the books equation 5.15 for the hat problem which is given by ˆr T k ˆr i = 0 for i = 0, 1, 2,, k 1. (20) becomes in terms of the original variables of the problem r T k C 1 C T r j = r T k M 1 r j = 0, since M = C T C or the modified residual conjugacy condition and is what we were to show. Problem 5.11 (the expressions for β PR and β HS reduce to β FR ) Recall that three expressions suggested for β (the conjugate direction stepsize) are for the Polak-Ribiere formula, for the Hestenes-Stiefel and for the Fletcher-Reeves expression. βk+1 PR = ft k+1 ( f k+1 f k ) fk T f, (21) k β HS k+1 = ft k+1 ( f k+1 f k ) ( f k+1 f k ) T p k, (22) βk+1 FR = ft k+1 f k+1 fk T f, (23) k To show that the Polak-Ribere CG stepsize β PR reduces to the Fletcher Reeves CG stepsize β FR, under the conditions given in this problem, from the above formulas it is sufficient to show that f T k+1 f k = 0, since they agree on the other terms. Now when f(x) is a quadratic function f(x) = 1 2 xt Ax b T x + c for some matrix A, vector b, and scalar c and x k+1 is chosen to be the exact line search minimum then f k+1 = r k+1. Thus f T k+1 f k = r T k+1r k = 0, by residual-residual orthogonality Equation 9 (the books equation 5.15). Now to show that Hestenes-Stiefel CG stepsize β HS reduces to the Fletcher Reeves CG stepsize β FR, under the conditions given in this problem, from the above formulas it is sufficient to show that ( f k+1 f k ) T p k = f T k f k. Using the results above we have ( f k+1 f k ) T p k = (r k+1 r k ) T ( r k + β k p k 1 ) = r T k+1 r k + β k r T k+1 p k 1 + r T k r k β k r T k p k 1 = r T k r k = f T k f k. Where in the above we have used residual prior-conjugate orthogonality given by Equation 7 to show r T k p k 1 = 0.
13 References
1 Conjugate gradients
Notes for 2016-11-18 1 Conjugate gradients We now turn to the method of conjugate gradients (CG), perhaps the best known of the Krylov subspace solvers. The CG iteration can be characterized as the iteration
More informationChapter 10 Conjugate Direction Methods
Chapter 10 Conjugate Direction Methods An Introduction to Optimization Spring, 2012 1 Wei-Ta Chu 2012/4/13 Introduction Conjugate direction methods can be viewed as being intermediate between the method
More informationNotes on Some Methods for Solving Linear Systems
Notes on Some Methods for Solving Linear Systems Dianne P. O Leary, 1983 and 1999 and 2007 September 25, 2007 When the matrix A is symmetric and positive definite, we have a whole new class of algorithms
More informationUnconstrained optimization
Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout
More informationConjugate Gradient algorithm. Storage: fixed, independent of number of steps.
Conjugate Gradient algorithm Need: A symmetric positive definite; Cost: 1 matrix-vector product per step; Storage: fixed, independent of number of steps. The CG method minimizes the A norm of the error,
More informationJanuary 29, Non-linear conjugate gradient method(s): Fletcher Reeves Polak Ribière January 29, 2014 Hestenes Stiefel 1 / 13
Non-linear conjugate gradient method(s): Fletcher Reeves Polak Ribière Hestenes Stiefel January 29, 2014 Non-linear conjugate gradient method(s): Fletcher Reeves Polak Ribière January 29, 2014 Hestenes
More informationThe Conjugate Gradient Method
The Conjugate Gradient Method The minimization problem We are given a symmetric positive definite matrix R n n and a right hand side vector b R n We want to solve the linear system Find u R n such that
More informationEECS 275 Matrix Computation
EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 20 1 / 20 Overview
More informationConjugate Gradient Method
Conjugate Gradient Method Tsung-Ming Huang Department of Mathematics National Taiwan Normal University October 10, 2011 T.M. Huang (NTNU) Conjugate Gradient Method October 10, 2011 1 / 36 Outline 1 Steepest
More informationCourse Notes: Week 4
Course Notes: Week 4 Math 270C: Applied Numerical Linear Algebra 1 Lecture 9: Steepest Descent (4/18/11) The connection with Lanczos iteration and the CG was not originally known. CG was originally derived
More information7.2 Steepest Descent and Preconditioning
7.2 Steepest Descent and Preconditioning Descent methods are a broad class of iterative methods for finding solutions of the linear system Ax = b for symmetric positive definite matrix A R n n. Consider
More informationApplied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic
Applied Mathematics 205 Unit V: Eigenvalue Problems Lecturer: Dr. David Knezevic Unit V: Eigenvalue Problems Chapter V.4: Krylov Subspace Methods 2 / 51 Krylov Subspace Methods In this chapter we give
More informationSome minimization problems
Week 13: Wednesday, Nov 14 Some minimization problems Last time, we sketched the following two-step strategy for approximating the solution to linear systems via Krylov subspaces: 1. Build a sequence of
More informationNumerical Optimization of Partial Differential Equations
Numerical Optimization of Partial Differential Equations Part I: basic optimization concepts in R n Bartosz Protas Department of Mathematics & Statistics McMaster University, Hamilton, Ontario, Canada
More informationLinear Algebra- Final Exam Review
Linear Algebra- Final Exam Review. Let A be invertible. Show that, if v, v, v 3 are linearly independent vectors, so are Av, Av, Av 3. NOTE: It should be clear from your answer that you know the definition.
More informationITERATIVE METHODS BASED ON KRYLOV SUBSPACES
ITERATIVE METHODS BASED ON KRYLOV SUBSPACES LONG CHEN We shall present iterative methods for solving linear algebraic equation Au = b based on Krylov subspaces We derive conjugate gradient (CG) method
More informationConjugate Gradient Method
Conjugate Gradient Method direct and indirect methods positive definite linear systems Krylov sequence spectral analysis of Krylov sequence preconditioning Prof. S. Boyd, EE364b, Stanford University Three
More informationPETROV-GALERKIN METHODS
Chapter 7 PETROV-GALERKIN METHODS 7.1 Energy Norm Minimization 7.2 Residual Norm Minimization 7.3 General Projection Methods 7.1 Energy Norm Minimization Saad, Sections 5.3.1, 5.2.1a. 7.1.1 Methods based
More informationConjugate-Gradient. Learn about the Conjugate-Gradient Algorithm and its Uses. Descent Algorithms and the Conjugate-Gradient Method. Qx = b.
Lab 1 Conjugate-Gradient Lab Objective: Learn about the Conjugate-Gradient Algorithm and its Uses Descent Algorithms and the Conjugate-Gradient Method There are many possibilities for solving a linear
More informationConjugate Gradient (CG) Method
Conjugate Gradient (CG) Method by K. Ozawa 1 Introduction In the series of this lecture, I will introduce the conjugate gradient method, which solves efficiently large scale sparse linear simultaneous
More informationThe amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A.
AMSC/CMSC 661 Scientific Computing II Spring 2005 Solution of Sparse Linear Systems Part 2: Iterative methods Dianne P. O Leary c 2005 Solving Sparse Linear Systems: Iterative methods The plan: Iterative
More informationLinear Solvers. Andrew Hazel
Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction
More informationGradient-Based Optimization
Multidisciplinary Design Optimization 48 Chapter 3 Gradient-Based Optimization 3. Introduction In Chapter we described methods to minimize (or at least decrease) a function of one variable. While problems
More informationChapter 7 Iterative Techniques in Matrix Algebra
Chapter 7 Iterative Techniques in Matrix Algebra Per-Olof Persson persson@berkeley.edu Department of Mathematics University of California, Berkeley Math 128B Numerical Analysis Vector Norms Definition
More informationEECS260 Optimization Lecture notes
EECS260 Optimization Lecture notes Based on Numerical Optimization (Nocedal & Wright, Springer, 2nd ed., 2006) Miguel Á. Carreira-Perpiñán EECS, University of California, Merced May 2, 2010 1 Introduction
More informationCS137 Introduction to Scientific Computing Winter Quarter 2004 Solutions to Homework #3
CS137 Introduction to Scientific Computing Winter Quarter 2004 Solutions to Homework #3 Felix Kwok February 27, 2004 Written Problems 1. (Heath E3.10) Let B be an n n matrix, and assume that B is both
More informationChapter 4. Unconstrained optimization
Chapter 4. Unconstrained optimization Version: 28-10-2012 Material: (for details see) Chapter 11 in [FKS] (pp.251-276) A reference e.g. L.11.2 refers to the corresponding Lemma in the book [FKS] PDF-file
More informationEfficient Estimation of the A-norm of the Error in the Preconditioned Conjugate Gradient Method
Efficient Estimation of the A-norm of the Error in the Preconditioned Conjugate Gradient Method Zdeněk Strakoš and Petr Tichý Institute of Computer Science AS CR, Technical University of Berlin. International
More informationConjugate Gradient Tutorial
Conjugate Gradient Tutorial Prof. Chung-Kuan Cheng Computer Science and Engineering Department University of California, San Diego ckcheng@ucsd.edu December 1, 2015 Prof. Chung-Kuan Cheng (UC San Diego)
More informationSolutions to Review Problems for Chapter 6 ( ), 7.1
Solutions to Review Problems for Chapter (-, 7 The Final Exam is on Thursday, June,, : AM : AM at NESBITT Final Exam Breakdown Sections % -,7-9,- - % -9,-,7,-,-7 - % -, 7 - % Let u u and v Let x x x x,
More informationComputational Linear Algebra
Computational Linear Algebra PD Dr. rer. nat. habil. Ralf-Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2018/19 Part 4: Iterative Methods PD
More informationConvex Optimization. Problem set 2. Due Monday April 26th
Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining
More informationM.A. Botchev. September 5, 2014
Rome-Moscow school of Matrix Methods and Applied Linear Algebra 2014 A short introduction to Krylov subspaces for linear systems, matrix functions and inexact Newton methods. Plan and exercises. M.A. Botchev
More information4.6 Iterative Solvers for Linear Systems
4.6 Iterative Solvers for Linear Systems Why use iterative methods? Virtually all direct methods for solving Ax = b require O(n 3 ) floating point operations. In practical applications the matrix A often
More informationEfficient Estimation of the A-norm of the Error in the Preconditioned Conjugate Gradient Method
Efficient Estimation of the A-norm of the Error in the Preconditioned Conjugate Gradient Method Zdeněk Strakoš and Petr Tichý Institute of Computer Science AS CR, Technical University of Berlin. Emory
More informationIterative Methods for Linear Systems of Equations
Iterative Methods for Linear Systems of Equations Projection methods (3) ITMAN PhD-course DTU 20-10-08 till 24-10-08 Martin van Gijzen 1 Delft University of Technology Overview day 4 Bi-Lanczos method
More informationNonlinear Programming
Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week
More informationThe Conjugate Gradient Method
The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)
More informationConstrained optimization. Unconstrained optimization. One-dimensional. Multi-dimensional. Newton with equality constraints. Active-set method.
Optimization Unconstrained optimization One-dimensional Multi-dimensional Newton s method Basic Newton Gauss- Newton Quasi- Newton Descent methods Gradient descent Conjugate gradient Constrained optimization
More informationConjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294)
Conjugate gradient method Descent method Hestenes, Stiefel 1952 For A N N SPD In exact arithmetic, solves in N steps In real arithmetic No guaranteed stopping Often converges in many fewer than N steps
More informationGradient Descent Methods
Lab 18 Gradient Descent Methods Lab Objective: Many optimization methods fall under the umbrella of descent algorithms. The idea is to choose an initial guess, identify a direction from this point along
More informationLecture 17 Methods for System of Linear Equations: Part 2. Songting Luo. Department of Mathematics Iowa State University
Lecture 17 Methods for System of Linear Equations: Part 2 Songting Luo Department of Mathematics Iowa State University MATH 481 Numerical Methods for Differential Equations Songting Luo ( Department of
More informationTMA4180 Solutions to recommended exercises in Chapter 3 of N&W
TMA480 Solutions to recommended exercises in Chapter 3 of N&W Exercise 3. The steepest descent and Newtons method with the bactracing algorithm is implemented in rosenbroc_newton.m. With initial point
More informationFrom Stationary Methods to Krylov Subspaces
Week 6: Wednesday, Mar 7 From Stationary Methods to Krylov Subspaces Last time, we discussed stationary methods for the iterative solution of linear systems of equations, which can generally be written
More informationTopics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems
Topics The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems What about non-spd systems? Methods requiring small history Methods requiring large history Summary of solvers 1 / 52 Conjugate
More informationComputational Linear Algebra
Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 3: Iterative Methods PD
More informationThe Conjugate Gradient Method for Solving Linear Systems of Equations
The Conjugate Gradient Method for Solving Linear Systems of Equations Mike Rambo Mentor: Hans de Moor May 2016 Department of Mathematics, Saint Mary s College of California Contents 1 Introduction 2 2
More informationMathematical optimization
Optimization Mathematical optimization Determine the best solutions to certain mathematically defined problems that are under constrained determine optimality criteria determine the convergence of the
More informationNumerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09
Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods
More informationFALL 2018 MATH 4211/6211 Optimization Homework 4
FALL 2018 MATH 4211/6211 Optimization Homework 4 This homework assignment is open to textbook, reference books, slides, and online resources, excluding any direct solution to the problem (such as solution
More informationIntroduction to Iterative Solvers of Linear Systems
Introduction to Iterative Solvers of Linear Systems SFB Training Event January 2012 Prof. Dr. Andreas Frommer Typeset by Lukas Krämer, Simon-Wolfgang Mages and Rudolf Rödl 1 Classes of Matrices and their
More informationSPRING 2006 PRELIMINARY EXAMINATION SOLUTIONS
SPRING 006 PRELIMINARY EXAMINATION SOLUTIONS 1A. Let G be the subgroup of the free abelian group Z 4 consisting of all integer vectors (x, y, z, w) such that x + 3y + 5z + 7w = 0. (a) Determine a linearly
More informationNumerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization
Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725 Consider Last time: proximal Newton method min x g(x) + h(x) where g, h convex, g twice differentiable, and h simple. Proximal
More informationLecture 11. Fast Linear Solvers: Iterative Methods. J. Chaudhry. Department of Mathematics and Statistics University of New Mexico
Lecture 11 Fast Linear Solvers: Iterative Methods J. Chaudhry Department of Mathematics and Statistics University of New Mexico J. Chaudhry (UNM) Math/CS 375 1 / 23 Summary: Complexity of Linear Solves
More informationLecture # 20 The Preconditioned Conjugate Gradient Method
Lecture # 20 The Preconditioned Conjugate Gradient Method We wish to solve Ax = b (1) A R n n is symmetric and positive definite (SPD). We then of n are being VERY LARGE, say, n = 10 6 or n = 10 7. Usually,
More informationKey words. conjugate gradients, normwise backward error, incremental norm estimation.
Proceedings of ALGORITMY 2016 pp. 323 332 ON ERROR ESTIMATION IN THE CONJUGATE GRADIENT METHOD: NORMWISE BACKWARD ERROR PETR TICHÝ Abstract. Using an idea of Duff and Vömel [BIT, 42 (2002), pp. 300 322
More informationChapter 6: Orthogonality
Chapter 6: Orthogonality (Last Updated: November 7, 7) These notes are derived primarily from Linear Algebra and its applications by David Lay (4ed). A few theorems have been moved around.. Inner products
More informationApplied Linear Algebra in Geoscience Using MATLAB
Applied Linear Algebra in Geoscience Using MATLAB Contents Getting Started Creating Arrays Mathematical Operations with Arrays Using Script Files and Managing Data Two-Dimensional Plots Programming in
More informationProgramming, numerics and optimization
Programming, numerics and optimization Lecture C-3: Unconstrained optimization II Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428
More informationFinal Exam, Linear Algebra, Fall, 2003, W. Stephen Wilson
Final Exam, Linear Algebra, Fall, 2003, W. Stephen Wilson Name: TA Name and section: NO CALCULATORS, SHOW ALL WORK, NO OTHER PAPERS ON DESK. There is very little actual work to be done on this exam if
More informationA geometric proof of the spectral theorem for real symmetric matrices
0 0 0 A geometric proof of the spectral theorem for real symmetric matrices Robert Sachs Department of Mathematical Sciences George Mason University Fairfax, Virginia 22030 rsachs@gmu.edu January 6, 2011
More informationVectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x =
Linear Algebra Review Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1 x x = 2. x n Vectors of up to three dimensions are easy to diagram.
More informationThe Lanczos and conjugate gradient algorithms
The Lanczos and conjugate gradient algorithms Gérard MEURANT October, 2008 1 The Lanczos algorithm 2 The Lanczos algorithm in finite precision 3 The nonsymmetric Lanczos algorithm 4 The Golub Kahan bidiagonalization
More informationLecture 10: October 27, 2016
Mathematical Toolkit Autumn 206 Lecturer: Madhur Tulsiani Lecture 0: October 27, 206 The conjugate gradient method In the last lecture we saw the steepest descent or gradient descent method for finding
More informationThe Conjugate Gradient Algorithm
Optimization over a Subspace Conjugate Direction Methods Conjugate Gradient Algorithm Non-Quadratic Conjugate Gradient Algorithm Optimization over a Subspace Consider the problem min f (x) subject to x
More informationMay 9, 2014 MATH 408 MIDTERM EXAM OUTLINE. Sample Questions
May 9, 24 MATH 48 MIDTERM EXAM OUTLINE This exam will consist of two parts and each part will have multipart questions. Each of the 6 questions is worth 5 points for a total of points. The two part of
More informationCombination Preconditioning of saddle-point systems for positive definiteness
Combination Preconditioning of saddle-point systems for positive definiteness Andy Wathen Oxford University, UK joint work with Jen Pestana Eindhoven, 2012 p.1/30 compute iterates with residuals Krylov
More informationDIAGONALIZATION. In order to see the implications of this definition, let us consider the following example Example 1. Consider the matrix
DIAGONALIZATION Definition We say that a matrix A of size n n is diagonalizable if there is a basis of R n consisting of eigenvectors of A ie if there are n linearly independent vectors v v n such that
More informationThe Conjugate Gradient Method
The Conjugate Gradient Method Classical Iterations We have a problem, We assume that the matrix comes from a discretization of a PDE. The best and most popular model problem is, The matrix will be as large
More informationOn the choice of abstract projection vectors for second level preconditioners
On the choice of abstract projection vectors for second level preconditioners C. Vuik 1, J.M. Tang 1, and R. Nabben 2 1 Delft University of Technology 2 Technische Universität Berlin Institut für Mathematik
More informationContribution of Wo¹niakowski, Strako²,... The conjugate gradient method in nite precision computa
Contribution of Wo¹niakowski, Strako²,... The conjugate gradient method in nite precision computations ªaw University of Technology Institute of Mathematics and Computer Science Warsaw, October 7, 2006
More information6.4 Krylov Subspaces and Conjugate Gradients
6.4 Krylov Subspaces and Conjugate Gradients Our original equation is Ax = b. The preconditioned equation is P Ax = P b. When we write P, we never intend that an inverse will be explicitly computed. P
More informationConjugate Directions for Stochastic Gradient Descent
Conjugate Directions for Stochastic Gradient Descent Nicol N Schraudolph Thore Graepel Institute of Computational Science ETH Zürich, Switzerland {schraudo,graepel}@infethzch Abstract The method of conjugate
More informationLecture Notes: Geometric Considerations in Unconstrained Optimization
Lecture Notes: Geometric Considerations in Unconstrained Optimization James T. Allison February 15, 2006 The primary objectives of this lecture on unconstrained optimization are to: Establish connections
More informationAlgorithms that use the Arnoldi Basis
AMSC 600 /CMSC 760 Advanced Linear Numerical Analysis Fall 2007 Arnoldi Methods Dianne P. O Leary c 2006, 2007 Algorithms that use the Arnoldi Basis Reference: Chapter 6 of Saad The Arnoldi Basis How to
More informationAlgebra C Numerical Linear Algebra Sample Exam Problems
Algebra C Numerical Linear Algebra Sample Exam Problems Notation. Denote by V a finite-dimensional Hilbert space with inner product (, ) and corresponding norm. The abbreviation SPD is used for symmetric
More informationCOMP 558 lecture 18 Nov. 15, 2010
Least squares We have seen several least squares problems thus far, and we will see more in the upcoming lectures. For this reason it is good to have a more general picture of these problems and how to
More informationThere are two things that are particularly nice about the first basis
Orthogonality and the Gram-Schmidt Process In Chapter 4, we spent a great deal of time studying the problem of finding a basis for a vector space We know that a basis for a vector space can potentially
More informationTHE RELATIONSHIPS BETWEEN CG, BFGS, AND TWO LIMITED-MEMORY ALGORITHMS
Furman University Electronic Journal of Undergraduate Mathematics Volume 12, 5 20, 2007 HE RELAIONSHIPS BEWEEN CG, BFGS, AND WO LIMIED-MEMORY ALGORIHMS ZHIWEI (ONY) QIN Abstract. For the solution of linear
More informationLecture 11: CMSC 878R/AMSC698R. Iterative Methods An introduction. Outline. Inverse, LU decomposition, Cholesky, SVD, etc.
Lecture 11: CMSC 878R/AMSC698R Iterative Methods An introduction Outline Direct Solution of Linear Systems Inverse, LU decomposition, Cholesky, SVD, etc. Iterative methods for linear systems Why? Matrix
More informationNumerical Methods I Solving Nonlinear Equations
Numerical Methods I Solving Nonlinear Equations Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 16th, 2014 A. Donev (Courant Institute)
More informationCHAPTER 6. Projection Methods. Let A R n n. Solve Ax = f. Find an approximate solution ˆx K such that r = f Aˆx L.
Projection Methods CHAPTER 6 Let A R n n. Solve Ax = f. Find an approximate solution ˆx K such that r = f Aˆx L. V (n m) = [v, v 2,..., v m ] basis of K W (n m) = [w, w 2,..., w m ] basis of L Let x 0
More informationConjugate Gradient Method
Conjugate Gradient Method Hung M Phan UMass Lowell April 13, 2017 Throughout, A R n n is symmetric and positive definite, and b R n 1 Steepest Descent Method We present the steepest descent method for
More informationNumerical Methods for Large-Scale Nonlinear Systems
Numerical Methods for Large-Scale Nonlinear Systems Handouts by Ronald H.W. Hoppe following the monograph P. Deuflhard Newton Methods for Nonlinear Problems Springer, Berlin-Heidelberg-New York, 2004 Num.
More informationSteepest descent algorithm. Conjugate gradient training algorithm. Steepest descent algorithm. Remember previous examples
Conjugate gradient training algorithm Steepest descent algorithm So far: Heuristic improvements to gradient descent (momentum Steepest descent training algorithm Can we do better? Definitions: weight vector
More information5.) For each of the given sets of vectors, determine whether or not the set spans R 3. Give reasons for your answers.
Linear Algebra - Test File - Spring Test # For problems - consider the following system of equations. x + y - z = x + y + 4z = x + y + 6z =.) Solve the system without using your calculator..) Find the
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 21: Sensitivity of Eigenvalues and Eigenvectors; Conjugate Gradient Method Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis
More informationSolutions to Final Practice Problems Written by Victoria Kala Last updated 12/5/2015
Solutions to Final Practice Problems Written by Victoria Kala vtkala@math.ucsb.edu Last updated /5/05 Answers This page contains answers only. See the following pages for detailed solutions. (. (a x. See
More informationMA/OR/ST 706: Nonlinear Programming Midterm Exam Instructor: Dr. Kartik Sivaramakrishnan INSTRUCTIONS
MA/OR/ST 706: Nonlinear Programming Midterm Exam Instructor: Dr. Kartik Sivaramakrishnan INSTRUCTIONS 1. Please write your name and student number clearly on the front page of the exam. 2. The exam is
More informationThe Method of Conjugate Directions 21
The Method of Conjugate Directions 1 + 1 i : 7. The Method of Conjugate Directions 7.1. Conjugacy Steepest Descent often finds itself taking steps in the same direction as earlier steps (see Figure 8).
More informationMATH 4211/6211 Optimization Quasi-Newton Method
MATH 4211/6211 Optimization Quasi-Newton Method Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 Quasi-Newton Method Motivation:
More informationMath 409/509 (Spring 2011)
Math 409/509 (Spring 2011) Instructor: Emre Mengi Study Guide for Homework 2 This homework concerns the root-finding problem and line-search algorithms for unconstrained optimization. Please don t hesitate
More informationThe geometry of least squares
The geometry of least squares We can think of a vector as a point in space, where the elements of the vector are the coordinates of the point. Consider for example, the following vector s: t = ( 4, 0),
More informationEcon Slides from Lecture 7
Econ 205 Sobel Econ 205 - Slides from Lecture 7 Joel Sobel August 31, 2010 Linear Algebra: Main Theory A linear combination of a collection of vectors {x 1,..., x k } is a vector of the form k λ ix i for
More information1 Numerical optimization
Contents 1 Numerical optimization 5 1.1 Optimization of single-variable functions............ 5 1.1.1 Golden Section Search................... 6 1.1. Fibonacci Search...................... 8 1. Algorithms
More informationSummary of Iterative Methods for Non-symmetric Linear Equations That Are Related to the Conjugate Gradient (CG) Method
Summary of Iterative Methods for Non-symmetric Linear Equations That Are Related to the Conjugate Gradient (CG) Method Leslie Foster 11-5-2012 We will discuss the FOM (full orthogonalization method), CG,
More informationLecture 9: Krylov Subspace Methods. 2 Derivation of the Conjugate Gradient Algorithm
CS 622 Data-Sparse Matrix Computations September 19, 217 Lecture 9: Krylov Subspace Methods Lecturer: Anil Damle Scribes: David Eriksson, Marc Aurele Gilles, Ariah Klages-Mundt, Sophia Novitzky 1 Introduction
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE7C (Spring 018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee7c@berkeley.edu October
More informationDS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.
DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1
More informationThe conjugate gradient method
The conjugate gradient method Michael S. Floater November 1, 2011 These notes try to provide motivation and an explanation of the CG method. 1 The method of conjugate directions We want to solve the linear
More information