An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84

1 An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84

2 Introduction Almost all numerical methods for solving PDEs will at some point be reduced to solving A x = b. (1) Many methods exist for solving this equation. Most discretization methods will impose some structure on A which can be exploited. In this talk we will examine a collection of methods which can efficiently solve this problem, as well as a software package that can help determine the best method to use. 2/84

3 Multigrid Methods Multigrid methods are accelerators for iterative solvers. Given a computational grid, an approximation to the solution is found. The problem is then restricted to a sub-grid, called the coarse grid. On the coarse grid the residual problem is solved. The solution to the residual problem is then interpolated back to the full mesh (called the fine grid) where the correction is made to the approximation of the solution. Multigrid methods are very effective. However, they require a detailed knowledge of the underlying computational mesh. 3/84
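The cycle just described fits in a few lines of Matlab. The following two-grid sketch is illustrative only (it is not AMGLab code) and assumes a prolongation matrix P and a weighted Jacobi smoother are available; all names are placeholders.

  function x = twogrid(A, b, x, P, nu)
  % Minimal two-grid correction cycle (illustrative sketch).
  % A: fine-grid matrix, b: right-hand side, x: current approximation,
  % P: prolongation (fine-by-coarse), nu: number of smoothing sweeps.
  omega = 2/3;                      % weighted Jacobi damping factor
  Dinv  = 1 ./ full(diag(A));       % inverse of the diagonal of A
  for k = 1:nu                      % pre-smoothing
      x = x + omega * (Dinv .* (b - A*x));
  end
  Ac = P' * A * P;                  % Galerkin coarse-grid operator
  rc = P' * (b - A*x);              % restrict the residual to the coarse grid
  ec = Ac \ rc;                     % solve the coarse residual problem
  x  = x + P * ec;                  % interpolate and correct
  for k = 1:nu                      % post-smoothing
      x = x + omega * (Dinv .* (b - A*x));
  end
  end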

4 Algebraic Multigrid (AMG) Methods Algebraic Multigrid Methods differ from multigrid methods only in the method of coarsening. While multigrid methods require knowledge of the mesh, AMG methods extract all the needed information from the system matrix. It is in this coarsening step that most AMG methods differ. 4/84

5 AMGLab 5/84

6 AMGLab AMGLab is a software package which runs in MATLAB. The goals for AMGLab: AMGLab is designed to be an easy-to-use tool with a gentle learning curve. AMGLab has a large collection of methods which are easily accessed and compared. AMGLab can run custom problems with minimal setup time. AMGLab is powerful enough to run moderately sized problems. Most of the complexity in an AMG method is in the coarsening phase. 6/84

7 Ruge Stüben 7/84

8 Ruge Stüben The Ruge Stüben method was introduced by Ruge and Stüben in the mid-1980s. The method works in two passes. The first pass will make a selection of coarse nodes based on the number of strong connections that each node has. The second pass is a refinement pass. It checks that there are enough coarse nodes so that no information is lost. 8/84

9 Ruge Stüben Let A = [a_ij] be a stiffness matrix which comes from a discretization of a PDE over some domain Ω. Definition: Two nodes u_i and u_j are said to be connected if a_ij ≠ 0. Given a threshold value 0 < θ ≤ 1, the node (point) u_i strongly depends on (or is strongly connected to) u_j if
$$|a_{ij}| \geq \theta \max_{k \neq i} \{ |a_{ik}| \} \tag{2}$$
If u_i is connected to u_j, but not strongly connected, it is said to be weakly connected. For simplicity of notation, u_i will be denoted as node i, or just i. 9/84

10 Ruge Stüben Definition The set of grid points selected to be part of the coarse grid will be called C. An element in C will be called a C-point. The set of points not selected to be in C will be called fine points and will belong to the set F. Points in F will be called F-points. Note that F and C will partition the grid. 10/84

11 Ruge Stüben Before the first pass, define an array λ where λ_i is the number of strong connections to node i. First Pass Every grid point will be assigned to the coarse or fine sets: 1. Find the unassigned grid point i which has the largest λ_i value. 2. Add i to the set of coarse points. Add every j which is strongly connected to i to the fine grid set. 3. For each node j in the last step, find every unassigned k strongly connected to j and increase λ_k by one. 4. If every node is either in the fine set or coarse set, end. Else return to (1). 11/84

12 Ruge Stüben The second pass will check for any fine–fine node connections which do not have a common coarse node. If any such pair of fine nodes is found, one of the two fine nodes will be made a coarse node. 12/84

13 Ruge Stüben Definition: For each fine-grid point i, define N_i, the neighborhood of i, to be the set of all points j ≠ i such that a_ij ≠ 0. These points can be divided into three categories: the neighboring coarse-grid points that strongly influence i; this is the coarse interpolation set for i, denoted by C_i; the neighboring fine-grid points that strongly influence i, denoted by D_i^s; the points that do not strongly influence i, denoted by D_i^w; this set may contain both coarse- and fine-grid points; it is called the set of weakly connected neighbors. 13/84

14 Ruge Stüben To construct the prolongation operator, the weighting factors are needed. These will be constructed and stored in a matrix whose entries are given by
$$\omega_{ij} = -\,\frac{a_{ij} + \sum_{m \in D_i^s} \dfrac{a_{im}\, a_{mj}}{\sum_{k \in C_i} a_{mk}}}{a_{ii} + \sum_{n \in D_i^w} a_{in}}. \tag{3}$$
14/84
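Formula (3) can be coded directly with a pair of loops. The Matlab sketch below computes the weights for a single fine point i and is illustrative only; it assumes the index sets C_i, D_i^s and D_i^w have already been identified and are passed as vectors.

  function w = rs_weights(A, i, Ci, Dsi, Dwi)
  % Interpolation weights of formula (3) for one fine point i.
  % Ci: coarse interpolation set, Dsi: strongly connected fine neighbors,
  % Dwi: weakly connected neighbors (all given as index vectors).
  denom = A(i,i) + sum(A(i, Dwi));           % a_ii + sum over weak neighbors
  w = zeros(1, numel(Ci));
  for jj = 1:numel(Ci)
      j = Ci(jj);
      num = A(i,j);
      for m = Dsi(:)'                        % corrections through strong fine neighbors
          num = num + A(i,m) * A(m,j) / sum(A(m, Ci));
      end
      w(jj) = -num / denom;
  end
  end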

15 Ruge Stüben The prolongation operator is constructed row-wise. If i is a coarse node, then the i-th row is the corresponding row of the identity (a single 1 in the column associated with that coarse node). If i is a fine node, then the corresponding row of the matrix of weights is used. 15/84

16 Beck Algorithm 16/84

17 Beck The Beck algorithm was introduced in 1999 by Rudolf Beck. The algorithm is a simple coarsening strategy related to the Ruge Stüben method. This method is very simple to implement, works on many of the same problems as more complex methods, and gives similar results. The idea is to look at the graph of the stiffness matrix rather than at the strength of the connections. That is, all connections are treated equally. 17/84

18 Beck The algorithm is as follows. 1. Choose the smallest node i which is not in the coarse or fine set, and add it to the coarse set. 2. Add all nodes connected to node i from the last step to the fine set. 3. Repeat steps 1 and 2 until all nodes are in the coarse or fine set. 18/84
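These three steps translate almost directly into Matlab. The sketch below is illustrative only; it works on the sparsity pattern of A and assumes A is structurally symmetric.

  function [C, F] = beck_coarsen(A)
  % Beck coarsening: greedy sweep over the graph of A.
  n = size(A,1);
  mark = zeros(n,1);                  % 0 = unassigned, 1 = coarse, 2 = fine
  for i = 1:n
      if mark(i) == 0
          mark(i) = 1;                % lowest unassigned node becomes coarse
          nbrs = find(A(:,i));        % all nodes connected to node i
          nbrs = nbrs(nbrs ~= i & mark(nbrs) == 0);
          mark(nbrs) = 2;             % its unassigned neighbors become fine
      end
  end
  C = find(mark == 1);
  F = find(mark == 2);
  end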

19 Beck For each node k in the fine set, define λ_k to be the number of coarse points connected to k. The prolongation operator is constructed row-wise, as in Ruge Stüben:
$$P_{ij} = \begin{cases} \lambda_i^{-1}, & \text{if } j \in C \text{ and } i \text{ is connected to } j \\ 0, & \text{otherwise} \end{cases} \tag{4}$$
19/84
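Continuing the sketch, the prolongation of formula (4) can be assembled as a sparse matrix: coarse rows receive a single 1, and each fine row receives the weight 1/λ_i on its connected coarse points. This is illustrative code, not the AMGLab implementation.

  function P = beck_prolongation(A, C, F)
  % Prolongation of formula (4): identity on coarse rows, equal
  % weights over connected coarse points on fine rows.
  n  = size(A,1);
  nc = numel(C);
  cidx = zeros(n,1);  cidx(C) = 1:nc;     % fine-grid index -> coarse column
  P = sparse(C(:), (1:nc)', 1, n, nc);    % coarse rows: a single 1
  for i = F(:)'
      nbrs   = find(A(:,i));              % neighbors of fine node i
      cnbrs  = nbrs(cidx(nbrs) > 0);      % coarse points connected to i
      lambda = numel(cnbrs);
      P(i, cidx(cnbrs)) = 1/lambda;       % weights lambda_i^{-1}
  end
  end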

20 Smoothed Aggregation 20/84

21 Smoothed Aggregation Definition: Let A be a stiffness matrix from a discretization of some PDE on a domain Ω. Define the strongly coupled neighborhood of node i for a fixed ɛ ∈ [0, 1) as
$$N_i(\epsilon) = \left\{\, j : |a_{ij}| \geq \epsilon \sqrt{a_{ii}\, a_{jj}} \,\right\} \tag{5}$$
21/84

22 Smoothed Aggregation The method will work in two main passes. First, aggregates will be chosen based on neighborhoods of strong connections. Only full neighborhoods will be selected. 22/84

23 Smoothed Aggregation Figure: Black Lines are strong connections. Red lines are weak connections. 23/84

24 Smoothed Aggregation Figure: Initial aggregates are selected based on strongly connected neighborhoods. 24/84

25 Smoothed Aggregation Once the initial aggregates have been chosen, the remaining nodes are added to the aggregates based on strong connections. Figure: Leftover points are added to aggregates based on order and strong connections. 25/84

26 Smoothed Aggregation Once the aggregates are chosen, the prolongation matrix must be formed. This is done by making an initial guess for the operator and refining it with a smoothing step. Define C_j as the set of nodes in aggregate j, and
$$P_{ij} = \begin{cases} 1 & \text{if } i \in C_j \\ 0 & \text{otherwise.} \end{cases} \tag{6}$$
26/84

27 Smoothed Aggregation The filtered matrix $A^F = \left(a^F_{ij}\right)$ is defined as
$$a^F_{ij} = \begin{cases} a_{ij} & \text{if } j \in N_i(\epsilon) \\ 0 & \text{otherwise} \end{cases} \quad \text{for } i \neq j, \qquad a^F_{ii} = a_{ii} - \sum_{j=1,\, j \neq i}^{n} \left( a_{ij} - a^F_{ij} \right). \tag{7}$$
27/84

28 Smoothed Aggregation The prolongation operator is obtained by damped Jacobi smoothing of the tentative prolongator:
$$P = \left( I - \omega D^{-1} A^F \right) P \tag{8}$$
where D = diag(A), A^F is the filtered matrix, the P on the right-hand side is the tentative prolongator from (6), and
$$\omega = \frac{3}{4\, \lambda_{\max}(D^{-1} A)} \tag{9}$$
28/84
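Slides 26 through 28 combine into one short routine: build the filtered matrix of (7), form the tentative prolongator of (6) from an aggregate map, and apply the damped Jacobi smoothing of (8)-(9). The Matlab sketch below is illustrative only; the input agg(i), holding the aggregate number of node i, is an assumed data structure.

  function P = sa_prolongation(A, agg, epsln)
  % Smoothed aggregation prolongator (illustrative sketch).
  % agg: aggregate number (1..nc) of each node; epsln: coupling threshold.
  n = size(A,1);  d = full(diag(A));
  % Filtered matrix (7): keep strongly coupled off-diagonal entries and
  % lump the dropped entries back onto the diagonal.
  [i,j,v] = find(A);
  strong  = (i == j) | (abs(v) >= epsln * sqrt(d(i) .* d(j)));
  AF      = sparse(i(strong), j(strong), v(strong), n, n);
  AF      = AF - spdiags(full(sum(A - AF, 2)), 0, n, n);
  % Tentative prolongator (6): one column per aggregate.
  nc = max(agg);
  Pt = sparse((1:n)', agg(:), 1, n, nc);
  % Damped Jacobi smoothing (8)-(9).
  Dinv  = spdiags(1./d, 0, n, n);
  omega = 3 / (4 * abs(eigs(Dinv * A, 1, 'lm')));
  P     = (speye(n) - omega * Dinv * AF) * Pt;
  end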

29 Implementing Simple AMG Algorithms 29/84

30 Implementation: Ruge Stüben Given a threshold value 0 < θ ≤ 1, the variable (point) u_i strongly depends on (or is strongly connected to) the variable (point) u_j if $|a_{ij}| \geq \theta \max_{k \neq i} \{ |a_{ik}| \}$. Define the set N to hold the absolute maximal off-diagonal element in each row of A. Since this is an ordered set, it will be treated as an array; its entries are defined as $N_i = \max_{j \neq i} \{ |a_{ij}| \}$. The matrix of strong connections S is a matrix whose elements are given by
$$S_{ij} = \begin{cases} 1, & \text{if } u_i \text{ is strongly connected to } u_j \\ 0, & \text{otherwise.} \end{cases}$$
Define λ as an array which counts the number of strong connections for each node, $\lambda_i = \sum_j S_{ij}$. 30/84
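These quantities need only a few sparse-matrix operations. The Matlab sketch below is illustrative; it assumes A is symmetric and uses the absolute-value form of the strength test given above.

  function [S, lambda] = strong_connections(A, theta)
  % S(i,j) = 1 if node i is strongly connected to node j, 0 otherwise.
  n    = size(A,1);
  Aoff = A - spdiags(diag(A), 0, n, n);     % strip the diagonal
  Ni   = full(max(abs(Aoff), [], 2));       % N_i = max_{j ~= i} |a_ij|
  [i,j,v] = find(Aoff);
  keep = abs(v) >= theta * Ni(i);           % strength test
  S = sparse(i(keep), j(keep), 1, n, n);
  lambda = full(sum(S, 2));                 % strong connections per node
  end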

31 Implementation: Ruge Stüben
RugeStubenInitialPass(S, λ, n)
  C ← ∅, F ← ∅
  while |C ∪ F| < n do
    i ← index of max_j {λ_j}
    C ← C ∪ {i};  λ_i ← −1
    for j = 1 to n do
      if S_ij = 1 and λ_j ≠ −1 then
        F ← F ∪ {j};  λ_j ← −1
        for k = 1 to n do
          if S_jk = 1 and λ_k ≠ −1 then
            λ_k ← λ_k + 1
          end if
        end for
      end if
    end for
  end while
  return C, F
31/84

32 Implementation: Ruge Stüben
RugeStubenRefinePass(C, F, S)
  n_f ← |F|,  n_c ← |C|      (f_1, ..., f_{n_f} are the fine nodes, c_1, ..., c_{n_c} the coarse nodes)
  for i = 1 to n_f do
    for j = i + 1 to n_f do
      if S_{f_i f_j} = 1 then
        connect ← false;  k ← 1
        while connect = false and k ≤ n_c do
          if S_{f_i c_k} = 1 and S_{f_j c_k} = 1 then
            connect ← true
          end if
          k ← k + 1
        end while
        if connect = false then
          C ← C ∪ {f_i};  F ← F \ {f_i}
          n_c ← n_c + 1;  n_f ← n_f − 1
          break
        end if
      end if
    end for
  end for
  return C, F
32/84

33 Implementation Example: Ruge Stüben Consider a matrix A whose graph is a 3-by-3 grid of unknowns u_1, ..., u_9, numbered row-wise from the bottom left, with each node connected to its horizontal and vertical neighbors. 33/84

34 Implementation Example: Ruge Stüben Choose θ = 0.25; with this threshold all connections are strong. Once θ is chosen, S is constructed; since every connection is strong, S is simply the adjacency matrix of the grid graph. 34/84

35 Implementation Example: Ruge Stüben Now λ is constructed. For simplicity, λ will be given as a row vector, with the index of each element corresponding to a vertex of the graph of A, which also corresponds to the unknown u_i. The value is found by counting the number of strong connections at each node:
λ = [2 3 2 3 4 3 2 3 2].
Following Algorithm RugeStubenInitialPass, both the coarse and fine sets start empty. The u_i with the largest λ is chosen, which is u_5; it is added to the coarse set, and all connected nodes are made fine. Thus, the coarse and fine sets are
C = {u_5} (10)
F = {u_2, u_4, u_6, u_8}. (11)
The corresponding λ value for each of the above nodes is set to −1 for reasons of bookkeeping. 35/84

36 Implementation Example: Ruge Stüben Now, for each of the new fine nodes, each unassigned node strongly connected to it has its λ value increased by 1. Note that if a node is connected to more than one fine node, it is incremented once for each fine node it is connected to. So the new λ values are
λ = [4 −1 4 −1 −1 −1 4 −1 4].
All that remains now are the corner nodes, which are all disjoint from each other. So, all of the remaining nodes will be made into coarse nodes. The coarse and fine nodes after the first pass are given by
C = {u_1, u_3, u_5, u_7, u_9} (12)
F = {u_2, u_4, u_6, u_8}. (13)
A quick inspection shows that there are no fine–fine connections, so the second pass is not needed, and the coarse set is finalized. 36/84

37 Implementation Example: Ruge Stüben This illustrates the process required for the first pass of coarsening. The coarse set is in red and the fine set is in blue. Note that the third frame is the compilation of the final four iterations. Figure: initial grid; first iteration; final coarsening. 37/84

38 Implementation Example: Ruge Stüben Using (3), we construct the weights matrix ω (needed for the prolongation matrix), which has one row for each element in F and one column for each element in C. 38/84

39 Implementation Example: Ruge Stüben The prolongation operator P is constructed using the values from ω. The odd-numbered rows correspond to the coarse nodes, so each of these rows is a row of the identity. The even-numbered rows come from ω. 39/84

40 Implementation Example: Ruge Stüben The coarse matrix is A_c = P^T A P. 40/84

41 Implementation Example: Beck This example illustrates the method for choosing the set of coarse nodes with the Beck algorithm. Consider a mesh of 12 unknowns configured in a 3-by-4 grid, numbered row-wise from the bottom left. Figure: first iteration; second iteration; third iteration; final coarsening. 41/84

42 Implementation Example: Beck Now consider another example: start with a symmetric positive definite 4-by-4 matrix A_f whose graph is the path u_1 – u_2 – u_3 – u_4 (so that u_2 is connected to u_1 and u_3, and u_4 only to u_3). Using Beck's algorithm, the coarse and fine sets are C = {u_1, u_3} and F = {u_2, u_4}. 42/84

43 Implementation Example: Beck For each element of F, σ_i must be computed. Since u_2 is connected to both u_1 and u_3, σ_2 = 2. However, u_4 is only connected to u_3, so σ_4 = 1. Hence,
$$P = \begin{bmatrix} 1 & 0 \\ \tfrac{1}{2} & \tfrac{1}{2} \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \tag{14}$$
Note that while the 3rd and 4th rows are the same, it is for different reasons. The 3rd row is [0 1] because u_3 ∈ C and it is the second element on the coarse grid. The 4th row is [0 1] because u_4 ∈ F but has only one connection to a coarse point. Finally, the 2-by-2 coarse matrix is A_c = P^T A_f P. (15) 43/84

44 Matlab Tricks Implementing AMG in Matlab is relatively easy assuming that you follow some simple programming rules: Store all matrices as sparse matrices. Use data structures that are indexed by the level = 1, 2, ... Use a lot of small, carefully written functions that are each efficient. Be very careful with matrix arithmetic since Matlab likes to store temporary matrices as dense matrices. Matlab allows variables to be declared global; in some cases this becomes almost essential when implementing AMG. If you have a set of global constants, put them all in a function that all other functions call first. Use recursion carefully, but use it. 44/84

45 Matlab Tricks Consider a Matlab code fragment:
  A(level).matrix = genmat( level, N(level) );
  if issparse( A(level).matrix ) == false
      A(level).matrix = sparse( A(level).matrix );
  end
The variable A can have other data associated with it using Matlab's "." structure-field syntax. Depending on how you like to program, two possible styles are A(level).matrix, P(level).matrix, W(level).matrix, etc., or LEVINFO(level).A, LEVINFO(level).P, LEVINFO(level).W, etc. 45/84

46 Matlab Tricks Useful functions for AMG and sparse matrices: nnz(A): the number of nonzeros in A; [nrows,ncols] = size(A): the dimensions of A; [rows,cols,vals] = find(A): the sparse matrix information in triplet (3-vector) form; the functions ones, zeros, and diag; possibly kron. 46/84

47 AMGe 47/84

48 AMGe AMGe methods apply to finite element methods for PDEs. AMGe methods use information about the elements for coarsening. The global stiffness matrix is often ill-conditioned for normal AMG methods. This AMGe method is a preconditioner for the stiffness matrix. Define K to be a stiffness matrix from some FEM for a PDE on some domain Ω, and K^(i) as the element (local) stiffness matrix on the element δ^(i). To simplify notation, δ^(i) will be denoted simply as element i unless confusion will arise. 48/84

49 AMGe The idea of this AMGe method is to precondition the element stiffness matrices before assembly so that the matrix K will be conditioned for standard AMG methods. That is to say, replace K^(i) with a spectrally equivalent matrix B^(i). Definition: The SPSD matrices B ∈ ℝ^{n×n} and A ∈ ℝ^{n×n} are called spectrally equivalent if
$$\exists\, c_1, c_2 \in \mathbb{R}^+ : \quad c_1 \langle Bu, u \rangle \leq \langle Au, u \rangle \leq c_2 \langle Bu, u \rangle \quad \forall u \in \mathbb{R}^n \tag{16}$$
which is briefly denoted by $c_1 B \leq A \leq c_2 B$. 49/84

50 AMGe Consider the generalized eigenvalue problem
$$K u = \lambda B u \tag{17}$$
with some given SPD matrix K and an SPD matrix B. Equation (17) is equivalent to the standard eigenvalue problem
$$X \varphi = \mu \varphi \tag{18}$$
with $X = K^{-1/2} B K^{-1/2}$; here $\mu = \mu(X) = 1/\lambda$ and $\varphi = \varphi(X) = K^{1/2} u$ denote the eigenvalues and normalized eigenvectors respectively. 50/84
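The equivalence between (17) and (18) is easy to verify numerically. The short Matlab check below uses randomly generated SPD test matrices, purely for illustration.

  n = 5;
  R = randn(n);  K = R'*R + n*eye(n);   % random SPD test matrices
  R = randn(n);  B = R'*R + n*eye(n);
  lam = eig(K, B);                      % generalized eigenvalues of K u = lambda B u
  Ks  = sqrtm(K);                       % K^{1/2}
  X   = Ks \ B / Ks;                    % X = K^{-1/2} B K^{-1/2}
  X   = (X + X')/2;                     % symmetrize against round-off
  mu  = eig(X);
  disp(norm(sort(mu) - sort(1./lam)))   % should be near machine precision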

51 AMGe The matrix B^(i) is found by solving the constrained minimization problem
$$\text{minimize } \kappa(X) = \frac{\mu_{\max}}{\mu_{\min}} \quad \text{subject to } B \in Z_{n_r} \text{ and } B \text{ SPD.} \tag{19}$$
Recall that
$$Z_n = \left\{ A \in \mathbb{R}^{n \times n} : a_{ii} > 0,\; a_{ij} \leq 0 \text{ for } i \neq j \right\} \tag{20}$$
This minimization is non-trivial and is often accomplished using sequential quadratic programming (SQP). 51/84

52 AMGe The element matrices B^(i) are then used to assemble the global stiffness matrix A. Standard AMG methods can then be applied to A normally. While the minimization procedure for each element matrix is expensive, the element matrices are very small in relation to the global stiffness matrix. In most problems, the element matrices are similar and can be grouped into spectrally equivalent classes to reduce the number of minimizations required. 52/84

53 Optimization Assuming that a minimum can be found, a way to test whether a point really is the minimum is required. The Karush–Kuhn–Tucker conditions (commonly called the KKT conditions) give necessary conditions for a point to be a constrained minimum, and they are also sufficient in the convex case (see below). The conditions were found independently by Karush (1939) and Kuhn and Tucker (1951). 53/84

54 KKT Conditions Consider the model problem
$$\text{minimize } f(x) \quad \text{such that } g_j(x) \leq 0, \; j = 1, 2, \ldots, m. \tag{21}$$
Recall the Lagrangian
$$L(x, \lambda) = f(x) + \sum_{j=1}^{m} \lambda_j g_j(x). \tag{22}$$
Now the KKT conditions can be stated. 54/84

55 KKT Conditions Theorem: Let the functions f and g_j be C^1, and assume the existence of Lagrange multipliers λ*. Then at the point x*, corresponding to the solution of the model problem (21), the following conditions must be satisfied:
$$\frac{\partial f}{\partial x_i}(x^*) + \sum_{j=1}^{m} \lambda_j^* \frac{\partial g_j}{\partial x_i}(x^*) = 0, \quad i = 1, 2, \ldots, n$$
$$g_j(x^*) \leq 0, \quad \lambda_j^*\, g_j(x^*) = 0, \quad \lambda_j^* \geq 0, \quad j = 1, 2, \ldots, m. \tag{23}$$
The KKT conditions constitute necessary and sufficient conditions for x* to be a constrained minimum if f(x) and the g_j(x) are all convex functions. 55/84

56 Quadratic Programming (QP) The first method to consider is the method of Quadratic Programming. Consider the model problem:
$$\text{minimize } f(x) = \tfrac{1}{2} x^T A x + b^T x + c \tag{24}$$
subject to
$$C x \leq d \tag{25}$$
where C is an m × n matrix and d is an m-vector. Note that this is a less general problem since it is restricted to a quadratic form. 56/84

57 Quadratic Programming (QP) If the solution is an interior point, then the solution is given by
$$x^* = x_0 = -A^{-1} b \tag{26}$$
Figure: Courtesy of Jan A. Snyman, Practical Mathematical Optimization. However, if x_0 as defined above is not in the interior, then the constraints must be considered. 57/84

58 Quadratic Programming (QP) A (very) loose definition of an active set of constraints is the set of constraints currently being considered. That is, if the active set is given by {g_1(x)}, then the solution must fall on the curve g_1. If the active set is {g_1(x), g_2(x)}, then the solution will be on the intersection of g_1 and g_2. 58/84

59 Quadratic Programming (QP) If the active set is known a priori, the solution is simple. Suppose the active set at x* is known, i.e., c_j x* = d_j for some j ∈ {1, 2, ..., m}, where c_j here denotes the 1 × n matrix corresponding to the j-th row of C. The active set of constraints is represented in matrix form by C_a x = d_a. The solution x* is obtained by minimizing f(x) over the set {x : C_a x = d_a}. With the appropriate Lagrange theory the solution is obtained by solving the linear system:
$$\begin{bmatrix} A & C_a^T \\ C_a & 0 \end{bmatrix} \begin{bmatrix} x^* \\ \lambda \end{bmatrix} = \begin{bmatrix} -b \\ d_a \end{bmatrix}. \tag{27}$$
Most of the work in Quadratic Programming is finding the active set. 59/84

60 Quadratic Programming The method outlined here for identifying the active set is due to Theil and Van de Panne (1961). Simply put, the method activates the constraints one at a time and solves (27). If a solution satisfies all the constraints and the KKT conditions, then it is a minimum. If these conditions are not met, take the active constraints two at a time, testing each solution in turn. Continue adding constraints and testing until a solution is found. 60/84
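For small problems this procedure is easy to code by brute force. The Matlab sketch below (illustrative names only) enumerates active sets of increasing size, solves (27) for each, and returns the first candidate that is feasible and has nonnegative multipliers; applied to the example on the following slides it recovers the minimum found there, x = [2.3333, 0.6667]^T with g_3 and g_4 active.

  function [x, active] = qp_active_set(A, b, C, d)
  % Solve min 1/2 x'Ax + b'x subject to Cx <= d by enumerating active sets.
  n = numel(b);  m = numel(d);
  x = -A \ b;                                    % unconstrained minimizer (26)
  if all(C*x <= d + 1e-10), active = []; return, end
  for k = 1:m                                    % active sets of increasing size
      sets = nchoosek(1:m, k);
      for r = 1:size(sets,1)
          act = sets(r,:);
          Ca = C(act,:);  da = d(act);
          KKT = [A, Ca'; Ca, zeros(k)];          % system (27)
          sol = KKT \ [-b; da];
          x = sol(1:n);  lam = sol(n+1:end);
          if all(C*x <= d + 1e-10) && all(lam >= -1e-10)
              active = act;  return              % feasible and KKT-satisfying
          end
      end
  end
  error('no KKT point found');
  end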

61 Quadratic Programming (QP) In the following figure, (1), (2), and (3) are constraints. x 0 is the unconstrained solution. Figure: Courtesy of Jan A. Snyman, Practical Mathematical Optimization. The points a, b, and c are the solutions with the constraints taken one at a time. The points u, v, and w are the solutions with the constraints two at a time. The optimum solution is v. 61/84

62 Quadratic Programming (QP) Example: Solve the following QP problem:
$$\text{minimize } f(x) = \tfrac{1}{2} x_1^2 - x_1 x_2 + x_2^2 - 2 x_1 + x_2 \tag{28}$$
such that
$$g_1(x) = -x_1 \leq 0, \quad g_2(x) = -x_2 \leq 0, \quad g_3(x) = x_1 + x_2 - 3 \leq 0, \quad g_4(x) = 2 x_1 - x_2 - 4 \leq 0. \tag{29}$$
62/84

63 In matrix form f(x) is given by f(x) = ½ x^T A x + b^T x, with
$$A = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \tag{30}$$
and
$$b = \begin{bmatrix} -2 \\ 1 \end{bmatrix}. \tag{31}$$
The constraints are given by the matrix C and the vector d:
$$C = \begin{bmatrix} -1 & 0 \\ 0 & -1 \\ 1 & 1 \\ 2 & -1 \end{bmatrix}, \tag{32} \qquad d = \begin{bmatrix} 0 \\ 0 \\ 3 \\ 4 \end{bmatrix}. \tag{33}$$
63/84

64 The unconstrained solution is x_0 = −A^{-1}b = [3, 1]^T. However, the constraints g_3 and g_4 are not satisfied. The constraints will be activated one at a time. Taking g_1 as active, equation (27) becomes
$$\begin{bmatrix} 1 & -1 & -1 \\ -1 & 2 & 0 \\ -1 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \lambda_1 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}. \tag{34}$$
Solving this system gives
$$\begin{bmatrix} x_1 \\ x_2 \\ \lambda_1 \end{bmatrix} = \begin{bmatrix} 0 \\ -0.5 \\ -1.5 \end{bmatrix}. \tag{35}$$
This violates the KKT conditions since λ_1 < 0. 64/84

65 Next the constraint g_2 will be active. Again using (27),
$$\begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \lambda_2 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}. \tag{36}$$
The solution to this is [2, 0, −1]^T, which fails the KKT conditions. The remaining steps will be compiled in a table. 65/84

66
  Active Set    x                        Result
  (none)        [3, 1]^T                 fails g_3
  1             [0, −0.5]^T              fails KKT
  2             [2, 0]^T                 fails KKT
  3             [2.4, 0.6]^T             fails g_4
  4             [2.4, 0.8]^T             fails g_3
  1, 2          [0, 0]^T                 fails KKT
  1, 3          [0, 3]^T                 fails KKT
  1, 4          [0, −4]^T                fails KKT
  2, 3          [3, 0]^T                 fails KKT
  2, 4          [2, 0]^T                 fails KKT
  3, 4          [2.3333, 0.6667]^T       pass
So with the active constraints g_3 and g_4, a minimum is found at x* = [2.3333, 0.6667]^T. 66/84

67 Sequential Quadratic Programming (SQP) The QP method is very good for optimization, but it is restricted by the form of the problem. So a different method needs to be employed to solve a general problem. This is the motivation for the Sequential Quadratic Programming (SQP) method. The SQP method is based on the application of Newton s method to determine x and λ from the KKT conditions of the constrained optimization problem. The determination of the Newton step is equivalent to the solution of a QP problem. 67/84

68 Sequential Quadratic Programming (SQP) Consider the problem
$$\text{minimize } f(x) \quad \text{such that} \quad g_j(x) \leq 0, \; j = 1, \ldots, m; \qquad h_j(x) = 0, \; j = 1, \ldots, r. \tag{37}$$
Given estimates (x^k, λ^k, µ^k), k = 0, 1, ..., of the solution and the respective Lagrange multiplier values, with λ^k ≥ 0, the Newton step s of the (k+1)-st iteration, such that x^{k+1} = x^k + s, is given by the solution of the following k-th QP problem. 68/84

69 Sequential Quadratic Programming (SQP) QP-k (x^k, λ^k, µ^k): Minimize with respect to s
$$F(s) = f(x^k) + \nabla^T f(x^k)\, s + \tfrac{1}{2} s^T H_L(x^k)\, s \tag{38}$$
such that
$$g(x^k) + \left[ \frac{\partial g(x^k)}{\partial x} \right]^T s \leq 0 \tag{39}$$
and
$$h(x^k) + \left[ \frac{\partial h(x^k)}{\partial x} \right]^T s = 0 \tag{40}$$
where g = [g_1, g_2, ..., g_m]^T, h = [h_1, h_2, ..., h_r]^T, and the Hessian of the classical Lagrangian with respect to x is
$$H_L(x^k) = \nabla^2 f(x^k) + \sum_{j=1}^{m} \lambda_j^k \nabla^2 g_j(x^k) + \sum_{j=1}^{r} \mu_j^k \nabla^2 h_j(x^k). \tag{41}$$
69/84

70 Sequential Quadratic Programming (SQP) The solution of QP-k gives not only s, but also the Lagrange multipliers λ^{k+1} and µ^{k+1} from the solution of equation (27). So with x^{k+1} = x^k + s, the next QP problem can be constructed. This iterative process continues until x^k converges to the real minimum. Because this is a Newton iteration, the convergence is fast provided that a good initial value was chosen. 70/84
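If the Optimization Toolbox is available, the QP-k subproblem can be handed to quadprog. The sketch below performs one SQP iteration for a problem with inequality constraints only; the function handles gradf, hessL, g and jacg are assumed, user-supplied inputs, and the whole routine is illustrative rather than a full SQP implementation.

  function [x, lam] = sqp_step(x, lam, gradf, hessL, g, jacg)
  % One SQP iteration: solve the QP-k subproblem (38)-(39) with quadprog.
  % gradf(x): gradient of f; hessL(x,lam): Hessian of the Lagrangian (41);
  % g(x): column vector of inequality constraints; jacg(x): their Jacobian.
  H  = hessL(x, lam);
  c  = gradf(x);
  Aq = jacg(x);                        % linearized constraints: Aq*s <= -g(x)
  bq = -g(x);
  [s, ~, ~, ~, mult] = quadprog(H, c, Aq, bq);
  x   = x + s;                         % Newton-like step
  lam = mult.ineqlin;                  % updated multiplier estimates
  end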

71 Sequential Quadratic Programming (SQP) Example: Minimize
$$f(x) = 2 x_1^2 + 2 x_2^2 - 2 x_1 x_2 - 4 x_1 - 6 x_2 \tag{42}$$
subject to
$$g_1(x) = 2 x_1^2 - x_2 \leq 0, \quad g_2(x) = x_1 + 5 x_2 - 5 \leq 0, \quad g_3(x) = -x_1 \leq 0, \quad g_4(x) = -x_2 \leq 0 \tag{43}$$
with initial point x^0 = [0, 1]^T, and λ^0 = 0. 71/84

72 The first step is to construct the function F from equation (38). Both ∇f(x^k) and H_L(x^k) are needed for the construction. To compute H_L(x^k) we also need ∇²f(x^k) and ∇²g_j(x^k) for j = 1, ..., 4:
$$\nabla f(x^k) = \begin{bmatrix} 4 x_1 - 2 x_2 - 4 \\ 4 x_2 - 2 x_1 - 6 \end{bmatrix} \tag{44}$$
$$\nabla^2 f(x^k) = \begin{bmatrix} 4 & -2 \\ -2 & 4 \end{bmatrix} \tag{45}$$
$$\nabla^2 g_1(x^k) = \begin{bmatrix} 4 & 0 \\ 0 & 0 \end{bmatrix} \tag{46}$$
$$\nabla^2 g_i(x^k) = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad \text{for } i = 2, 3, 4. \tag{47}$$
72/84

73 Each of the constraints also needs to be transformed for the QP-k problems:
$$g_1(x^k, s) = 2 x_1^2 - x_2 + 4 x_1 s_1 - s_2 \leq 0$$
$$g_2(x^k, s) = x_1 + 5 x_2 - 5 + s_1 + 5 s_2 \leq 0$$
$$g_3(x^k, s) = -x_1 - s_1 \leq 0$$
$$g_4(x^k, s) = -x_2 - s_2 \leq 0 \tag{48}$$
Now all of the parts needed for the subproblem have been found. 73/84

74 QP-1: x^0 = [0, 1]^T, and λ^0 = 0. So
$$A = H_L = \begin{bmatrix} 4 & -2 \\ -2 & 4 \end{bmatrix}, \tag{49} \qquad b = \begin{bmatrix} -6 \\ -2 \end{bmatrix}, \tag{50}$$
and the constraints are given by
$$-1 - s_2 \leq 0, \quad s_1 + 5 s_2 \leq 0, \quad -s_1 \leq 0, \quad -1 - s_2 \leq 0.$$
74/84

75 In matrix form this becomes
$$C = \begin{bmatrix} 0 & -1 \\ 1 & 5 \\ -1 & 0 \\ 0 & -1 \end{bmatrix}, \tag{51}$$
and
$$d = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}. \tag{52}$$
75/84

76 Using the QP method outlined above, the solution is given by
$$s = \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} = \begin{bmatrix} 1.1290 \\ -0.2258 \end{bmatrix} \tag{53}$$
and
$$x^1 = x^0 + s = \begin{bmatrix} 1.1290 \\ 0.7742 \end{bmatrix}, \tag{54} \qquad \lambda^1 = \begin{bmatrix} 0 \\ 1.0323 \\ 0 \\ 0 \end{bmatrix}. \tag{55}$$
Now that x^1 and λ^1 have been computed, the next QP subproblem can be constructed and solved. 76/84

77 The solution is summarized as follows: starting from x^0 = [0, 1]^T, the first step gives x^1 = [1.1290, 0.7742]^T, and the subsequent iterates have first components 0.7526, 0.6643, and 0.6589, at which point the error reaches 0. (56) The result x^5 = [0.6589, 0.8682]^T is the minimum to 4 digits. 77/84

78 Sequential Quadratic Programming (SQP) Comments The Good: the SQP method converges very fast; the implementation is simple; the method works on very general problems. The Bad and the Ugly: the high rate of convergence requires a good initial guess; computing the Hessian in a complex problem can range from non-trivial to impossible. 78/84

84 Quasi-Newton Update The SQP method is a very effective method provided that the derivatives of the function to be optimized are available. In many applications the derivatives of the function being optimized are not easily calculated and must be approximated. A common method for approximating the Hessian is a quasi-Newton update. 79/84

85 Quasi-Newton Update Some new notation is needed. Definition: The Lagrangian of the model problem (21) is given by
$$L(x, \lambda) = f(x) + \lambda^T g(x), \tag{57}$$
where x* is a KKT point with associated Lagrange multipliers λ*. The notation ∇_x L(x, λ) refers to the gradient of L with respect to x only. Similarly, ∇²_{xx} L(x, λ) is the Hessian with respect to x. 80/84

86 Quasi-Newton Update The vector s^k is defined the same way as in the SQP case, that is,
$$s^k = x^{k+1} - x^k. \tag{58}$$
A new vector y^k is required and is defined as
$$y^k = \nabla_x L(x^{k+1}, \lambda^{k+1}) - \nabla_x L(x^k, \lambda^{k+1}). \tag{59}$$
To simplify the notation, a bar will be used to denote the (k+1) terms (such as x̄), and unadorned variables (x) will be used for the k terms. 81/84

87 Quasi-Newton Update The statement of the problem will remain unchanged from the SQP case. The only new consideration is how the Hessian is computed. A reasonable initial guess is needed. This can be as simple as a centered difference. The method outlined here is the method of Wilson, Han, and Powell. 82/84

88 Quasi-Newton Update The update formula is given by
$$\bar{H} = H + \frac{(y - Hs)\, c^T + c\, (y - Hs)^T}{c^T s} - \frac{s^T (y - Hs)\; c\, c^T}{(c^T s)^2}, \tag{60}$$
where c is any vector with c^T s ≠ 0. Common choices are c = s, c = y, and c = D_0 y with D_0 any fixed positive definite matrix. 83/84
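Formula (60) is only a few lines of Matlab. The sketch below is illustrative; the vector c is supplied by the caller (for example c = s).

  function Hnew = qn_update(H, s, y, c)
  % Rank-two quasi-Newton update of formula (60).
  % s = x_{k+1} - x_k, y = difference of Lagrangian gradients (59),
  % c = any vector with c'*s ~= 0 (e.g. c = s or c = y).
  r    = y - H*s;
  cts  = c' * s;
  Hnew = H + (r*c' + c*r') / cts - (s'*r) * (c*c') / cts^2;
  end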

89 Summary The methods presented here are representative of most of the major classes of AMG methods. Variations of these methods are used in very complex applications. Choosing the right method for an application can be a challenge. AMGLab is a tool designed to help simplify the process of choosing a class of AMG methods to use. 84/84
