
ESTIMATING THE OPTIMAL EXTRAPOLATION PARAMETER FOR EXTRAPOLATED ITERATIVE METHODS WHEN SOLVING SEQUENCES OF LINEAR SYSTEMS

A Thesis Presented to The Graduate Faculty of The University of Akron

In Partial Fulfillment of the Requirements for the Degree Master of Science

Curtis J. Anderson

December, 2013

Thesis Approved:
  Advisor: Dr. Yingcai Xiao
  Co-Advisor: Dr. Zhong-Hui Duan
  Co-Advisor: Dr. Ali Hajjafar
  Department Chair: Dr. Yingcai Xiao

Accepted:
  Dean of the College: Dr. Chand Midha
  Dean of the Graduate School: Dr. George Newkome

ABSTRACT

Extrapolated iterative methods for solving systems of linear equations require the selection of an extrapolation parameter, which greatly influences the rate of convergence. Some extrapolated iterative methods provide analysis on the optimal extrapolation parameter to use; however, such analysis only exists for specific problems, and a general method for parameter selection does not exist. Additionally, the calculation of the optimal extrapolation parameter can often be too computationally expensive to be of practical use. This thesis presents an algorithm that adaptively modifies the extrapolation parameter when solving a sequence of linear systems in order to estimate the optimal extrapolation parameter. The result is an algorithm that works for any general problem and requires very little computational overhead. Statistics on the quality of the algorithm's estimation are presented, and a case study is given to show practical results.

TABLE OF CONTENTS

LIST OF FIGURES

CHAPTER
I. INTRODUCTION
   1.1 Stationary Iterative Methods
   1.2 Convergence of Iterative Methods
   1.3 Well-Known Iterative Methods
   1.4 Extrapolated Iterative Methods
   1.5 Sequence of Linear Systems
II. AN ALGORITHM FOR ESTIMATING THE OPTIMAL EXTRAPOLATION PARAMETER
   2.1 Spectral Radius Estimation
   2.2 Extrapolated Spectral Radius Function Reconstruction
   2.3 Parameter Selection
   2.4 Solver Integration
III. PERFORMANCE ANALYSIS
IV. CASE STUDY
BIBLIOGRAPHY

LIST OF FIGURES

1.1 The CR transition disk in relation to the unit disk
2.1 Statistics on the ratio of the estimated average reduction factor (σ̃_i) to the spectral radius (ρ) as i is varied for randomly generated systems
2.2 Statistics on the ratio of the estimated spectral radius (ρ̃) to the spectral radius (ρ) compared against the number of iterations required for convergence for randomly generated systems
2.3 Example of ρ̃ estimating points on the extrapolated spectral radius function
2.4 The square of the magnitude of the spectral radius shown as the composition of the square of the magnitude of the respective eigenvalues for two randomly generated systems (only relevant eigenvalues are shown)
2.5 Example of segmentation of samples based upon algorithm 2.2.1
2.6 Constrained regression (left) versus unconstrained regression (right) applied to the same problem
2.7 Example of estimating the extrapolated spectral radius function from samples that are all superior to non-extrapolated iteration (below the black dotted line)
3.1 Performance results for Gauss-Seidel
3.2 Performance results for SOR w =
3.3 Performance results for SOR w =
4.1 Slices of the volume solved in the example problem at particular times
4.2 Sparsity plot of the Crank-Nicholson coefficient matrix for an 8x8x8 discretization (blue entries are nonzero entries of the matrix)

4.3 Benchmark results for a pulsed source (S(x, y, z, t) = sin(t))
4.4 Benchmark results for a constant source (S(x, y, z, t) = 0)

CHAPTER I

INTRODUCTION

Many applications in scientific computing require solving a sequence of linear systems A^{i} x^{i} = b^{i} for i = 0, 1, 2, ... for the exact solution x^{i} = (A^{i})^{-1} b^{i}, where A^{i} ∈ R^{n×n}, x^{i} ∈ R^{n}, and b^{i} ∈ R^{n}. Several direct methods, such as Gaussian elimination and LU decomposition, exist for solving such systems without explicitly computing (A^{i})^{-1}; however, these methods may not be the most efficient or accurate choices for very large problems. Direct methods often require the full storage of matrices in computer memory as well as O(n^3) operations to compute the solution. As matrices become large these drawbacks become increasingly prohibitive.

Iterative methods are an alternative class of methods for solving linear systems that alleviate many of the drawbacks of direct methods. Iterative methods start with an initial solution vector x^(0) and subsequently generate a sequence of vectors {x^(i)}_{i=1}^{∞} which converges to the solution x. Iteration takes the form of a recursive function

    x^(i+1) = Φ(x^(i)),                                   (1.1)

which is expected to converge to x; however, convergence for iterative methods is not always guaranteed and depends upon both the iterative method and the problem. The calculation of each iteration is often achieved with significantly lower computation and memory usage than direct methods. There are several choices for iterative functions, leading to two major classes of iterative methods. Stationary methods have an iteration matrix that determines convergence, while non-stationary methods, such as Krylov subspace methods, do not have an iteration matrix and their convergence is dependent upon other factors [1, 2]. Throughout this thesis the name iterative method refers to stationary methods, since only stationary methods and their extrapolations are studied.

1.1 Stationary Iterative Methods

Let a nonsingular n×n matrix A be given and a system of linear equations, Ax = b, with the exact solution x = A^{-1}b. We consider an arbitrary splitting A = N - P for the matrix A, where N is nonsingular. A stationary iterative method can be found by substituting the splitting into the original problem

    (N - P)x = b                                          (1.2)

and then setting the iteration

    N x^(i+1) = P x^(i) + b                               (1.3)

and solving for x^(i+1),

    x^(i+1) = N^{-1} P x^(i) + N^{-1} b.                  (1.4)

This method was first developed in this generality by Wittmeyer in 1936 [3]. Convergence to the solution is heavily dependent upon the choice of splitting for N and P and is not generally guaranteed for all matrices; however, for certain types of matrices some splittings can guarantee convergence. Intelligent choices for splittings rarely require finding the matrix N^{-1} explicitly; instead, computation proceeds by solving the system in equation (1.3) for x^(i+1). In the following sections the most well-known iterative methods are introduced along with general rules of convergence.

1.2 Convergence of Iterative Methods

Convergence is the cornerstone of this thesis since it is required for the method to be of any use. It is well known that convergence occurs if and only if the spectral radius of the iteration matrix, N^{-1}P, is less than one [4]; that is, if λ_i are the eigenvalues of the matrix N^{-1}P, then

    ρ(N^{-1}P) = max_{1≤i≤n} |λ_i| < 1.                   (1.5)

Additionally, the rate of convergence for an iterative method is the rate at which ρ(N^{-1}P)^i → 0 as i → ∞. Thus, we are not only concerned with having ρ(N^{-1}P) within unity, but also with making ρ(N^{-1}P) as small as possible to achieve the fastest convergence possible. The rates of convergence of two iterative methods can be compared as follows: if ρ_1 and ρ_2 are the respective

spectral radii of two iterative methods, then according to

    ρ_1^n = ρ_2^m                                         (1.6)

iterative method 1 will require n iterations to reach the same level of convergence as iterative method 2 with m iterations. Solving explicitly for the ratio of the number of iterations required results in

    n/m = ln(ρ_2)/ln(ρ_1).                                (1.7)

Equation (1.7) allows the iteration requirements of different methods to be easily compared. For example, if ρ_1 = .99 and ρ_2 = .999, then ln(ρ_2)/ln(ρ_1) ≈ 0.0995; thus, method 1 requires only 9.95% of the iterations that method 2 requires. Additionally, if ρ_1 = .4 and ρ_2 = .5, then ln(ρ_2)/ln(ρ_1) ≈ 0.7565; thus, method 1 requires only 75.65% of the iterations that method 2 requires. The first example shows how important small improvements can be for spectral radii close to 1, while the second example shows that the absolute change in the value of a spectral radius is not an accurate predictor of the improvement provided by a method. Thus, for a proper comparison, equation (1.7) should be used when comparing two methods.

1.3 Well-Known Iterative Methods

Given a nonsingular n×n matrix A, the matrices L, U, and D are taken as the strictly lower triangular, the strictly upper triangular, and the diagonal parts of A,

respectively. The following sections detail the most well-known splittings and their convergence properties.

1.3.1 Jacobi Method

The Jacobi method is defined by the splitting A = N - P, where N = D and P = -(L + U) [5, 6]. Therefore, the Jacobi iterative method can be written in vector form as

    x^(k+1) = -D^{-1}(L + U) x^(k) + D^{-1} b.            (1.8)

In scalar form, the Jacobi method is written as

    x_i^(k+1) = (1/a_ii) ( b_i - Σ_{j=1, j≠i}^{n} a_ij x_j^(k) )   for i = 1, 2, ..., n.    (1.9)

Algorithm 1.3.1 implements one Jacobi iteration in MATLAB.

Algorithm 1.3.1: Jacobi Iteration
% Compute one iteration of the Jacobi method
function [x_new] = JacobiIteration(A, x_old, b)

    % Find size of the matrix
    n = length(b);

    % Initialize array for next iteration value
    x_new = zeros(n, 1);

    for i = 1:n
        for j = 1:n
            if (i ~= j)
                x_new(i) = x_new(i) + A(i, j)*x_old(j);
            end
        end

        x_new(i) = (b(i) - x_new(i))/A(i, i);
    end
end

The Jacobi method is guaranteed to converge for diagonally dominant systems [7]. A system is said to be diagonally dominant provided that its coefficient matrix has the property that

    |a_ii| > Σ_{j=1, j≠i}^{n} |a_ij|   for i = 1, 2, ..., n.

This means that in each row of the coefficient matrix the magnitude of the diagonal element is larger than the sum of the magnitudes of all other elements in the row.

1.3.2 Gauss-Seidel Method

The Gauss-Seidel method is a modification of the Jacobi method that can sometimes improve convergence. If the Jacobi method computes elements sequentially, then the elements x_1^(k+1), x_2^(k+1), ..., x_{i-1}^(k+1) have already been computed by the time the i-th element x_i^(k+1) is computed. The Gauss-Seidel method makes use of these more recently computed elements by substituting them in place of the older values. Therefore, in the computation of the element x_i^(k+1), the Gauss-Seidel method utilizes the elements x_{i+1}^(k), x_{i+2}^(k), ..., x_n^(k) from the k-th iteration and the elements x_1^(k+1), x_2^(k+1), ..., x_{i-1}^(k+1) from the (k+1)-th iteration. This substitution results in the scalar equation

    x_i^(k+1) = (1/a_ii) ( b_i - Σ_{j=1}^{i-1} a_ij x_j^(k+1) - Σ_{j=i+1}^{n} a_ij x_j^(k) )   for i = 1, 2, ..., n.    (1.10)

Algorithm 1.3.2 implements one Gauss-Seidel iteration in MATLAB.

Algorithm 1.3.2: Gauss-Seidel Iteration
% Compute one iteration of the Gauss-Seidel method
function [x_new] = GaussSeidelIteration(A, x_old, b)

    % Find size of the matrix
    n = length(b);

    % Initialize array for new iteration value
    x_new = zeros(n, 1);

    for i = 1:n
        x_new(i) = b(i);

        for j = 1:i-1
            x_new(i) = x_new(i) - A(i, j)*x_new(j);
        end

        for j = i+1:n
            x_new(i) = x_new(i) - A(i, j)*x_old(j);
        end

        x_new(i) = x_new(i)/A(i, i);
    end
end

To write the vector form of the Gauss-Seidel method, let the matrices L, U, and D be defined as earlier. By taking the splitting N = D + L and P = -U, the vector form of Gauss-Seidel can be written as

    x^(k+1) = -(D + L)^{-1} U x^(k) + (D + L)^{-1} b.     (1.11)

It is well known that if the matrix A is diagonally dominant then the Gauss-Seidel method is convergent and the rate of convergence is at least as fast as the rate of convergence of the Jacobi method [7]. Also, convergence is guaranteed for positive definite matrices [4].
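As a small usage illustration (a minimal sketch, not drawn from the thesis), the diagonal dominance test and the convergence criterion (1.5) can be checked before driving the GaussSeidelIteration function of Algorithm 1.3.2 to a residual tolerance. The example matrix, tolerance, and iteration cap below are arbitrary, and GaussSeidelIteration is assumed to be saved as its own function file on the MATLAB path.

% Minimal usage sketch (not from the thesis): drive Algorithm 1.3.2 on a
% small diagonally dominant system and verify the convergence criterion (1.5).
A = [4 -1 0; -1 4 -1; 0 -1 4];           % example diagonally dominant matrix
b = [1; 2; 3];
x = zeros(length(b), 1);                 % initial solution vector x^(0)

% Row diagonal dominance: |a_ii| > sum over j ~= i of |a_ij|
is_dominant = all(2*abs(diag(A)) > sum(abs(A), 2));
fprintf('diagonally dominant: %d\n', is_dominant);

% Spectral radius of the Gauss-Seidel iteration matrix (must be < 1)
N = tril(A);                             % N = D + L
P = N - A;                               % P = -U, so that A = N - P
rho = max(abs(eig(N \ P)));
fprintf('spectral radius = %.4f\n', rho);

% Iterate until the residual is small
tol = 1e-10;
for k = 1:1000
    x = GaussSeidelIteration(A, x, b);   % one sweep of Algorithm 1.3.2
    if norm(A*x - b) < tol
        break;
    end
end
fprintf('converged in %d iterations, error = %.2e\n', k, norm(x - A\b));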

1.3.3 Successive Over-Relaxation Method

The successive over-relaxation (SOR) method is a modification of the Gauss-Seidel method that introduces a relaxation parameter to affect convergence. SOR was first introduced by David M. Young in his 1950 dissertation [8]. In order to approximate x_i^(k+1), SOR introduces the temporary Gauss-Seidel approximation

    x̂_i^(k+1) = (1/a_ii) ( b_i - Σ_{j=1}^{i-1} a_ij x_j^(k+1) - Σ_{j=i+1}^{n} a_ij x_j^(k) )    (1.12)

that is extrapolated with x_i^(k) by a parameter ω. In other words, we obtain

    x_i^(k+1) = ω x̂_i^(k+1) + (1 - ω) x_i^(k)
              = x_i^(k) + ω( x̂_i^(k+1) - x_i^(k) )
              = (1 - ω) x_i^(k) + (ω/a_ii) [ b_i - Σ_{j=1}^{i-1} a_ij x_j^(k+1) - Σ_{j=i+1}^{n} a_ij x_j^(k) ].    (1.13)

Rearranging (1.13) for i = 1, 2, ..., n, the scalar form of SOR is

    a_ii x_i^(k+1) + ω Σ_{j=1}^{i-1} a_ij x_j^(k+1) = (1 - ω) a_ii x_i^(k) - ω Σ_{j=i+1}^{n} a_ij x_j^(k) + ω b_i.    (1.14)

In vector form (1.14) can be written as

    D x^(k+1) + ω L x^(k+1) = (1 - ω) D x^(k) - ω U x^(k) + ω b

or

    (1/ω)(D + ωL) x^(k+1) = (1/ω)((1 - ω)D - ωU) x^(k) + b.    (1.15)

Equation (1.15) shows that the SOR method splits A as A = N - P, where

    N = (1/ω) D + L   and   P = ( (1/ω) - 1 ) D - U,      (1.16)

resulting in the iteration matrix

    H(ω) = N^{-1} P = (D + ωL)^{-1} ((1 - ω)D - ωU).      (1.17)

Therefore, in vector form, SOR can be written as

    x^(k+1) = (D + ωL)^{-1} ((1 - ω)D - ωU) x^(k) + (D + ωL)^{-1} ω b.    (1.18)

Algorithm 1.3.3 implements one iteration of SOR in MATLAB.

Algorithm 1.3.3: SOR Iteration
% Compute one iteration of the SOR method
function [x_new] = SORIteration(A, x_old, b, w)

    % Find size of the matrix
    n = length(b);

    % Initialize array for next iteration value
    x_new = zeros(n, 1);

    for i = 1:n
        x_new(i) = b(i);

        for j = 1:i-1
            x_new(i) = x_new(i) - A(i, j)*x_new(j);
        end

        for j = i+1:n
            x_new(i) = x_new(i) - A(i, j)*x_old(j);
        end

        x_new(i) = x_old(i) + w*(x_new(i)/A(i, i) - x_old(i));
    end
end

Notice that if ω = 1, the SOR method reduces to the Gauss-Seidel method. Convergence of SOR depends on ρ(H(ω)), and it is well known (Kahan's theorem) that for an arbitrary matrix A, ρ(H(ω)) ≥ |ω - 1| [9]. This implies that if the SOR method converges, ω must belong to the interval (0, 2). Furthermore, if A is symmetric positive definite, then for any ω in (0, 2), SOR is convergent [10].

1.4 Extrapolated Iterative Methods

An extrapolated scheme which converges to the same solution as (1.4) can be defined by

    x^(k+1) = µ Φ(x^(k)) + (1 - µ) x^(k) = x^(k) + µ( Φ(x^(k)) - x^(k) )    (1.19)

where µ is an extrapolation parameter. Substitution of Φ from (1.4) into (1.19) results in the extrapolated iterative method [11, 12],

    x^(k+1) = ((1 - µ)I + µ N^{-1} P) x^(k) + µ N^{-1} b.    (1.20)

Note that equation (1.20) is equivalent to equation (1.4) when µ = 1. The addition of an extrapolation parameter changes the rules of convergence for a splitting, which is now convergent if and only if ρ((1 - µ)I + µ N^{-1} P) < 1. In [12], the controlled relaxation method (CR method) was introduced, which analyzed the

Figure 1.1: The CR transition disk in relation to the unit disk.

convergence of extrapolated iteration. It was found that an appropriate µ can be chosen to make extrapolated iteration converge if all the eigenvalues of the iteration matrix, N^{-1}P, have real parts all greater than 1 or all less than 1. If the eigenvalues of N^{-1}P are {λ_j = α_j + iβ_j}_{j=1}^{n}, then the eigenvalues of (1 - µ)I + µN^{-1}P are {1 - µ + µλ_j}_{j=1}^{n}. If α_j < 1, then for 0 < µ < ∞, 1 - µ + µλ_j will shift λ_j to a position on the half-line starting with the point (1, 0) and passing through λ_j. In [12], it is shown that if

    µ_λj = (1 - Real(λ_j)) / |1 - λ_j|^2 = (1 - Real(λ_j)) / (1 + |λ_j|^2 - 2 Real(λ_j)),    (1.21)

then 1 - µ_λj + µ_λj λ_j will shift λ_j to a point on the circle with center (1/2, 0) and radius 1/2, called the transition circle (see figure 1.1). Notice that if λ_j is in the interior of the transition circle, then µ_λj ≥ 1; otherwise 0 < µ_λj < 1. Additionally, if α_j > 1 then µ_λj < 0.
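As an illustrative sketch (not taken from the thesis), the shift parameter of equation (1.21) can be evaluated for every eigenvalue of a concrete iteration matrix and the shifted eigenvalues checked against the transition circle. Computing the eigenvalues explicitly is done here purely for illustration; the example matrix and the Jacobi splitting are assumptions.

% Illustrative sketch (not from the thesis): shift each eigenvalue of an
% iteration matrix onto the transition circle using equation (1.21).
A = gallery('lehmer', 6);                 % example symmetric positive definite matrix
N = diag(diag(A));  P = N - A;            % Jacobi splitting, A = N - P
lambda = eig(N \ P);                      % eigenvalues of the iteration matrix

mu_lambda = (1 - real(lambda)) ./ abs(1 - lambda).^2;    % equation (1.21)
shifted = 1 - mu_lambda + mu_lambda.*lambda;             % shifted eigenvalues

% Distance of each shifted eigenvalue from the center (1/2, 0) of the
% transition circle; each entry should equal the radius 1/2.
disp(abs(shifted - 0.5));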

Now suppose all the eigenvalues have real parts less than 1 (for all j, α_j < 1). For each 1 ≤ j ≤ n, equation (1.21) provides a shift parameter µ_λj by which λ_j will be transformed onto the transition circle. Next, define µ* = min_{1≤j≤n} µ_λj. For 1 ≤ j ≤ n, the values 1 - µ* + µ*λ_j are the eigenvalues of the matrix (1 - µ*)I + µ*N^{-1}P, which belong to the transition disk, and at least one of them lies on the transition circle. In the case where all the eigenvalues of N^{-1}P have real parts greater than 1 (for all j, α_j > 1), equation (1.21) will shift λ_j to 1 - µ_λj + µ_λj λ_j on the transition circle, and µ* = max_{1≤j≤n} µ_λj will shift all eigenvalues of the iteration matrix (1 - µ*)I + µ*N^{-1}P inside or onto the transition disk.

Starting with a matrix where all the eigenvalues of the iteration matrix N^{-1}P have real parts all less than 1 or all greater than 1, the CR method (1.20) with an appropriate µ will converge independent of the initial vector x^(0). However, the shift parameter µ* may push the eigenvalues of (1 - µ*)I + µ*N^{-1}P too close to the point (1, 0), which is on the boundary of the unit disk, causing slower convergence. At this point, the spectral radius can be modified by shifting all the eigenvalues to the left on the shift lines. This can be done by applying the CR method to (1.20) with a µ larger than one. The resulting iterative method is called controlled over-relaxation (COR), defined by the iteration

    x^(k+1) = ((1 - µµ*)I + µµ* N^{-1} P) x^(k) + µµ* N^{-1} b.    (1.22)

Since one of the eigenvalues of the matrix (1 - µ*)I + µ*N^{-1}P lies on the transition circle, it is shown in [12] that (1.22) with the calculated µ* converges if and only if 0 < µ < 2. While COR is helpful for understanding the rules of convergence, its reliance upon the eigenvalues for the calculation of µ* makes it impractical for real-world use, which is discussed later. Henceforth, in this thesis the extrapolation parameter µ has no relation to the calculated µ* from COR.

One final note is that since the extrapolated spectral radius function

    ρ(M(µ)) = max_{1≤i≤n} |(1 - µ) + µλ_i|                (1.23)

is the composition of a linear function, the absolute value, and the max function, the spectral radius of an extrapolated iterative method is convex in µ. Therefore, there exists a single optimal extrapolation parameter that provides the global minimum of the extrapolated spectral radius function, resulting in the fastest convergence possible.

1.5 Sequence of Linear Systems

As mentioned previously, many problems in scientific computing require solving not just one system of linear equations but a sequence of linear systems

    A^{i} x^{i} = b^{i}   for i = 1, 2, ...               (1.24)

Often there is the requirement that the system A^{i} x^{i} = b^{i} must be solved in order to generate the system A^{i+1} x^{i+1} = b^{i+1}; that is, A^{i+1} = f(A^{i}, x^{i}, b^{i}) and b^{i+1} = g(A^{i}, x^{i}, b^{i}). In the case that A^{i} is constant, the iteration matrix remains invariant as elements of the sequence are solved, and thus a single particular extrapolation parameter will be optimal for all the systems in the sequence.

The repeated solving of linear systems is the cornerstone of this thesis, as it allows what would be expensive computations to be estimated very cheaply by evaluating properties of previously solved systems. The evaluation of previous systems would not be possible if only one system were being solved, and thus it is only useful when solving a sequence of linear systems. The proposed algorithm for estimating the optimal extrapolation parameter is outlined in chapter 2, and performance results are presented in chapters 3 and 4.
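Before turning to the proposed algorithm, the following minimal sketch (not from the thesis) makes the target concrete by brute-force scanning the extrapolated spectral radius function (1.23) for one splitting of one matrix. This direct evaluation requires the eigenvalues of the iteration matrix and is exactly the expensive computation the algorithm of chapter 2 avoids; the example matrix, splitting, and grid are assumptions.

% Illustrative sketch (not from the thesis): brute-force scan of rho(M(mu))
% for the Gauss-Seidel splitting of a small model problem.
A = gallery('poisson', 10);              % example symmetric positive definite matrix
N = tril(A);  P = N - A;                 % Gauss-Seidel splitting
lambda = eig(full(N \ P));               % eigenvalues of the iteration matrix

mu_grid = linspace(0.1, 2.0, 200);
rho_mu = zeros(size(mu_grid));
for k = 1:length(mu_grid)
    mu = mu_grid(k);
    rho_mu(k) = max(abs((1 - mu) + mu*lambda));   % equation (1.23)
end

[rho_opt, idx] = min(rho_mu);
fprintf('optimal mu on the grid = %.3f with rho = %.4f (rho at mu = 1 is %.4f)\n', ...
        mu_grid(idx), rho_opt, max(abs(lambda)));
plot(mu_grid, rho_mu);  xlabel('extrapolation parameter \mu');  ylabel('\rho(M(\mu))');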

CHAPTER II

AN ALGORITHM FOR ESTIMATING THE OPTIMAL EXTRAPOLATION PARAMETER

An algorithm for estimating the optimal extrapolation parameter for an extrapolated stationary solver can be broken down into four parts. The first part of the algorithm estimates the spectral radius of the extrapolated iteration matrix for a particular extrapolation parameter. The second part of the algorithm reconstructs the extrapolated spectral radius function from spectral radius estimation samples. The third part of the algorithm establishes rules that dictate how subsequent sample locations are chosen and where the optimal value lies. The fourth part of the algorithm is the integration of the previous three parts into the iterative solver. The result of this algorithm is an iteratively refined estimation of the optimal extrapolation parameter.

2.1 Spectral Radius Estimation

Determining the eigenvalues, and thus the spectral radius, of a matrix can be extremely expensive and is often more expensive than solving a system of linear equations. For an n×n matrix, typical algorithms for finding the eigenvalues of a matrix, such as the QR method, require O(n^3) operations as well as the full storage of the matrix in memory. Iterative techniques such as power iteration exist for finding the dominant eigenvalue

and dominant eigenvector of a matrix; however, the iteration is not guaranteed to converge if there is more than one dominant eigenvalue, and thus it cannot be used for general cases. Therefore, calculating the spectral radius based upon eigenvalue techniques is not practical for use in a low-cost general algorithm.

The spectral radius can, however, be estimated based upon the rate of convergence of a stationary iterative solver. As the number of iterations computed by a solver grows, the rate of convergence is dictated by the spectral radius; thus, the average reduction factor per iteration is able to give an estimation of the spectral radius when enough iterations are computed. For a particular extrapolation parameter, iteration is carried out, which generates the solution sequence {x_i}_{i=0}^{n} that should converge to the solution x. The average reduction factor for i iterations, as defined by [4], can be written as

    σ_i := ( ||x_i - x|| / ||x_0 - x|| )^(1/i).           (2.1)

Note that the calculation of σ_i requires the exact solution x, which is unknown. The final iteration computed, x_n, can be used as an estimation of x, resulting in the estimation of the average reduction factor

    σ̃_i := ( ||x_i - x_n|| / ||x_0 - x_n|| )^(1/i),   for 0 < i < n.    (2.2)

Care must now be given to how well σ̃_i estimates the spectral radius, along with which σ̃_i in the set {σ̃_i}_{i=1}^{n-1} should be chosen to represent the final average reduction factor ρ̃ for this set of data. The number of elements in {x_i}, and thus x_n, is usually

determined by the tolerance of the solver for the problem at hand. Thus, the choice of n is not controllable in the calculation of ρ̃. The remaining problem is choosing i such that 0 < i < n and σ̃_i is as accurate as possible for estimating the spectral radius.

Figure 2.1: Statistics on the ratio of the estimated average reduction factor (σ̃_i) to the spectral radius (ρ) as i is varied for randomly generated systems (horizontal axis: i used for σ̃_i as a percent of n; vertical axis: σ̃_i/ρ; curves: 5th, 25th, 50th, 75th, and 95th percentiles).

Figure 2.1 presents statistics that were gathered on the appropriate i to choose for the computation of ρ̃. Randomly generated positive definite systems were solved using the Gauss-Seidel method, and the sequences {x_i}, and thus {σ̃_i}, were analyzed. Due to the fact that each system can require a different number of iterations to achieve convergence, the horizontal axis of figure 2.1 is given as i as a percentage of n, so the

statistics are normalized for the number of iterations required. The vertical axis of figure 2.1 is the ratio of the estimated average reduction factor σ̃_i to the spectral radius, so values close to 1 show that the estimation of the spectral radius is accurate. We see that choosing an element in the sequence {σ̃_i} for use as the final estimation ρ̃ is fairly straightforward, because choosing an i that is close to n provides the highest quality estimation. Although there is a very small dip in accuracy towards 100% in figure 2.1, σ̃_{n-1} provides sufficient accuracy for use as ρ̃.

Figure 2.2: Statistics on the ratio of the estimated spectral radius (ρ̃) to the spectral radius (ρ) compared against the number of iterations required for convergence for randomly generated systems (horizontal axis: iterations for solution convergence; vertical axis: ρ̃/ρ; curves: 5th, 25th, 50th, 75th, and 95th percentiles).
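The following minimal sketch (not part of the thesis) shows how ρ̃ = σ̃_{n-1} of equation (2.2) could be computed from a stored convergence history; it assumes the GaussSeidelIteration routine of Algorithm 1.3.2 and, purely for comparison, also computes the exact spectral radius from the eigenvalues.

% Minimal sketch (not from the thesis): estimate the spectral radius from the
% iterates of a stationary solver using equation (2.2) with i = n - 1.
A = [4 -1 0; -1 4 -1; 0 -1 4];
b = [1; 1; 1];
x = zeros(3, 1);
X = x;                                   % store the iterates x_0, x_1, ..., x_n
while norm(A*x - b) > 1e-12
    x = GaussSeidelIteration(A, x, b);
    X(:, end+1) = x;
end

n = size(X, 2) - 1;                      % number of iterations computed
i = n - 1;
sigma_i = (norm(X(:, i+1) - X(:, n+1)) / norm(X(:, 1) - X(:, n+1)))^(1/i);

N = tril(A);  P = N - A;                 % exact spectral radius, for comparison only
rho = max(abs(eig(N \ P)));
fprintf('estimated rho = %.4f, exact rho = %.4f\n', sigma_i, rho);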

Figure 2.3: Example of ρ̃ estimating points on the extrapolated spectral radius function (horizontal axis: extrapolation parameter; vertical axis: spectral radius).

Now that the selection of ρ̃ is understood, the accuracy and precision of its estimation of the spectral radius must be evaluated. Figure 2.1 normalized i and n for each sample for the purpose of analyzing σ̃_i free from particular cases of i and n. Figure 2.2 shows that the accuracy and precision of ρ̃ in estimating the spectral radius are somewhat dependent upon n. Notice that the 50th percentile remains fairly steady for all values of n; this shows that ρ̃ can provide a fairly accurate estimation of the spectral radius over nearly all values of n. However, looking at the spread of the percentiles for smaller values of n, we see that the precision of the estimation is reduced. While the highest quality estimations are preferred, because the algorithm

proposed is based upon statistical methods, issues due to lack of precision can be mitigated by gathering more samples.

Figure 2.3 shows an example of ρ̃ being used to estimate the spectral radius of an extrapolated iteration matrix as the extrapolation parameter is varied. It should be noted that if iteration diverges then ρ̃ ≥ 1, which can be seen on the far right of figure 2.3. Detecting divergence is useful so that unnecessary computation can be avoided.

2.2 Extrapolated Spectral Radius Function Reconstruction

The previous section gives a technique for cheaply estimating the spectral radius after solving one system of linear equations. When solving a sequence of linear systems, the extrapolation parameter can be varied so that the extrapolated spectral radius function can be cheaply sampled while solving elements of the sequence. These samples contain random variations that make them unsuitable for direct reconstruction techniques such as interpolation. To deal with the random nature of the samples, a statistical regression can be used for the reconstruction of the extrapolated spectral radius function. Figure 2.3 shows an example of the spectral radius function that needs to be reconstructed, along with the samples that could be used for the reconstruction.

Analyzing the mathematical properties of the spectral radius function gives some insight into the regression model that should be used. The greatest consideration in the regression model is that the square of the spectral radius function is a piecewise quadratic function; that is, if λ_j = α_j + iβ_j is an eigenvalue of the iteration matrix

N^{-1}P and M(µ) is the extrapolated iteration matrix, then

    ρ(M(µ)) = max_{1≤j≤n} |(1 - µ) + µλ_j|                (2.3)

    ⟹ ρ(M(µ))^2 = max_{1≤j≤n} |(1 - µ) + µλ_j|^2          (2.4)

                 = max_{1≤j≤n} [ 1 + 2(α_j - 1)µ + ((α_j - 1)^2 + β_j^2)µ^2 ].    (2.5)

Figure 2.4: The square of the magnitude of the spectral radius shown as the composition of the square of the magnitude of the respective eigenvalues for two randomly generated systems; only relevant eigenvalues are shown (horizontal axes: extrapolation parameter (µ); vertical axes: square of magnitude).

Typical extrapolated spectral radius functions are made up of 2 to 5 segments but are not strictly limited in number. Since the purpose of reconstructing the spectral radius function is to estimate the location of the minimum value, which will only occur where the derivative of a segment is equal to zero or at an intersection of segments,

at most only the two segments adjacent to the minimum value are required to be reconstructed.

2.2.1 Segmentation Method

Given a set of samples of the extrapolated spectral radius function, the goal of segmentation is to sort the samples into groups based upon the segment they were sampled from. Without any knowledge of which segment a sample came from, segments can be estimated by grouping samples such that the error of the piecewise regression is minimized. Algorithm 2.2.1 is proposed as a method for finding the optimal grouping of samples to best minimize regression error and is explained in the following.

Algorithm 2.2.1: Segmentation
1  function [segments] = FindSegments(x_input, y_input)
2
3  % Sort samples
4  [x, sort_permutation] = sort(x_input);
5  y = y_input(sort_permutation);
6
7  % Initialize values
8  n = length(x);
9  e(1:n) = Inf;
10
11 % Generate error table
12 for i = 1:n-1
13     % Get regression coefficients
14     BL = Regression(x(1:i), y(1:i));
15     BR = Regression(x(i+1:n), y(i+1:n));
16
17     % Calculate and store regression error
18     EL = RegressionError(BL, x(1:i), y(1:i));
19     ER = RegressionError(BR, x(i+1:n), y(i+1:n));
20
21     e(i) = EL + ER;
22 end
23
24 % Calculate optimum choice
25 split_location = find(e == min(e), 1);
26

27 % Recompute the regressions at the optimal split and find the
28 % intersection of the segments
29 BL = Regression(x(1:split_location), y(1:split_location));
30 BR = Regression(x(split_location+1:n), y(split_location+1:n));
31 a = BL(3) - BR(3);
32 b = BL(2) - BR(2);
33
34 % Make sure samples to the left of the intersection belong to
35 % the left segment and samples to the right to the right segment
36 if (a ~= 0 && -b/a > 0)
37     for j = 2:n
38         if (x(j) > -b/a)
39             segments = [1 j-1; j n];
40             return;
41         end
42     end
43 end
44
45 % If there is no intersection, return results
46 segments = [1 split_location(1); split_location(1)+1 n];
47 end

As previously mentioned, at most only two segments need to be found, which we call the left segment and the right segment. Samples must belong to either the left segment or the right segment. To prepare the data, the samples need to be sorted by their x position so that groups are easily determined by the segmentation position i, giving a left group x(1:i) and a right group x(i+1:n). Note that x and y are parallel arrays, so sorting and grouping operations should handle the y data accordingly. Next, all the possible segmentations are iterated through. For each potential segmentation, regressions for the left and right segments are computed, the total error of the regressions is calculated, and the error is stored in an array that tracks the total error for each potential segmentation. The optimal segmentation will be the one that minimizes the amount of error in the regression. However, due to the nature of regression, in some scenarios samples that belong to the left segment may end up on the right side of the intersection of segments, or vice-versa with right samples. Thus,

a final classification of samples into left and right segments is made by assigning all samples to the left of the intersection to the left segment and all samples to the right of the intersection to the right segment, as seen in the final classification step of algorithm 2.2.1. Once the optimal segmentation is found, a regression can be run on the left and right segment samples to reconstruct the spectral radius.

Figure 2.5 shows an example of segmentation done by algorithm 2.2.1, with the left segment colored green and the right segment colored red. A comparison of the regressions provided by the segmentation (solid lines) is made against the quadratic functions they are trying to reconstruct (dotted lines). The ability of the segmentation algorithm to find the segments when given good data is clearly satisfactory. Because of the statistical nature of regressions, additional sample points can be used to refine the regression for more accurate results.

2.2.2 Regression Constraints

Figure 2.5 shows a good reconstruction generated from accurate samples that are uniformly spaced across our area of interest; however, such well-behaved samples are rarely the case. Utilizing the mathematical properties of the extrapolated spectral radius function allows constraints to be added to the regression to make reconstructions as accurate as possible. Equation (2.5) shows that the square of each segment of the extrapolated spectral radius function is a quadratic function; thus, the regression model for segment j of the spectral radius function is a_j µ^2 + b_j µ + c_j. The mathematics of equation (2.5) also show that restrictions can be placed on the coefficients of

the quadratic function. First, note that c_j must always be equal to 1. Additionally, as noted in [12], if {λ_i = α_i + iβ_i}_{i=1}^{n} are the eigenvalues of the iteration matrix, then extrapolated iteration converges if and only if every α_i is less than 1; thus, 2(α_i - 1) is always negative, dictating that b_j must always be negative. Finally, (α_i - 1)^2 + β_i^2 is always positive, dictating that a_j must also always be positive. Applying these constraints is made trivial by transforming the data and constraints into a non-negative least squares problem. Algorithm 2.2.2 implements these transformations, which are explained in the following.

Figure 2.5: Example of segmentation of samples based upon algorithm 2.2.1 (horizontal axis: extrapolation parameter (µ); vertical axis: square of magnitude; legend: spectral radius, left/right eigenvalue, left/right sample, left/right regression).

If there are n samples with m explanatory variables and X ∈ R^{n×m}, Y ∈ R^{n×1}, and B ∈ R^{m×1}, where X contains the explanatory samples, Y contains the response samples, and B contains the regression coefficients, then the typical least

squares problem attempts to choose the vector B such that ||XB - Y|| is minimized. The non-negative least squares problem also attempts to choose the vector B such that ||XB - Y|| is minimized; however, it is subject to B ≥ 0. An algorithm for solving non-negative least squares problems is given in [13].

Figure 2.6: Constrained regression (left) versus unconstrained regression (right) applied to the same problem (horizontal axes: extrapolation parameter (µ); vertical axes: square of magnitude; legend: spectral radius, left/right sample, left/right regression).

The first transformation to set up the non-negative least squares problem will be to account for c_j being equal to 1. If the i-th sample point consists of the values

(µ_i, y_i) and the j-th segment consists of the coefficients a_j, b_j, and c_j, then

    y_i ≈ a_j µ_i^2 + b_j µ_i + c_j                       (2.6)

    ⟹ y_i ≈ a_j µ_i^2 + b_j µ_i + 1                       (2.7)

    ⟹ y_i - 1 ≈ a_j µ_i^2 + b_j µ_i                       (2.8)

    ⟹ ȳ_i ≈ a_j µ_i^2 + b_j µ_i,   where ȳ_i = y_i - 1.   (2.9)

Now the regression can be run on equation (2.9) and only two coefficients need to be determined. The additional constraints dictated by the mathematics require that a_j > 0 and b_j < 0, which means it is not yet a non-negative problem. Through substitution, the least squares problem that satisfies

    ȳ_i ≈ a_j µ_i^2 + b_j µ_i,   where ȳ_i = y_i - 1, a_j > 0, and b_j < 0    (2.10)

can be transformed into

    ȳ_i ≈ a_j µ̄_i^2 + b̄_j µ̄_i,   where ȳ_i = y_i - 1, µ̄_i = -µ_i, b̄_j = -b_j, a_j > 0, and b̄_j > 0.    (2.11)

The substitution of µ̄_i = -µ_i for µ_i in equation (2.11) allows µ̄_i to absorb the negative constraint, resulting in constraints that all require positiveness, and thus the problem is fit for non-negative least squares.
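As a minimal sketch (not part of the thesis), the transformed problem (2.11) can be solved with MATLAB's built-in lsqnonneg, which implements the non-negative least squares algorithm of [13]; the thesis's own listing (Algorithm 2.2.2, shown below) calls a generic NNLS routine instead. The sample data here is synthetic and chosen only to show the coefficient recovery.

% Minimal sketch (not from the thesis): solve the transformed problem (2.11)
% with lsqnonneg on synthetic samples of one segment of rho^2.
mu = linspace(0.2, 1.4, 25)';                     % sampled extrapolation parameters
y = 1 - 1.1*mu + 0.45*mu.^2 + 0.01*randn(25, 1);  % noisy samples of a_j*mu^2 + b_j*mu + 1

y_bar = y - 1;                  % absorb the constraint c_j = 1
X = [-mu, mu.^2];               % columns correspond to b_bar = -b_j and a_j
coef = lsqnonneg(X, y_bar);     % coef >= 0 enforces b_j < 0 and a_j > 0

b_j = -coef(1);
a_j = coef(2);
fprintf('recovered segment: %.3f*mu^2 + %.3f*mu + 1\n', a_j, b_j);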

Figure 2.6 shows the effect that constraints can have on the reconstruction. The left plot applies the constraints and shows an accurate reconstruction that passes through (0, 1) with both segments being convex. The right plot, however, has no constraints and clearly does not pass through (0, 1), and its left segment is concave. Not only does this lead to an inaccurate reconstruction, but it also leads to difficulties in selecting an optimal extrapolation parameter because the segments do not intersect.

Algorithm 2.2.2: Constrained Regression
1  % result is a 3 element array that contains the coefficients
2  % of a quadratic function
3  function [result] = ConstrainedRegression(x_input, y_input)
4
5  n = length(x_input);
6
7  % Transform data for constrained regression
8  y = y_input - 1;
9
10 x(1,:) = -x_input;
11 x(2,:) = x_input.^2;
12
13 % Run non-negative least squares regression
14 nnls_result = NNLS(x, y);
15
16 % Transform NNLS result back to usable results
17 result(1) = 1;
18 result(2) = -nnls_result(1);
19 result(3) = nnls_result(2);
20 end

2.3 Parameter Selection

The previous sections have outlined how to sample the extrapolated spectral radius function and how to reconstruct the extrapolated spectral radius function from samples; however, the question of where to take samples from has yet to be addressed.

Samples up to this point have been chosen in a uniformly spaced region around the area of interest, i.e., around the optimal value. However, without any prior information about the region surrounding the optimal value, the problem of where to select samples from becomes difficult. Poor choices for sample locations can be computationally expensive due to the required solving of a linear system with an extrapolation parameter that is suboptimal. Since an extrapolation parameter of 1 is equivalent to no extrapolation, parameter selection should try to find the optimal extrapolation parameter while always using extrapolation parameters that provide superior convergence over non-extrapolated iteration. Figure 2.7 shows a reconstructed spectral radius function and contains a dotted black line that samples should ideally be taken below, since any sample in this region would be superior to non-extrapolated iteration.

The goal of parameter selection is to estimate the optimal extrapolation parameter while also refining the reconstruction of the extrapolated spectral radius function. Parameter selection can be broken down into three important functions, EstimatedOptimalMu, ApplyJitter, and ParameterSelection, which are described in the following sections.

2.3.1 EstimatedOptimalMu

As previously mentioned, the minimum value of the extrapolated spectral radius function will occur at either the intersection of segments or on a segment which has a derivative equal to zero. Thus, the first step in finding the optimal parameter will be

to find the intersection of the two segments that have been reconstructed.

Figure 2.7: Example of estimating the extrapolated spectral radius function from samples that are all superior to non-extrapolated iteration (below the black dotted line); horizontal axis: extrapolation parameter (µ); vertical axis: square of magnitude.

If a_L and b_L are the quadratic and linear coefficients of the left segment, respectively, and a_R and b_R are the coefficients of the right segment, then the intersection will be located where

    a_L µ^2 + b_L µ + 1 = a_R µ^2 + b_R µ + 1             (2.12)

    ⟹ (a_L - a_R)µ^2 + (b_L - b_R)µ = 0.                  (2.13)

Let a_I = a_L - a_R and b_I = b_L - b_R be the coefficients of equation (2.13); then the segment intersections occur at

    µ = -b_I/a_I   and   µ = 0.                           (2.14)

Because the intersection at µ = 0 is already known from the constraint placed on the regression, we are interested in the intersection at µ = -b_I/a_I, which is calculated in algorithm 2.3.1. However, it is possible that a_I is zero, because either the non-negative least squares regression determined 0 to be the quadratic coefficient for both segments or both segments share the same non-zero quadratic term. In either scenario, an intersection cannot be found and an optimal parameter cannot yet be estimated. The procedure under these conditions is to take another sample slightly beyond the current range of samples, as seen in algorithm 2.3.1. Additional samples will quickly eliminate scenarios with no intersections as the reconstruction is refined. Note that the value search_speed is a tune-able parameter that determines how far the next parameter can be from the previous sample locations. Choosing a small value for search_speed can result in slow exploration of the sample space; however, large values for search_speed can lead to samples far away from the optimal value, leading to expensive computation.

If a_I ≠ 0, and thus an intersection besides (0, 1) is found, then both the left and right segments are checked to see if they contain a minimum within their region; if one does, then the location of the minimum is used as the optimal parameter,

otherwise, the intersection is used as the minimum. Checking whether a segment contains the minimum does not require explicitly calculating the value of the function at its minimum; only the locations of the segment minimums are needed, which are located at -b_L/(2a_L) and -b_R/(2a_R) for the left and right segments, respectively. Then, if the left segment's minimum is to the left of the intersection, or the right segment's minimum is to the right of the intersection, the appropriate location of the optimal extrapolation parameter can be determined, as seen in algorithm 2.3.1. If the optimal parameter is estimated at a location that is far outside the range of where samples have been taken, then the estimation may be very inaccurate. This issue is common when there are very few samples to base an estimation on. In the case that this situation does occur, algorithm 2.3.1 limits how far away the next sample will be so that parameters don't travel too far into unsampled areas.

Algorithm 2.3.1: Estimated Optimal Mu
1  function [mu, BL, BR] = EstimatedOptimalMu(X, Y, search_speed, has_divergence)
2
3  % Find segments
4  Segments = FindSegments(X, Y);
5
6  % Find equation of each segment
7  BL = ConstrainedRegression(X(Segments(1,1):Segments(1,2)), ...
8                             Y(Segments(1,1):Segments(1,2)));
9
10 BR = ConstrainedRegression(X(Segments(2,1):Segments(2,2)), ...
11                            Y(Segments(2,1):Segments(2,2)));
12
13 % Find intersection of segments
14 a = BL(3) - BR(3);
15 b = BL(2) - BR(2);
16 intersection = -b/a;
17
18 % If there is no intersection for positive mu, or the right
19 % segment is on top of the left segment
20 if (a == 0 || intersection < 0 || BL(2) < BR(2))

21
22     % If divergence has occurred and the right segment is
23     % on top of the left segment, samples should be taken from
24     % the right side of the left group of samples
25     if (has_divergence == 1 && BL(2) < BR(2))
26         mu = max(X(Segments(1,1):Segments(1,2)))*(1 + search_speed);
27
28     % Else, just search beyond the maximum sample parameter
29     else
30         mu = max(X)*(1 + search_speed);
31     end
32 else
33     % If the left segment contains the spectral radius minimum
34     if (BL(3) ~= 0 && -BL(2)/(2*BL(3)) < intersection)
35         mu = -BL(2)/(2*BL(3));
36
37     % Else if the right segment contains the spectral radius minimum
38     elseif (BR(3) ~= 0 && -BR(2)/(2*BR(3)) > intersection)
39         mu = -BR(2)/(2*BR(3));
40
41     % Else the intersection contains the spectral radius minimum
42     else
43         mu = intersection;
44     end
45
46 end
47
48 % If the optimal parameter is found outside of the range of
49 % the samples, the sample should be limited by the search speed
50 if (mu < min(X)*(1 - search_speed))
51     mu = min(X)*(1 - search_speed);
52 elseif (mu > max(X)*(1 + search_speed))
53     mu = max(X)*(1 + search_speed);
54 end
55 end

2.3.2 Jitter

Occasionally, if the estimated optimal parameter is used repeatedly for the sample location, the estimation may get stuck in an inaccurate reconstruction because subsequent samples do not add enough variation to refine the estimation. To alleviate this issue, a jitter term is introduced so that subsequent parameters are slightly different each time. As seen from the speedup equation in section 1.2, small changes to

the spectral radius can have large effects on the amount of computation that must be done. Therefore, jitter of the extrapolation parameter should not be based upon adding unrestricted random variation to µ, but should instead be intelligently applied variation to µ based upon restrictions derived from the speedup equation. If the estimated optimal spectral radius is ρ_OPT and requires n iterations to converge, and the allowed jittered cost is α times the number of iterations the optimal iteration requires, with α ≥ 1, then

    ρ_LIMIT^{nα} = ρ_OPT^{n}                              (2.15)

and

    ρ_OPT^{1/α} = ρ_LIMIT.                                (2.16)

Equation (2.16), calculated on line 10 of algorithm 2.3.2, places a limit on the highest spectral radius allowed by jitter while remaining within the computation envelope provided by α. Note that if the cost of jitter is allowed to be up to 10% more than the iterations of the optimal iteration, then α = 1.1. Next, a range for the extrapolation parameter that satisfies the limit placed on the spectral radius by jitter must be found. In the case that a segment is quadratic, it has two locations that equal the jitter limit, located at the solutions of

    aµ^2 + bµ + 1 = ρ_LIMIT.                              (2.17)

Any value of the extrapolation parameter between the two solutions to equation (2.17) will satisfy the jitter limit. Additionally, in the case of a linear segment, the solution to

    ρ_LIMIT = bµ + 1                                      (2.18)

will provide one limit for the extrapolation parameter that satisfies the jitter limit. The distance from this linear restriction to the optimum extrapolation parameter is mirrored to the other side of the optimum parameter so that a bounded limit is given in either case of a quadratic or linear segment. Algorithm 2.3.2 implements the calculation of range bounds for the left and right segments. Finally, as calculated on lines 45-57, with the tightest bounds on the extrapolation parameter from both the left and right segments that adhere to the jitter limit, a value in the range is randomly chosen as the jittered µ value. Note that the jitter limit α could be made a function of the number of samples available, so that well-sampled estimations can use less jitter for improved performance.

Algorithm 2.3.2: Jitter Algorithm
1  function [mu] = ApplyJitter(BL, BR, mu, max_jitter_cost)
2
3  % Find the y value at mu (the max of left and right segments)
4  left_y = BL(3)*mu^2 + BL(2)*mu + 1;
5  right_y = BR(3)*mu^2 + BR(2)*mu + 1;
6
7  current_y = max(left_y, right_y);
8
9  % Maximum value the jitter value should attain
10 y_limit = current_y^(1/max_jitter_cost);
11
12 % If the left segment is quadratic
13 if (BL(3) ~= 0)
14     % Find the solution to the quadratic equation

15     BL_left_bound = -BL(2) - sqrt(BL(2)^2 - 4*BL(3)*(1 - y_limit));
16     BL_left_bound = BL_left_bound / (2*BL(3));
17
18     BL_right_bound = -BL(2) + sqrt(BL(2)^2 - 4*BL(3)*(1 - y_limit));
19     BL_right_bound = BL_right_bound / (2*BL(3));
20
21 % Else, the left segment is linear
22 else
23     jitDist = abs((y_limit - 1)/BL(2) - mu);
24
25     BL_left_bound = mu - jitDist;
26     BL_right_bound = mu + jitDist;
27 end
28
29 % If the right segment is quadratic
30 if (BR(3) ~= 0)
31     BR_left_bound = -BR(2) - sqrt(BR(2)^2 - 4*BR(3)*(1 - y_limit));
32     BR_left_bound = BR_left_bound / (2*BR(3));
33
34     BR_right_bound = -BR(2) + sqrt(BR(2)^2 - 4*BR(3)*(1 - y_limit));
35     BR_right_bound = BR_right_bound / (2*BR(3));
36
37 % Else, the right segment is linear
38 else
39     jitDist = abs((y_limit - 1)/BR(2) - mu);
40
41     BR_left_bound = mu - jitDist;
42     BR_right_bound = mu + jitDist;
43 end
44
45 % Set up bounds on the jitter (use the tightest bounds)
46 left_bound = max(BL_left_bound, BR_left_bound);
47 right_bound = min(BL_right_bound, BR_right_bound);
48
49 % Generate random value for jitter
50 jitter_amount = rand();
51
52 % Apply jitter randomly to left or right of mu
53 if (jitter_amount < .5)
54     mu = left_bound + (mu - left_bound)*(2*jitter_amount);
55 else
56     mu = mu + (right_bound - mu)*(2*(jitter_amount - .5));
57 end
58 end
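As a short worked example (not from the thesis's listings), the jitter envelope of equations (2.15)-(2.16) can be checked numerically against the speedup relation (1.7); the optimal spectral radius of 0.90 and α = 1.1 below are assumed values chosen only for illustration.

% Worked example (not from the thesis): jitter limit for an assumed rho_OPT
% and alpha, cross-checked against the iteration-ratio formula (1.7).
rho_opt = 0.90;
alpha = 1.1;                              % allow up to 10% extra iterations
rho_limit = rho_opt^(1/alpha);            % equation (2.16)

iteration_ratio = log(rho_opt)/log(rho_limit);   % equation (1.7)
fprintf('rho_limit = %.4f, iteration ratio = %.2f (equals alpha)\n', ...
        rho_limit, iteration_ratio);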

2.3.3 Parameter Selection

The previous sections provide a method to estimate the optimal extrapolation parameter as well as a method to intelligently add variation to refine the reconstruction of the spectral radius function. Parameter selection ties these functions together with rules to manage the selection of a parameter. Algorithm 2.3.3, which implements these rules, will now be examined. First, lines 4-10 declare parameters that are used in this function and are tune-able by users. Next, the samples are sorted by their x location while making sure to also track the y values accordingly. The sorted samples make further processing easier, since the data is known to be in order. The first rule in parameter selection returns a default initial value if no samples have been evaluated yet. This gives a starting point for the algorithm, and an initial value of 1 makes the initial value equivalent to non-extrapolated iteration. Additionally, there need to be at least 2 samples for any analysis to take place; thus, the algorithm returns a parameter slightly beyond the initial value to get additional samples. Next, the algorithm searches for divergence by checking for sample values that are very close to 1, as dictated by divergence_value. If all samples are declared divergent, then a bisection of the minimum sample location is performed in an attempt to find a parameter that will converge. Additionally, if divergence has been detected, the algorithm calculates x_limit, which will clamp subsequent estimations to the midpoint between the last convergent sample and the first divergent sample. This clamp will limit the possibility of further divergent samples.

Finally, the divergent samples are trimmed out so that only convergent samples are used in the reconstruction of the spectral radius function, the estimated optimal µ is found and jitter is applied, and, lastly, the clamp from x_limit calculated earlier is applied.

Algorithm 2.3.3: Parameter Selection
1  function [mu] = ParameterSelection(X_input, Y_input)
2
3  % Set parameters
4  n_input = length(X_input);     % Number of samples taken
5
6  search_speed = 1.1;            % Percentage beyond sample range we
7                                 % can look
8  initial_value = 1;             % Where the first guess is
9  divergence_value = 0.9999;     % Assumed value (elided in the source); should be slightly less than 1
10 jitter = 1.1;                  % Should be slightly larger than 1
11
12 % Sort samples by X position
13 [X, IX] = sort(X_input);
14 Y = Y_input(IX);
15
16 % If there are no samples to analyze, return initial value
17 if (isempty(X))
18     mu = initial_value;
19     return;
20 end
21
22 % If there is only one point you must generate another point
23 % before analysis can be done
24 if (length(X) <= 2)
25     mu = max(X)*(1 + search_speed);
26     return;
27 end
28
29 % Detect where divergence occurs
30 divergence_location = n_input;
31 for i = n_input:-1:1
32     if (Y(i) < divergence_value)
33         divergence_location = i;
34         break;
35     end

36 end
37
38 % If every sample has been divergent, bisect smallest checked mu
39 if (divergence_location == 1 && divergence_location ~= n_input)
40     mu = min(X)/2;
41     return;
42 end
43
44 % Create an upper limit for the next parameter
45 if (divergence_location < n_input)
46     has_divergence = 1;
47     x_limit = (X(divergence_location) + X(divergence_location+1))/2;
48 else
49     has_divergence = 0;
50     x_limit = Inf;
51 end
52
53 % Trim out divergent samples
54 X = X(1:divergence_location);
55 Y = Y(1:divergence_location);
56
57 % Find the estimated optimal mu and the coefficients of the functions
58 [minimum_mu, BL, BR] = EstimatedOptimalMu(X, Y, search_speed, has_divergence);
59
60 % Apply jitter
61 mu = ApplyJitter(BL, BR, minimum_mu, jitter);
62
63 % Check to make sure final value isn't in an area close
64 % to divergent; if so, clamp it to the limit
65 if (mu > x_limit)
66     mu = x_limit;
67 end
68 end

2.4 Solver Integration

The result of the previous sections is the function ParameterSelection, which abstracts all of the logic of choosing a parameter; thus, integration of the proposed algorithm into an existing solver only requires a few modifications. One of the requirements of implementing the algorithm is the storage of previous samples, so

persistent variables have been declared and initialized in lines 9-18, which will store samples between calls to MySolver. Line 24 contains the main iteration loop that is typical of every iterative solver; however, there is additional logic to manage sample collection. The estimated spectral radius is calculated on line 51, and the sample is stored immediately afterwards. Previously, in section 2.1, the number of iterations required for an accurate estimation was evaluated. The conclusion was that roughly 200 iterations are required for an accurate estimation of the spectral radius. In the case that many more iterations are required to solve a system than are needed to estimate the spectral radius, a new parameter can be selected whenever enough iterations have been computed for the estimation. Thus, lines 32, 38, and 46 check whether a sufficient number of iterations have been computed. In the case that a divergent parameter has been used, all the work computed with this parameter is useless because it has taken the iteration farther away from the solution; thus, line 55 reverts to the value from before the divergent iterations occurred. The integration of the proposed algorithm into the solver and the use of persistent memory allow a user to call MySolver with no additional knowledge or requirements about the problem. This fully abstracts the workings of the proposed algorithm, making it available for general use by any user.

Algorithm 2.4: Solver Integration
1  function [x, solver_iteration] = MySolver(A, x, b, SolverType, solution_tol, SOR_w)
2

3  % Set parameters
4  max_iterations = 5000;
5  restart_pos = 200;
6  restart_min = 150;
7
8  % Create persistent (static) variables
9  persistent mu_array;
10 persistent rate_array;
11 persistent sample;
12
13 % Initialize values if uninitialized
14 if (isempty(mu_array))
15     mu_array = [];
16     rate_array = [];
17     sample = 1;
18 end
19
20 % Initialize mu
21 mu = 0;
22
23 % Iterate up to max_iterations times
24 for solver_iteration = 1:max_iterations
25
26     % Calculate the error in the current solution
27     error_norm = norm(A*x - b);
28
29     % If solved OR there are enough iterations to sample
30     if (error_norm < solution_tol || ...
31         mod(solver_iteration, restart_pos) == 0 || ...
32         solver_iteration == 1)
33
34         % If it's not the first iteration
35         if (solver_iteration > 1)
36
37             % Find the number of iterations used for this sample
38             if (mod(solver_iteration, restart_pos) == 0)
39                 sample_iterations = restart_pos;
40             else
41                 sample_iterations = mod(solver_iteration, restart_pos);
42             end
43
44             % Only gather a sample if there were enough iterations or you are forced to
45             if (sample_iterations > restart_min || ...
46                 solver_iteration < restart_pos)
47
48                 % Calculate estimate for spectral radius
49                 estimated_spectral_radius = (norm(xp - x) / ...
50                     norm(sample_x0 - x))^ ...
51                     (2/(sample_iterations - 1));
52

53                 % If NOT solved AND divergent, throw away divergent iterations
54                 if (error_norm > solution_tol && estimated_spectral_radius > 0.9999)  % threshold assumed (value elided in the source)
55                     x = sample_x0;
56                 end
57
58                 % Make sure x0 - x did not cause NaN, then record
59                 if (~isnan(estimated_spectral_radius))
60                     mu_array(sample) = mu;
61                     rate_array(sample) = estimated_spectral_radius;
62                     sample = sample + 1;
63                 end
64             end
65         end
66
67         % If solved, quit iterating
68         if (error_norm < solution_tol)
69             break;
70         end
71
72         % Select new parameter
73         mu = ParameterSelection(mu_array, rate_array);
74
75         % Record starting point for sample
76         sample_x0 = x;
77     end
78
79     xp = x;
80
81     x = ComputeIteration(A, x, b);
82
83     % Apply extrapolation
84     x = xp + mu*(x - xp);
85 end
86 end
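The following usage sketch (not from the thesis) shows how MySolver might be driven over a sequence of systems with a fixed coefficient matrix. ComputeIteration and the accepted values of SolverType are not defined in the listing above and are assumed to be provided elsewhere; the 'GaussSeidel' string, matrix, tolerance, and sequence length below are illustrative assumptions only.

% Usage sketch (not from the thesis): call MySolver over a sequence of
% right-hand sides, warm-starting each solve from the previous solution.
n_systems = 100;
A = gallery('poisson', 10);                      % fixed coefficient matrix
x = zeros(size(A, 1), 1);
clear MySolver;                                  % reset the persistent sample history

total_iterations = 0;
for i = 1:n_systems
    b = rand(size(A, 1), 1);                     % next right-hand side in the sequence
    [x, iters] = MySolver(A, x, b, 'GaussSeidel', 1e-8, 1);   % SolverType value assumed
    total_iterations = total_iterations + iters;
end
fprintf('average iterations per system: %.1f\n', total_iterations/n_systems);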

CHAPTER III

PERFORMANCE ANALYSIS

This chapter evaluates the performance of the proposed algorithm by randomly generating sequences of linear systems, solving them, and recording the number of iterations required for convergence. To generate a sequence of linear systems, a symmetric positive definite matrix with a randomly chosen condition number between 50 and 150 was generated for the coefficient matrix A; then, vectors with random entries between 0 and 1 were generated for the b^{(i)} to create a sequence. For each benchmark, the first 100 elements in each of 250 sequences of linear systems were solved. Figures 3.1a, 3.2a, and 3.3a analyze the ratio of the number of iterations required using the proposed algorithm to the number of iterations required using the optimal extrapolation parameter found through a brute-force bisection method. Ratios at or below 1 show that the proposed algorithm is able to accurately find the optimal extrapolation parameter. The figures show that by roughly the 30th element in a sequence, even the 95th percentile estimates the optimal extrapolation parameter very accurately. Note that there are some scenarios in which the proposed algorithm exceeds the performance of the theoretical optimal parameter found through brute force. The cause of this superoptimal improvement stems from the fact that the spectral radius determines the rate of convergence as the number of iterations goes to infinity; however, iteration usually terminates after only a few hundred iterations because sufficient convergence is achieved. Because the proposed algorithm is based upon the average reduction factor and not the spectral radius, superoptimal convergence can sometimes be achieved.
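The thesis does not list the code used to generate these test problems. The following is a minimal sketch of one way such a sequence could be constructed; the particular construction (a random orthogonal factor with a logarithmically spaced spectrum) is an assumption for illustration, not necessarily the author's exact procedure.

% Minimal sketch (assumption): one SPD coefficient matrix with a prescribed
% condition number and a set of random right-hand sides b^(i).
n     = 100;                           % problem size (illustrative)
kappa = 50 + 100*rand();               % condition number drawn from [50, 150]
[Q,~] = qr(randn(n));                  % random orthogonal factor
d     = logspace(0, log10(kappa), n);  % eigenvalues from 1 up to kappa
A     = Q*diag(d)*Q';                  % symmetric positive definite, cond(A) ~ kappa
A     = (A + A')/2;                    % symmetrize against round-off
B     = rand(n, 100);                  % column i is b^(i), entries in [0, 1]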

Figures 3.1b, 3.2b, and 3.3b analyze the ratio of the number of iterations required by the proposed algorithm to the number of iterations required by the non-extrapolated method. Ratios less than 1 show that the proposed method is more computationally efficient than not using the proposed algorithm. As with the previous figures, by roughly the 30th element in the sequence, even the 95th percentile becomes steady as the estimation of the parameter becomes accurate. Figures 3.1b and 3.2b show a clear improvement, with median ratios well below 1; however, Figure 3.3b does not show much of an improvement, with a median ratio very close to 1. The lack of improvement seen in Figure 3.3b is caused by an optimal extrapolation parameter that is very close to 1; thus, even with a very accurate estimation of the optimal parameter, a large improvement would not be seen. The conclusion to be drawn from these figures is that the proposed algorithm is able to accurately estimate the optimal extrapolation parameter after solving a small number of elements in the sequence; however, whether extrapolation provides a significant speedup depends upon the splitting and the problem at hand.
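The baseline "optimal" iteration counts above come from the brute-force bisection search mentioned at the beginning of this chapter. That search is not listed in the thesis; the sketch below is only an illustrative stand-in that narrows a bracketing interval around the minimizer of the iteration count, assuming the count is unimodal in the extrapolation parameter. IterationCount is a hypothetical helper that runs the extrapolated method with a fixed parameter and returns the number of iterations needed to reach the tolerance.

% Illustrative stand-in (assumption) for the brute-force search over mu.
lo = 0.1;  hi = 2.0;                   % bracketing interval (illustrative)
for level = 1:12                       % each pass shrinks the interval by one third
    x1 = lo + (hi - lo)/3;
    x2 = hi - (hi - lo)/3;
    if IterationCount(A, b, x1) < IterationCount(A, b, x2)
        hi = x2;                       % minimizer lies in [lo, x2]
    else
        lo = x1;                       % minimizer lies in [x1, hi]
    end
end
mu_opt = (lo + hi)/2;                  % estimate of the optimal parameter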

[Figure 3.1: Performance results for Gauss-Seidel. Panel (a), quality of optimal spectral radius estimation: iterations for the proposed algorithm / iterations for optimal extrapolation versus position in the sequence, plotted as 5th, 25th, 50th, 75th, and 95th percentiles. Panel (b), improvement beyond non-extrapolated iteration: iterations for the proposed algorithm / iterations for no extrapolation versus position in the sequence.]

[Figure 3.2: Performance results for SOR w = . Panel (a), quality of optimal spectral radius estimation; panel (b), improvement beyond non-extrapolated iteration; axes and percentiles as in Figure 3.1.]

[Figure 3.3: Performance results for SOR w = . Panel (a), quality of optimal spectral radius estimation; panel (b), improvement beyond non-extrapolated iteration; axes and percentiles as in Figure 3.1.]

CHAPTER IV

CASE STUDY

The previous chapter utilized randomly generated systems in order to analyze the performance of the proposed algorithm; however, randomly generated systems may not accurately reflect problems that occur in the real world. This chapter implements the Crank-Nicholson method for solving the diffusion equation over a 3D domain in order to analyze the performance of the proposed algorithm in a real-world scenario. The partial differential equation to be solved is

U_t = a(U_{xx} + U_{yy} + U_{zz}) + S(x, y, z, t)    (4.1)

where S(x, y, z, t) is a source term and a is a constant diffusion coefficient. The Crank-Nicholson method discretizes the diffusion equation over space and time, requiring a linear system to be solved at each time step and thus generating a sequence of linear systems [14, 15]. Figure 4.1 shows slices of the solved volume in this example at different time steps. When discretized, the diffusion equation becomes

\left( \frac{1}{\Delta t} + \frac{a}{\Delta x^2} + \frac{a}{\Delta y^2} + \frac{a}{\Delta z^2} \right) U^{n+1}_{i,j,k}
  - \frac{a}{2\Delta x^2} \left( U^{n+1}_{i-1,j,k} + U^{n+1}_{i+1,j,k} \right)
  - \frac{a}{2\Delta y^2} \left( U^{n+1}_{i,j-1,k} + U^{n+1}_{i,j+1,k} \right)
  - \frac{a}{2\Delta z^2} \left( U^{n+1}_{i,j,k-1} + U^{n+1}_{i,j,k+1} \right)
=
\left( \frac{1}{\Delta t} - \frac{a}{\Delta x^2} - \frac{a}{\Delta y^2} - \frac{a}{\Delta z^2} \right) U^{n}_{i,j,k}
  + \frac{a}{2\Delta x^2} \left( U^{n}_{i-1,j,k} + U^{n}_{i+1,j,k} \right)
  + \frac{a}{2\Delta y^2} \left( U^{n}_{i,j-1,k} + U^{n}_{i,j+1,k} \right)
  + \frac{a}{2\Delta z^2} \left( U^{n}_{i,j,k-1} + U^{n}_{i,j,k+1} \right)
  + S^{n+1/2}_{i,j,k}    (4.2)

and the coefficient matrix A^{(i)} is determined from the left-hand side of equation (4.2), while the vector b^{(i)} is determined from the right-hand side. If n_x, n_y, and n_z are the number of grid points in each respective dimension, then the resulting coefficient matrix is of size n_x n_y n_z \times n_x n_y n_z.
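To make the structure of this sequence concrete, the sketch below assembles the Crank-Nicholson system for a small grid and calls MySolver from Algorithm 2.4 once per time step. It is a minimal sketch under simplifying assumptions (unit cube, homogeneous Dirichlet boundaries, uniform spacing, a constant source term, and illustrative solver arguments); it is not the implementation used to produce the results reported in this chapter.

% Minimal sketch (assumptions as noted above): Crank-Nicholson time stepping
% for the 3D diffusion equation, producing one linear system per time step.
n  = 8;                        % grid points per dimension
h  = 1/(n + 1);                % uniform spacing, dx = dy = dz = h
dt = 0.01;                     % time step
a  = 1;                        % diffusion coefficient

% 1D second-difference operator and the 3D Laplacian via Kronecker products
e  = ones(n, 1);
D2 = spdiags([e -2*e e], -1:1, n, n) / h^2;
I  = speye(n);
L3 = kron(kron(I, I), D2) + kron(kron(I, D2), I) + kron(kron(D2, I), I);

% Matrix form of equation (4.2): (I/dt - (a/2)L3) U^{n+1} = (I/dt + (a/2)L3) U^n + S
N  = n^3;
Al = speye(N)/dt - (a/2)*L3;   % fixed coefficient matrix for every time step
Ar = speye(N)/dt + (a/2)*L3;   % operator applied to the previous solution

U = zeros(N, 1);               % initial condition U = 0
S = ones(N, 1);                % constant source term (assumption)
for step = 1:100
    b = Ar*U + S;              % only the right-hand side changes each step
    U = MySolver(Al, U, b, 'SOR', 1e-8, 1.5);  % previous U as the initial guess
end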

[Figure 4.1: Slices of the volume solved in the example problem at particular times: (a) t = 0.25, (b) t = 1.25.]

Doubling the resolution of the grid in
