An Efficient Tridiagonal Eigenvalue Solver on CM 5 with Laguerre's Iteration

Ren-Cang Li and Huan Ren
Department of Mathematics
University of California at Berkeley
Berkeley, California 94720
li@math.berkeley.edu, ren@math.berkeley.edu

Report No. UCB//CSD, December 1994

Computer Science Division (EECS)
University of California, Berkeley, California 94720

An Efficient Tridiagonal Eigenvalue Solver on CM 5 with Laguerre's Iteration*

Ren-Cang Li and Huan Ren
Department of Mathematics
University of California at Berkeley
Berkeley, California 94720

Abstract

In this paper we propose an algorithm, based on Laguerre's iteration, for finding the eigenvalues of symmetric tridiagonal matrices. The algorithm is fully parallelizable and has been parallelized on the CM 5 at the University of California at Berkeley. We achieve the best possible speedup when the matrix dimension is large enough. In addition, our serial code is much more efficient on matrices with pathologically close eigenvalues than an existing serial code of the same kind due to Li and Zeng.

1 Introduction

Laguerre's iteration turns out to be one of the best iterative methods for finding the zeros of polynomials. If the polynomial has only real zeros, Laguerre's

*Computer Science Division Technical Report UCB//CSD, University of California, Berkeley, CA 94720, December 1994. This material is based in part upon work supported by Argonne National Laboratory under grant No. and the University of Tennessee through the Advanced Research Projects Agency under contract No. DAAL03-91-C-0047, by the National Science Foundation under grant No. ASC, and by the National Science Infrastructure grants No. CDA and CDA.

iteration is guaranteed to produce a sequence of approximations that converges monotonically to the desired zero. Moreover, the convergence is ultimately cubic provided the algebraic multiplicity of the desired zero is estimated correctly. Therefore convergence to simple zeros is always cubic, and convergence to multiple zeros is at least linear.

This report addresses some implementation issues in programming Laguerre's iteration, both serially and in parallel, to compute the eigenvalues of a symmetric tridiagonal matrix. Our contributions are:

• A different divide-and-conquer strategy, which we call Cross-and-Recover in contrast to Li and Zeng's Split-and-Merge [5], and which offers better inclusion intervals.

• More efficient zero-finder code than Li and Zeng's [5]. We make our zero-finder go as fast as possible in pathological cases.

• An efficient parallel implementation on CM 5, with almost linear speedup.

We will make these contributions more concrete later.

2 The Basic Structure of the Algorithm

Let T be an n × n tridiagonal matrix

$$T = \begin{pmatrix} d_1 & e_1 & & & \\ e_1 & d_2 & e_2 & & \\ & \ddots & \ddots & \ddots & \\ & & e_{n-2} & d_{n-1} & e_{n-1} \\ & & & e_{n-1} & d_n \end{pmatrix}, \qquad (1)$$

whose eigenvalues

$$\lambda_1 < \lambda_2 < \cdots < \lambda_n$$

are to be found. Assume, without loss of generality, that all e_j ≠ 0, i.e., that T is unreduced. Let k = ⌊n/2⌋ and write

$$T = \begin{pmatrix} T_0 & e_{k-1} & \\ e_{k-1} & d_k & e_k \\ & e_k & T_1 \end{pmatrix},$$

where the scalar e_{k−1} couples d_k to the last row of T_0 and the scalar e_k couples d_k to the first row of T_1. If, now, the eigenvalues of

$$\widehat T = \begin{pmatrix} T_0 & \\ & T_1 \end{pmatrix},$$

the submatrix of T with its kth row and column crossed out, are known as

$$\hat\lambda_1 \le \hat\lambda_2 \le \cdots \le \hat\lambda_{n-1},$$

Cauchy's interlacing theorem tells us precisely where each λ_j is located, i.e.,

$$\lambda_1 \le \hat\lambda_1 \le \lambda_2 \le \hat\lambda_2 \le \cdots \le \hat\lambda_{n-1} \le \lambda_n. \qquad (2)$$

Also we have

$$\hat\lambda_1 - (|e_{k-1}| + |d_k| + |e_k|) \le \lambda_1, \qquad \lambda_n \le \hat\lambda_{n-1} + (|e_{k-1}| + |d_k| + |e_k|) \qquad (3)$$

by the Weyl-Lidskii theorem. With (2) and (3), each λ_j can be computed by applying Laguerre's iteration to the characteristic polynomial of T. How do we get the λ̂_j? The above procedure can be applied recursively until the matrices become small enough to be solved easily. Pictorially, our algorithm has the structure shown in Figure 1, in which we assume the submatrices at the very bottom have the same size.

3 Laguerre's Iteration

Let

$$f(x) = \det(T - xI). \qquad (4)$$

Figure 1: The Divide-and-Conquer Process (crossing out rows and columns on the way down, recovering eigenvalues on the way up; submatrix sizes grow from m at the bottom through 2m+1 and 4m+3 to 8m+7 at the top).
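For concreteness, the initial inclusion interval for each λ_j follows directly from (2) and (3). Here is a minimal C sketch of that step; the function name, the 0-based array convention, and the parameter names are ours, not from the report's code:

```c
#include <math.h>

/* Sketch: the initial search interval [a, b] for lambda_j of T,
   derived from the interlacing (2) and the Weyl-Lidskii bound (3).
   lamhat[0..n-2] holds the sorted eigenvalues of T-hat; ek1, dk, ek
   are e_{k-1}, d_k, e_k of the crossed-out row. j is 1-based. */
void search_interval(int n, int j, const double *lamhat,
                     double ek1, double dk, double ek,
                     double *a, double *b)
{
    double s = fabs(ek1) + fabs(dk) + fabs(ek);   /* Weyl-Lidskii spread */
    *a = (j == 1) ? lamhat[0] - s     : lamhat[j - 2]; /* lambda-hat_{j-1} */
    *b = (j == n) ? lamhat[n - 2] + s : lamhat[j - 1]; /* lambda-hat_j    */
}
```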

Laguerre's iteration (L_± ≡ L_{1±}) is defined as

$$L_{r\pm}(x) \;\overset{\text{def}}{=}\; x + \frac{n}{\;-\dfrac{f'(x)}{f(x)} \pm \sqrt{\dfrac{n-r}{r}\left[(n-1)\left(\dfrac{f'(x)}{f(x)}\right)^{2} - n\,\dfrac{f''(x)}{f(x)}\right]}\;}, \qquad (5)$$

where r is a parameter to be chosen, ideally, to accelerate convergence. For our purpose, r is always set to 1. It is proved that

if x ∈ (λ_j, λ_{j+1}), then λ_j < L_−(x) < x < L_+(x) < λ_{j+1}.

To apply Laguerre's iteration, one needs to know −f′(x)/f(x) and f″(x)/f(x). It turns out that they can be computed in a recursive way due to [5] (see Algorithm DetEvl). As a by-product, the neg count ν(x) at x, which is the number of eigenvalues of T less than x, is computed as well. In a few cases, however, one may need only the neg count at x for decision making, as we will see later. In those cases, calling DetEvl to get the neg count is more expensive than necessary, so a call to Algorithm NegCount is made instead.

3.1 Costs of Calling DetEvl and NegCount

The following tables present operation counts for one call to DetEvl or NegCount.

Table 1: Costs

             + or −     ×       ÷
  DetEvl     6n − 5   6(n−1)  3n − 2
  NegCount   2n − 1    n − 1    n

In Table 1 it is assumed that d_i − x and e²_{i−1}/ζ_{i−1} are computed once for each i, and that e²_i is computed whenever needed. By looking at DetEvl, one can see that there is room for rearranging the computation to minimize the total number of arithmetic operations.

Table 2: Costs (optimized)

             + or −     ×       ÷
  DetEvl     6n − 5   9(n−1)    n
  NegCount   2n − 1     0       n
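For concreteness, a single Laguerre step (5) with r = 1 can be formed from the two ratios that DetEvl returns. The following is a minimal C sketch under our naming (eta for −f′(x)/f(x), sigma for f″(x)/f(x)); it is an illustration, not the report's code:

```c
#include <math.h>

/* One Laguerre step L_{1,+-}(x) of eq. (5) with r = 1, from the ratios
   eta = -f'(x)/f(x) and sigma = f''(x)/f(x) returned by DetEvl.
   For a polynomial with only real zeros the discriminant is
   nonnegative in exact arithmetic; we clip it at 0 to guard against
   roundoff. plus != 0 selects L_+, otherwise L_-. */
double laguerre_step(int n, double x, double eta, double sigma, int plus)
{
    double disc = (n - 1.0) * ((n - 1.0) * eta * eta - n * sigma);
    double root = sqrt(disc > 0.0 ? disc : 0.0);
    return x + n / (eta + (plus ? root : -root));
}
```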

Algorithm DetEvl:
  input: T, x;
  output: −f′(x)/f(x) = η_n, f″(x)/f(x) = σ_n, and neg_count;
  begin DetEvl
    ζ_1 = d_1 − x;
    if ζ_1 = 0 then ζ_1 = e_1² ε_m²;   (* ε_m is the machine epsilon *)
    η_0 = 0, η_1 = 1/ζ_1, σ_0 = 0, σ_1 = 0, neg_count = 0;
    if ζ_1 < 0 then neg_count = 1;
    for i = 2 : n
      ζ_i = (d_i − x) − (e_{i−1}²/ζ_{i−1});
      if ζ_i = 0 then ζ_i = (e_{i−1}²/ζ_{i−1}) ε_m²;
      if ζ_i < 0 then neg_count = neg_count + 1;
      η_i = ((d_i − x) η_{i−1} + 1 − (e_{i−1}²/ζ_{i−1}) η_{i−2}) / ζ_i;
      σ_i = ((d_i − x) σ_{i−1} + 2 η_{i−1} − (e_{i−1}²/ζ_{i−1}) σ_{i−2}) / ζ_i;
    end for
  end DetEvl

Figure 2: Algorithm DetEvl

Algorithm NegCount:
  input: T, x;
  output: neg_count;
  begin NegCount
    ζ_1 = d_1 − x;
    if ζ_1 = 0 then ζ_1 = e_1² ε_m²;
    neg_count = 0;
    if ζ_1 < 0 then neg_count = 1;
    for i = 2 : n
      ζ_i = (d_i − x) − (e_{i−1}²/ζ_{i−1});
      if ζ_i = 0 then ζ_i = (e_{i−1}²/ζ_{i−1}) ε_m²;
      if ζ_i < 0 then neg_count = neg_count + 1;
    end for
  end NegCount

Figure 3: Algorithm NegCount
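For readers who prefer working code, here is a minimal C translation of Figure 2, a sketch under our assumptions (0-based arrays d[0..n-1] and e[0..n-2], ε_m taken as DBL_EPSILON; function and variable names are ours):

```c
#include <float.h>

/* C sketch of Algorithm DetEvl (Figure 2). On return, *eta holds
   -f'(x)/f(x), *sigma holds f''(x)/f(x), and the return value is
   neg_count, the number of eigenvalues of T less than x. */
int detevl(int n, const double *d, const double *e, double x,
           double *eta, double *sigma)
{
    double zeta = d[0] - x;
    if (zeta == 0.0) zeta = e[0] * e[0] * DBL_EPSILON * DBL_EPSILON;
    double eta0 = 0.0, eta1 = 1.0 / zeta;   /* eta_{i-2}, eta_{i-1} */
    double sigma0 = 0.0, sigma1 = 0.0;      /* sigma_{i-2}, sigma_{i-1} */
    int neg_count = (zeta < 0.0) ? 1 : 0;

    for (int i = 1; i < n; i++) {
        double t = e[i - 1] * e[i - 1] / zeta;  /* e_{i-1}^2 / zeta_{i-1} */
        zeta = (d[i] - x) - t;
        if (zeta == 0.0) zeta = t * DBL_EPSILON * DBL_EPSILON;
        if (zeta < 0.0) neg_count++;
        double eta2   = ((d[i] - x) * eta1   + 1.0        - t * eta0)   / zeta;
        double sigma2 = ((d[i] - x) * sigma1 + 2.0 * eta1 - t * sigma0) / zeta;
        eta0 = eta1;     eta1 = eta2;
        sigma0 = sigma1; sigma1 = sigma2;
    }
    *eta = eta1;
    *sigma = sigma1;
    return neg_count;
}
```

Dropping the eta and sigma recurrences gives NegCount (Figure 3), at the reduced cost shown in Table 1.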

In Table 2 it is assumed that e²_i is pre-computed and that, for DetEvl, each reciprocal 1/ζ_i is computed once, the divisions in forming ζ_{i+1}, η_i and σ_i then being replaced by multiplications.

3.2 A Very Minor Error in Li and Zeng's Error Analysis

Li and Zeng [5] gave a detailed error analysis for DetEvl. It is correct generically, but it is not in one case. Assume we are working with the following arithmetic model:

$$\mathrm{fl}(x \circ y) = (x \circ y)(1 + \delta),$$

where ∘ is one of the operations +, −, × and ÷, fl(·) denotes the floating point result of the corresponding operation, and |δ| ≤ ε_m, the machine epsilon. It was proved by Li and Zeng that

$$\mathrm{fl}[f(x)] = \mathrm{fl}[\det(T - xI)] = \det((T + E) - xI)(1 + \delta), \qquad (6)$$

where −(3n−2)ε_m + O(ε_m²) ≤ δ ≤ (3n−2)ε_m + O(ε_m²) and

$$E = E_{LZ} = \begin{pmatrix} 0 & e_1\delta_1 & & \\ e_1\delta_1 & 0 & \ddots & \\ & \ddots & \ddots & e_{n-1}\delta_{n-1} \\ & & e_{n-1}\delta_{n-1} & 0 \end{pmatrix}$$

with −2.5ε_m + O(ε_m²) ≤ δ_j ≤ 2.5ε_m + O(ε_m²). We claim this error estimate is incorrect when d_1 − x = 0. To see why, consider a T with all d_j = 0, n odd, and x = 0, so that det(T) = 0. Were their error analysis correct, the fl[det(T)] produced by DetEvl would be zero. However, DetEvl never yields zero determinants: every zero pivot ζ_i is replaced by a tiny nonzero quantity, so the computed product of the ζ_i is never zero. The error was made by ignoring the case d_1 − x = 0. The correct E for the case d_1 − x = 0 is

$$E = \begin{pmatrix} e_1^2\varepsilon_m^2 & e_1\delta_1 & & \\ e_1\delta_1 & 0 & \ddots & \\ & \ddots & \ddots & e_{n-1}\delta_{n-1} \\ & & e_{n-1}\delta_{n-1} & 0 \end{pmatrix}.$$

4 The Basic Building Block

The kernel of our algorithm computes the eigenvalue λ_j of T inside a prescribed interval [a, b] whose endpoints a and b are, initially, eigenvalues of its two principal submatrices, as we saw before, and are updated as the iteration goes on. Thus λ_j is the only eigenvalue of T inside [a, b]. To avoid some unpleasant situations, our code does the following at the beginning.

4.1 Preprocessing

  max_off := max_{1≤i≤n−2} (|e_i| + |e_{i+1}|);
  tol := (2.5 · max_off + |(a+b)/2|) · ε_m;
  if b − a ≤ 2·tol, return λ_j := (a+b)/2;
  call NegCount at (a+b)/2;
  if ν((a+b)/2) = j then
    b := (a+b)/2;  left_half := true;
  else (* ν((a+b)/2) < j *)
    a := (a+b)/2;  left_half := false;
  if left_half = true then
    tol := (2.5 · max_off + |a|) · ε_m;
    x_0 := a + 2·tol;
    call DetEvl at x_0;
    if ν(x_0) = j, then return λ_j := L_−(x_0);
    else x_1 := L_+(x_0);
  else (* left_half = false *)
    tol := (2.5 · max_off + |b|) · ε_m;
    x_0 := b − 2·tol;
    call DetEvl at x_0;
    if ν(x_0) < j, then return λ_j := L_+(x_0);
    else x_1 := L_−(x_0);
  tol := (2.5 · max_off + |x_1|) · ε_m;
  if |x_1 − x_0| ≤ tol, then
    if left_half = true then
      call NegCount at x_1 + tol;
      if ν(x_1 + tol) = j, return λ_j := x_1 + tol;
    else (* left_half = false *)
      call NegCount at x_1 − tol;
      if ν(x_1 − tol) < j, return λ_j := x_1 − tol;
  δ_1 := (b − a)·ε_m^{1/4};  δ_2 := (b − a)·ε_m^{1/2};  δ_3 := (b − a)·ε_m^{3/4};
  if |x_1 − x_0| > δ_1, go to (*);

  if |x_1 − x_0| ≤ δ_3, then
    if left_half = true then
      call NegCount at x_1 + δ_3;
      if ν(x_1 + δ_3) = j then, b := x_1 + δ_3; x_2 := x_1 + δ_3/10; go to (*);
    else (* left_half = false *)
      call NegCount at x_1 − δ_3;
      if ν(x_1 − δ_3) < j then, a := x_1 − δ_3; x_2 := x_1 − δ_3/10; go to (*);
  if |x_1 − x_0| ≤ δ_2, then
    if left_half = true then
      call NegCount at x_1 + δ_2;
      if ν(x_1 + δ_2) = j then, b := x_1 + δ_2; x_2 := x_1 + δ_2/10; go to (*);
    else (* left_half = false *)
      call NegCount at x_1 − δ_2;
      if ν(x_1 − δ_2) < j then, a := x_1 − δ_2; x_2 := x_1 − δ_2/10; go to (*);
  if |x_1 − x_0| ≤ δ_1, then
    if left_half = true then
      call NegCount at x_1 + δ_1;
      if ν(x_1 + δ_1) = j then, b := x_1 + δ_1; x_2 := x_1 + δ_1/10; go to (*);
      else x_2 := x_1 + δ_1; a := x_2;
    else (* left_half = false *)
      call NegCount at x_1 − δ_1;
      if ν(x_1 − δ_1) < j then, a := x_1 − δ_1; x_2 := x_1 − δ_1/10; go to (*);
      else x_2 := x_1 − δ_1; b := x_2;
  (*) continue ...  (* loop body for Laguerre's iteration *)
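The three thresholds δ_1 > δ_2 > δ_3 above are simple to form; here is a minimal C sketch (names ours, with ε_m taken as DBL_EPSILON):

```c
#include <float.h>
#include <math.h>

/* The gap thresholds delta_1 > delta_2 > delta_3 of the preprocessing:
   (b - a) times eps_m^{1/4}, eps_m^{1/2}, eps_m^{3/4} respectively. */
void gap_thresholds(double a, double b,
                    double *d1, double *d2, double *d3)
{
    double w = b - a;
    *d1 = w * pow(DBL_EPSILON, 0.25);
    *d2 = w * sqrt(DBL_EPSILON);
    *d3 = w * pow(DBL_EPSILON, 0.75);
}
```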

Although Laguerre's iteration has very nice convergence properties when applied to an unreduced symmetric tridiagonal matrix T, there are two difficult situations that must be handled carefully for the iteration to go as fast as it should.

4.2 Two Difficult Situations: Detection and Solution

Suppose we are at x on the way to λ. For the sake of convenience, let us assume x < λ. The following two situations are the difficult ones.

Case (i): On the left of x, T has an eigenvalue which is terribly close to x, while x and λ are reasonably far apart.

Case (ii): T has a few other eigenvalues on the right of λ, and those eigenvalues look almost the same as λ from the view at x; or x is outside the region of cubic convergence of Laguerre's iteration for finding λ.

It is possible for the two difficult situations to occur at the same time. Case (i) was pointed out by Kahan [4]. Both [4] and [5] mention Case (ii). A popular treatment of Case (ii) is to estimate the "numerical" algebraic multiplicity of λ. However, the multiplicity cannot be estimated correctly until x is sufficiently close to λ. Proposed here are effective numerical detections of the two cases and their possible solutions.

Let x_i = L_+(x_{i−1}) (x_0 = x). Case (i) can be detected easily by looking at (x_2 − x_1)/(x_1 − x_0): the situation falls into Case (i) if this ratio is bigger than 1. One solution to Case (i) is to restart at a new x = (x_2 + b)/2. This new x might turn out to be farther from λ than the old x; fortunately, our preprocessing described above prevents that from happening.

To detect Case (ii), again one only needs to look at (x_2 − x_1)/(x_1 − x_0). The situation falls into Case (ii) if the ratio is less than 1, but not so small that the iteration can be expected to converge at least superlinearly towards λ over the next few iterations. For instance, use the following

criterion:

$$\frac{1}{5} \le q \overset{\text{def}}{=} \frac{x_2 - x_1}{x_1 - x_0} < 1. \qquad (7)$$

As we mentioned above, one could then use L_{r±} with r other than 1 to accelerate the iteration. However, the right r is not so easy to get, at least at the beginning. The following method works quite well. Once we detect that (7) holds, we expect the iterates x_i = L_+(x_{i−1}) to continue for another few iterations at a similar rate, i.e., for the next few iterations, roughly,

$$\frac{x_{i+1} - x_i}{x_i - x_{i-1}} \approx q.$$

Note that

$$\lambda = x_1 + (x_2 - x_1) + (x_3 - x_2) + \cdots + (x_{i+1} - x_i) + \cdots.$$

Since Laguerre's iteration goes monotonically upwards in the present case, each term x_{i+1} − x_i in the summation is positive. Suppose the iterates x_i = L_+(x_{i−1}) go linearly at a rate of about q, say until x_{k+1}, and then begin superlinear (cubic) convergence. (k could be ∞.) Therefore

$$\frac{x_{i+1} - x_i}{x_2 - x_1} \approx q^{i-1},$$

which yields

$$\lambda \approx x_1 + (x_2 - x_1)(1 + q + \cdots + q^{k-1}) + (\text{negligible} > 0) = x_1 + (x_2 - x_1)\frac{1 - q^k}{1 - q} + (\text{negligible} > 0).$$

This means that if we knew k, we could take x_1 + (x_2 − x_1)(1 + q + ··· + q^{k−1}) as the next approximation, which should be much more accurate than x_3. A very simple way to find k is to try k = 2^ℓ, for ℓ = 1, 2, ..., sequentially, by calling NegCount: ℓ should be made as big as possible while keeping x_1 + (x_2 − x_1)(1 + q + ··· + q^{k−1}) on the left of λ. One restriction on ℓ is that q^k ≥ ε_m, which gives, roughly,

$$\ell \le \frac{3.6 - \ln(-\ln q)}{0.69}$$

for double precision. According to our experience, an ℓ bigger than 6 can be treated as ∞. Roughly speaking, 2^ℓ calls to DetEvl are saved at the cost of ℓ + 1 calls to NegCount.
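The doubling search for k can be packaged compactly. Below is a minimal C sketch under our assumptions (names are ours; negcount is any routine returning ν(·), e.g. a wrapper over Algorithm NegCount; ν(y) < j certifies that y is still to the left of λ_j; criterion (7) is assumed to hold, so 0 < q < 1):

```c
#include <math.h>

/* Sketch of the Case (ii) acceleration: given three consecutive
   Laguerre iterates x0 < x1 < x2 satisfying criterion (7), try
   k = 2, 4, 8, ... and return the largest geometric extrapolation
   x1 + (x2 - x1)(1 - q^k)/(1 - q) that stays left of lambda_j. */
double accelerate(double x0, double x1, double x2, int j,
                  int (*negcount)(double))
{
    double q = (x2 - x1) / (x1 - x0);
    /* l is capped by q^k >= eps_m, i.e. l <= (3.6 - ln(-ln q))/0.69 */
    int lmax = (int)((3.6 - log(-log(q))) / 0.69);
    if (lmax > 6) {                     /* per the text: treat as k = infinity */
        double y = x1 + (x2 - x1) / (1.0 - q);
        if (negcount(y) < j) return y;  /* full geometric sum is safe */
        lmax = 6;
    }
    double best = x2;
    for (int l = 1, k = 2; l <= lmax; l++, k *= 2) {
        double y = x1 + (x2 - x1) * (1.0 - pow(q, k)) / (1.0 - q);
        if (negcount(y) < j) best = y;  /* still left of lambda_j */
        else break;                     /* overshot: keep previous k */
    }
    return best;
}
```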

5 Numerical Tests for the Serial Implementation

We implemented our algorithm, both serially and in parallel, based on what we have discussed. In this section we present some numerical results on how efficient our serial implementation is in comparison with Li and Zeng's and with DSTEBZ, the bisection routine in LAPACK for computing eigenvalues only. For reference, we also list the times taken by DSTEIN, the inverse iteration routine in LAPACK for computing eigenvectors after calling DSTEBZ. Our code was written in C; Li and Zeng's code, as well as the routines from LAPACK, were written in Fortran.

Our first example is glued Wilkinson matrices, which have many pathologically close eigenvalues. To be more specific, the tridiagonal matrices are of the form

$$T = \begin{pmatrix} W_{21}^{+} & \delta & & \\ \delta & W_{21}^{+} & \ddots & \\ & \ddots & \ddots & \delta \\ & & \delta & W_{21}^{+} \end{pmatrix},$$

where W⁺₂₁ is the well-known Wilkinson matrix whose diagonal entries are 10, 9, ..., 1, 0, 1, ..., 9, 10 and whose off-diagonal entries are all 1, and δ is a parameter (each δ glues the last row of one copy of W⁺₂₁ to the first row of the next). Bigger test matrices can be obtained by gluing more copies of W⁺₂₁.
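A minimal C sketch of this construction, useful for reproducing the tests (the function name and 0-based array convention are ours):

```c
/* Build a glued Wilkinson matrix from g copies of W21+, glued by
   delta. d receives the 21*g diagonal entries, e the 21*g - 1
   off-diagonal entries (e[i] couples rows i and i+1, 0-based). */
void glued_wilkinson(int g, double delta, double *d, double *e)
{
    int n = 21 * g;
    for (int i = 0; i < n; i++) {
        int k = i % 21;                  /* position within a W21+ block */
        d[i] = (k <= 10) ? (double)(10 - k) : (double)(k - 10);
    }
    for (int i = 0; i + 1 < n; i++)
        e[i] = ((i + 1) % 21 == 0) ? delta : 1.0;  /* glue at block ends */
}
```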

Table 3: Timing for Glued Wilkinson Matrices with δ = 10⁻⁵ (columns: n, Ours, Li & Zeng, DSTEBZ, DSTEIN)

Table 4: Timing for Glued Wilkinson Matrices with δ = 10⁻¹⁰ (columns: n, Ours, Li & Zeng, DSTEBZ, DSTEIN)

Tables 3 and 4 indicate that our code handles pathologically close eigenvalues much better than Li and Zeng's. For δ = 10⁻⁵, our code runs a little bit faster than DSTEBZ. When δ = 10⁻¹⁰, however, DSTEBZ gains ground as n grows larger and larger, because the eigenvalues are extremely clustered.

For random test matrices, our code takes times comparable to Li and Zeng's code, as the following table shows. DSTEBZ, on the other hand, is very slow.

Table 5: Timing for Random Matrices (columns: n, Ours, Li & Zeng, DSTEBZ, DSTEIN)

To make Tables 3 to 5 easier to take in, we plot Tables 3 and 4 in Figure 4, and Table 5 in Figure 5.

Figure 4: Glued Wilkinson Matrices: (a) δ = 10⁻⁵, (b) δ = 10⁻¹⁰ (timing in seconds versus matrix dimension, for Ours, Li & Zeng, and DSTEBZ).

Figure 5: Random Matrices (timing in seconds versus matrix dimension, for Ours, Li & Zeng, and DSTEBZ).

6 Parallel Implementation on CM 5

We coded the parallel version of our algorithm in Split-C. For the convenience of our presentation, the notation T(i:j) denotes the principal submatrix consisting of the intersection of rows i to j and columns i to j, i.e.,

$$T(i:j) \;\overset{\text{def}}{=}\; \begin{pmatrix} d_i & e_i & & \\ e_i & d_{i+1} & \ddots & \\ & \ddots & \ddots & e_{j-1} \\ & & e_{j-1} & d_j \end{pmatrix}.$$

6.1 Outline of our Implementation

Let PROCS be the number of processors; it is a power of 2 (PROCS = 2⁵ = 32 for rodin and PROCS = 2⁶ = 64 for rodin3). Let us begin with an example: n = 50, PROCS = 8. First of all, T is decoupled into

T(1:6), T(8:13), T(15:20), T(22:26), T(28:32), T(34:38), T(40:44), T(46:50).

Each of these small decoupled problems can then be solved serially by the 8 processors, each of which takes care of one of them, in order. We call this Step 0. Step 1 is for T(1:13), which activates communication between Processor 0 and Processor 1, for T(15:26), which activates communication between Processor 2 and Processor 3, and so on. After Step 1, Step 2 is reached. All eigenvalues of T are computed after the completion of Step 3. Figure 6 gives a bird's-eye view of how our algorithm proceeds on the example, where i:j indicates a submatrix T(i:j) on which one or more processors are working.

Assume PROCS = 2^p. In general, our implementation flow is as described in Figure 7.

Step 3:   1:50 (the whole matrix)
Step 2:   1:26                        28:50
Step 1:   1:13         15:26          28:38          40:50
Step 0:   1:6   8:13   15:20   22:26  28:32   34:38  40:44   46:50

Figure 6: Algorithm Flow: An Example

  q = ⌊(n − (PROCS − 1)) / PROCS⌋;   r = [n − (PROCS − 1)] − q · PROCS;

  Step 0: Processor i solves the eigenvalues of T(i_start : i_end) serially, where
    if r = 0 then
      i_start := i(q+1) + 1,  i_end := i(q+1) + q;
    else (* r ≥ 1 *)
      for 0 ≤ i ≤ r − 1,  i_start := i(q+2) + 1  and  i_end := i(q+2) + q + 1;  end for
      for r ≤ i ≤ PROCS − 1,  i_start := r + i(q+1) + 1  and  i_end := r + i(q+1) + q;  end for
  end of Step 0
  ------------------
  Step j: for i = 0 : (2^{p−j} − 1),
      Processors i·2^j to (i+1)·2^j − 1 collaborate to solve the eigenvalues of
      T((i·2^j)_start : ((i+1)·2^j − 1)_end);
    end for
  end of Step j
  ------------------
  Step p: All processors collaborate to solve the eigenvalues of T;
  end of Step p

Figure 7: Algorithm Flow
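As a concrete check of the Step 0 index arithmetic, here is a minimal C sketch (names ours; indices 1-based as in the text). For n = 50 and PROCS = 8 it reproduces the eight blocks of the example above.

```c
/* Step 0 partition: the rows i_start..i_end (1-based) owned by
   processor i. The first r blocks get q+1 rows, the rest get q,
   with one crossed-out row between consecutive blocks. */
void step0_range(int n, int procs, int i, int *i_start, int *i_end)
{
    int q = (n - (procs - 1)) / procs;
    int r = (n - (procs - 1)) - q * procs;
    if (i < r) {
        *i_start = i * (q + 2) + 1;
        *i_end   = *i_start + q;         /* q + 1 rows */
    } else {
        *i_start = r + i * (q + 1) + 1;
        *i_end   = *i_start + q - 1;     /* q rows */
    }
}
```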

6.2 Implementation Details

6.2.1 Subtle Issues to Be Handled with Care

Implementing our algorithm efficiently on CM 5 turns out not to be an easy task. There are two major issues, among others, that we have to deal with in order to achieve the best performance CM 5 can give.

• Minimizing communication. The first things that come to mind are to minimize the number of messages communicated between processors, to keep communication local whenever possible, and always to use the fast communication tool (bulk-get). We do let every node processor perform some redundant operations at the same time, as a tradeoff to reduce the number of messages; this turns out to be a good thing to do and affects the final speedups negligibly.

• Load balancing. Load balancing is another important issue, and it has a heavy influence on the final speedups if it is handled poorly. We will be more concrete later about how we distribute work among the processors so that each processor has roughly the same amount of work to finish.

6.2.2 Memory Requirements

Since we are solving a symmetric tridiagonal matrix, the memory we need to store the matrix is 2n − 1 numbers. To reduce communication, we let every node processor have a copy of the matrix. A global spread array glambda is created across the processors for communication between them; it has size n in each node. Besides all this, a work space of size n in each node is needed. For work distribution, an array of at most n structures, each of which takes two double floating point numbers and one integer, is also needed. In total, about 6.5n double precision floating point numbers are used in each node processor.

6.2.3 Communication Details

Communications come in after the completion of each step and before entering the next step (refer to Figures 6 and 7). We illustrate our points with the same example as in Figure 6.

Figure 8: Communications: After Step 0 and before Step 1 (each adjacent pair of processors exchanges its eigenvalues, then merges and sorts).

After Step 0 and before Step 1: There is only one level of communication to deal with. At this stage, Processor 0 owns (usually in ascending order) the eigenvalues of T(1:6), and Processor 1 owns those of T(8:13). For the eigenvalues of T(1:13) to be computed in Step 1 by the two processors together, both processors have to know the eigenvalues of T(1:6) and of T(8:13), which serve as separators for the eigenvalues of T(1:13). So what we do is: Processor 0 gets the eigenvalues of T(8:13) from Processor 1 while, at the same time, Processor 1 gets those of T(1:6); then both processors put all the eigenvalues together and sort them by merge sort. (Hence, one of the two is doing redundant work!) After sorting, Processor 0 takes responsibility for finding the first few (roughly half, generally) eigenvalues of T(1:13), while Processor 1 does the rest. The work distribution strategy will be discussed later. Similarly, Processors 2 to 7 do their own parts. Figure 8 displays what we just said; a double-headed arrow indicates communication between the corresponding two processors.
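The merge of two already-sorted spectra is the standard two-pointer merge; here is a minimal C sketch (names ours):

```c
/* Merge two sorted eigenvalue lists a[0..na-1] and b[0..nb-1] into
   out[0..na+nb-1]. Each partner processor runs the same merge on
   the same data, trading redundant work for fewer messages. */
void merge_sorted(const double *a, int na, const double *b, int nb,
                  double *out)
{
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
}
```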

After Step 1 and before Step 2: Two levels of communication are involved. At this stage, Processor 0 owns the first part of the spectrum of T(1:13) while Processor 1 owns the rest, and Processor 2 owns the first part of the spectrum of T(15:26) while Processor 3 owns the rest. For the eigenvalues of T(1:26) to be computed in Step 2 by the four processors together, they have to know the eigenvalues of T(1:13) and of T(15:26), which, once sorted, serve as separators for the eigenvalues of T(1:26). To this end, we do the communications shown in Figure 9.

Figure 9: Communications: After Step 1 and before Step 2 (Level 1: merge; Level 2: merge and sort).

After Level 1, Processor 0 and Processor 1 both own the entire set of eigenvalues of T(1:13) in proper order, while Processor 2 and Processor 3 both own the entire set of eigenvalues of T(15:26); after Level 2, each of the four processors has all the eigenvalues (sorted) of T(1:13) and of T(15:26).

After Step 2 and before Step 3: Three levels of communication are involved. At this stage, Processors 0 to 3 together own the eigenvalues of T(1:26), each of the four owning a consecutive part. Likewise, Processors 4 to 7 together own the eigenvalues of T(28:50), and again each of them owns a consecutive part. For the eigenvalues of T to be computed in Step 3 by all the processors together, they have to know the eigenvalues of T(1:26) and of T(28:50), which, once sorted, serve as separators for the eigenvalues of T. To this end, we do the communications

shown in Figure 10.

Figure 10: Communications: After Step 2 and before Step 3 (Level 1: merge; Level 2: merge; Level 3: merge and sort).

After Level 2, each of Processors 0 to 3 owns the entire (ordered) set of eigenvalues of T(1:26), while each of Processors 4 to 7 owns the entire (ordered) set of eigenvalues of T(28:50); after Level 3, each of the eight processors has all the eigenvalues of T(1:26) and of T(28:50).

In general, there are j levels of communication after Step j − 1 and before Step j. At each level, all messages can be passed simultaneously, since all processors are connected by the fat-tree network. Each message, hopefully, is of equal length; thus no processor finishes its message passing earlier than the others and becomes idle.

6.2.4 Task Distribution

Right before each step, after the last level of the communication preceding it, each processor owns two sets of eigenvalues, each of which is in proper order; together they form the separators for the eigenvalues of the current (sub)matrix, to be found by the relevant processors in the coming step. Before getting into the step, two things remain to be done:

1. Sort the two sets of eigenvalues into a single set in proper order. We use merge sort to fulfill this task, since the eigenvalues in each set are already in order. Each relevant processor actually performs the same sort; these redundant sorts do not affect the overall speedups, as their cost is marginal.

2. Determine cheaply which part of the spectrum of the current (sub)matrix each relevant processor is going to compute, while maintaining load balance.

To deal with the second item, we again let every relevant processor do the same thing, in exchange for less communication. After each processor finishes sorting, we let it scan the sorted eigenvalues, which are, as a matter of fact, separators for the eigenvalues to be found, and do the following.

If two consecutive separators are close enough, a new eigenvalue is found instantly; put it into the corresponding position in glambda. Otherwise, an entry of an array of structures is created, where the structure has the form

struct New {
    double left;
    double right;
    int    index;
};

Here left is the left endpoint of the interval containing the eigenvalue we want to compute, right is the right endpoint, and index indicates which eigenvalue is to be computed. At the end, every relevant processor knows how many structures have been created, which in turn gives the number of eigenvalues that need to be computed by Laguerre iterations. At this point it is easy to distribute the work evenly among the relevant processors without introducing further communication. Doing so makes every processor compute roughly the same number of eigenvalues and, hopefully, keeps the load balanced.
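Here is a minimal C sketch of this separator scan under our assumptions (names are ours; the mapping from interval position to eigenvalue index is simplified here, and tol stands for whatever closeness test the code applies):

```c
/* Scan m sorted separators; emit an eigenvalue immediately when two
   consecutive separators nearly coincide, otherwise queue a work
   item (struct New above) for Laguerre iteration. Returns the
   number of work items created. */
int scan_separators(int m, const double *sep, double tol,
                    double *glambda, struct New *work)
{
    int nwork = 0;
    for (int j = 0; j + 1 < m; j++) {
        if (sep[j + 1] - sep[j] <= tol)
            glambda[j] = 0.5 * (sep[j] + sep[j + 1]); /* found instantly */
        else {
            work[nwork].left  = sep[j];
            work[nwork].right = sep[j + 1];
            work[nwork].index = j;
            nwork++;
        }
    }
    return nwork;
}
```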

7 Numerical Tests on CM 5

Our parallel implementation on CM 5 is a great success in the sense that the best speedup is achieved on all the kinds of matrices we have tested. Table 6 lists the times consumed on matrices whose eigenvalues are well separated, where p denotes the actual number of processors participating in the computation. To be more specific, the diagonal entries of these matrices are 1, 2, ..., n and the off-diagonal entries are all 1.

Table 6: Timing for Matrices with Well-Separated Eigenvalues (columns: n and p = 1, 2, 4, 8, 16, 32, 64)

Table 7 displays the times taken on tridiagonal matrices whose entries are uniformly distributed on [−1, 1], subject to symmetry.

Table 7: Timing for Random Matrices (columns: n and p = 1, 2, 4, 8, 16, 32, 64)

To see how much speedup we obtained, we plot two figures showing the speedups hiding in Tables 6 and 7. Glued Wilkinson matrices are always an interesting class of test matrices; Table 8 and Figure 13 display how efficient our implementation is on this kind of matrix.

Table 8: Timing for Glued Wilkinson Matrices (columns: n and p = 1, 2, 4, 8, 16, 32, 64)

References

[1] J. J. M. Cuppen, A divide and conquer method for the symmetric tridiagonal eigenproblem, Numer. Math., 36 (1981), 177–195.

Figure 11: Speedups for Matrices with Well-Separated Eigenvalues (speedup versus number of processors, for n = 256, 512, 1024, 2048, 4096, 8192).

Figure 12: Speedups for Random Matrices (speedup versus number of processors, for n = 256, 512, 1024, 2048, 4096, 8192).

Figure 13: Speedups for Glued Wilkinson Matrices (speedup versus number of processors; n = 2100, 4200, 8400, ...).

[2] J. Demmel, M. T. Heath, and H. A. van der Vorst, Parallel numerical linear algebra, Acta Numerica, 1993.

[3] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, 2nd edition, 1989.

[4] W. Kahan, Notes on Laguerre's iteration, 1992.

[5] T. Y. Li and Zhonggang Zeng, Laguerre's iteration in solving the symmetric tridiagonal eigenproblem: a revisit, preprint.

[6] B. Parlett, Laguerre's method applied to the matrix eigenvalue problem, Math. Comp., 18 (1964), 464–485.


More information

Matrix Computations: Direct Methods II. May 5, 2014 Lecture 11

Matrix Computations: Direct Methods II. May 5, 2014 Lecture 11 Matrix Computations: Direct Methods II May 5, 2014 ecture Summary You have seen an example of how a typical matrix operation (an important one) can be reduced to using lower level BS routines that would

More information

6.842 Randomness and Computation March 3, Lecture 8

6.842 Randomness and Computation March 3, Lecture 8 6.84 Randomness and Computation March 3, 04 Lecture 8 Lecturer: Ronitt Rubinfeld Scribe: Daniel Grier Useful Linear Algebra Let v = (v, v,..., v n ) be a non-zero n-dimensional row vector and P an n n

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way EECS 16A Designing Information Devices and Systems I Fall 018 Lecture Notes Note 1 1.1 Introduction to Linear Algebra the EECS Way In this note, we will teach the basics of linear algebra and relate it

More information

Singular and Invariant Matrices Under the QR Transformation*

Singular and Invariant Matrices Under the QR Transformation* SINGULAR AND INVARIANT MATRICES 611 (2m - 1)!! = 1-3-5 (2m - 1), m > 0, = 1 m = 0. These expansions agree with those of Thompson [1] except that the factor of a-1' has been inadvertently omitted in his

More information

Computational Complexity

Computational Complexity Computational Complexity S. V. N. Vishwanathan, Pinar Yanardag January 8, 016 1 Computational Complexity: What, Why, and How? Intuitively an algorithm is a well defined computational procedure that takes

More information

I-v k e k. (I-e k h kt ) = Stability of Gauss-Huard Elimination for Solving Linear Systems. 1 x 1 x x x x

I-v k e k. (I-e k h kt ) = Stability of Gauss-Huard Elimination for Solving Linear Systems. 1 x 1 x x x x Technical Report CS-93-08 Department of Computer Systems Faculty of Mathematics and Computer Science University of Amsterdam Stability of Gauss-Huard Elimination for Solving Linear Systems T. J. Dekker

More information

Matrices, Moments and Quadrature, cont d

Matrices, Moments and Quadrature, cont d Jim Lambers CME 335 Spring Quarter 2010-11 Lecture 4 Notes Matrices, Moments and Quadrature, cont d Estimation of the Regularization Parameter Consider the least squares problem of finding x such that

More information

MATRIX MULTIPLICATION AND INVERSION

MATRIX MULTIPLICATION AND INVERSION MATRIX MULTIPLICATION AND INVERSION MATH 196, SECTION 57 (VIPUL NAIK) Corresponding material in the book: Sections 2.3 and 2.4. Executive summary Note: The summary does not include some material from the

More information

11.2 Reduction of a Symmetric Matrix to Tridiagonal Form: Givens and Householder Reductions

11.2 Reduction of a Symmetric Matrix to Tridiagonal Form: Givens and Householder Reductions 11.2 Reduction of a Symmetric Matrix to Tridiagonal Form 469 for (j=i+1;j= p) p=d[k=j]; if (k!= i) { d[k]=d[i]; d[i]=p; for (j=1;j

More information

Recall, we solved the system below in a previous section. Here, we learn another method. x + 4y = 14 5x + 3y = 2

Recall, we solved the system below in a previous section. Here, we learn another method. x + 4y = 14 5x + 3y = 2 We will learn how to use a matrix to solve a system of equations. College algebra Class notes Matrices and Systems of Equations (section 6.) Recall, we solved the system below in a previous section. Here,

More information

The Growth of Functions. A Practical Introduction with as Little Theory as possible

The Growth of Functions. A Practical Introduction with as Little Theory as possible The Growth of Functions A Practical Introduction with as Little Theory as possible Complexity of Algorithms (1) Before we talk about the growth of functions and the concept of order, let s discuss why

More information

CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis. Ruth Anderson Winter 2019

CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis. Ruth Anderson Winter 2019 CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis Ruth Anderson Winter 2019 Today Algorithm Analysis What do we care about? How to compare two algorithms Analyzing Code Asymptotic Analysis

More information