
On the Correctness of Parallel Bisection in Floating Point

James W. Demmel*
Computer Science Division and Department of Mathematics
University of California, Berkeley, California

Inderjit Dhillon†
Computer Science Division
University of California, Berkeley, California

Huan Ren‡
Department of Mathematics
University of California, Berkeley, California

Computer Science Division Technical Report UCB//CSD, University of California, Berkeley, CA, March 30

Abstract

Bisection is an easily parallelizable method for finding the eigenvalues of real symmetric tridiagonal matrices, or more generally of symmetric acyclic matrices. It requires a function Count(x) which counts the number of eigenvalues less than x. In exact arithmetic Count(x) is an increasing function of x, but this is not necessarily the case with roundoff. Our first result is that as long as the floating point arithmetic is monotonic, the computed function Count(x), implemented appropriately, will also be monotonic; this extends an unpublished 1966 result of Kahan to the larger class of symmetric acyclic matrices. Second, we analyze the impact of nonmonotonicity of Count(x) on the serial and parallel implementations of bisection. We present simple and natural implementations which can fail because of nonmonotonicity; this includes the routine bisect in EISPACK. We also show how to implement bisection correctly despite nonmonotonicity; this is important because the fastest known parallel implementation of Count(x) is nonmonotonic even if the floating point arithmetic is not.

* The author was supported by NSF grants ASC and CCR, and by DARPA grant DAAL03-91-C-0047 via a subcontract from the University of Tennessee.
† The author was supported by DARPA grant DAAL03-91-C-0047 via a subcontract from the University of Tennessee.
‡ The author was supported by DARPA grant DAAL03-91-C-0047 via a subcontract from the University of Tennessee.

1 Introduction

Let T be an n-by-n real symmetric tridiagonal matrix with diagonals a_1, ..., a_n and off-diagonals b_1, ..., b_{n-1}; we let b_0 ≡ 0. Let λ_1 ≤ ··· ≤ λ_n be T's eigenvalues. It is well known [20] that the function Count(x) defined below returns the number of eigenvalues of T that are less than x (for all but the finite number of x resulting in a divide by zero):

Algorithm 1: Count(x) returns the number of eigenvalues of a real symmetric tridiagonal matrix T that are less than x.

    Count = 0; d = 1;
    for i = 1 to n
        d = a_i - x - b_{i-1}^2 / d
        if d < 0 then Count = Count + 1
    endfor

(If we wish to emphasize that T is the argument, we will write Count(x, T) instead.)
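For concreteness, Algorithm 1 transcribes almost line for line into Python. The sketch below is ours, not part of the routines studied in this paper; the arrays a and b hold the diagonal and off-diagonal entries, and, like the idealized algorithm, it makes no provision against division by zero or overflow.

    def count(x, a, b):
        """Number of eigenvalues less than x of the symmetric tridiagonal
        matrix with diagonal a[0..n-1] and off-diagonal b[0..n-2],
        following Algorithm 1 (barring division by zero)."""
        cnt = 0
        d = 1.0
        for i in range(len(a)):
            b2 = b[i - 1] ** 2 if i > 0 else 0.0  # b_0 = 0 by convention
            d = a[i] - x - b2 / d
            if d < 0:
                cnt += 1
        return cnt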

It is easy to see that the number of eigenvalues in the half-open interval [σ_1, σ_2) is Count(σ_2) − Count(σ_1). This observation may be used as the basis for a "bisection" algorithm to find all the eigenvalues of T, or just those in an interval [σ_1, σ_2) or [λ_j, λ_k). Here we interpret bisection very broadly, referring to any algorithm which involves dividing an interval containing at least one eigenvalue into smaller subintervals of any size, and recomputing the numbers of eigenvalues in the subintervals. The algorithm terminates when the intervals are narrow enough.

The logic of such a bisection algorithm would seem to depend on the simple fact that Count(x) is a monotonically increasing step function of x. If its computer implementation, call it FloatingCount(x), were not also monotonic, so that one could find σ_1 < σ_2 with FloatingCount(σ_1) > FloatingCount(σ_2), then the computer implementation might well report that the interval [σ_1, σ_2) contains a negative number of eigenvalues, namely FloatingCount(σ_2) − FloatingCount(σ_1). This result is clearly incorrect. In section 4 below, we will see that this can indeed occur using the EISPACK routine bisect (using IEEE standard floating point arithmetic [2, 3], and without over/underflows or other exceptions).

The goal of this paper is to explore the impact of nonmonotonicity on the bisection algorithm. There are at least three reasons why FloatingCount(x) might not be monotonic:

1. the floating point arithmetic is too inaccurate,
2. over/underflow occurs, or is avoided improperly, or
3. FloatingCount(x) is implemented using a fast parallel algorithm called parallel prefix.

Our first result is to give examples showing monotonicity failures for all three reasons; see sections 4 and 6. Our second result is to show that as long as the floating point arithmetic is monotonic (we define this in section 2.1), and FloatingCount(x) is implemented in a reasonable (but serial) way, then FloatingCount(x) is monotonic. A sufficient condition for floating point arithmetic to be monotonic is that it be correctly rounded or correctly chopped; thus IEEE floating point arithmetic is monotonic. This result was first proven, but not published, by Kahan in 1966 for symmetric tridiagonal matrices [16]; in this paper we extend it to symmetric acyclic matrices, a larger class including tridiagonal matrices, arrow matrices, and exponentially many others [8]; see section 6. Our third result is to formalize the notion of a correct implementation of bisection, and to use this characterization to identify correct and incorrect serial and parallel implementations of bisection. We illustrate with several simple, natural but wrong implementations of parallel bisection, and show how to implement it correctly in the absence of monotonicity; see sections 4 and 7.

Nonmonotonic implementations of FloatingCount(x) remain of interest, even though nonmonotonic arithmetics are a dying breed, because the fastest known parallel prefix implementations of FloatingCount(x) appear unavoidably nonmonotonic. We feel this paper is also of interest because it is an example of a rigorous correctness proof of an algorithm using floating point arithmetic. We make clear exactly which properties of floating point are necessary to prove correctness.

The rest of this paper is organized as follows. Section 2 gives the definitions and assumptions. Section 3 gives tables illustrating the results of this paper and the assumptions needed to prove them. Section 4 gives some examples of incorrect bisection algorithms, and also gives some serial and parallel algorithms that are provably correct subject to some assumptions about the computer arithmetic and FloatingCount(x). Section 5 reviews the roundoff error analysis of FloatingCount(x), and how to account for over/underflow; this material may also be found in [16, 8]. Section 6 illustrates how monotonicity can fail, and proves that a natural serial implementation of FloatingCount(x) must be monotonic if the arithmetic is. Section 7 gives formal proofs of the correctness of the bisection algorithms given in section 4. Section 8 discusses some practical implementation issues, and Section 9 concludes the paper.

2 Definitions and Assumptions

Section 2.1 defines the kinds of matrices whose eigenvalue problems we will be considering, what monotonic arithmetic is, and what "jump points" of the functions Count() and FloatingCount() are. Section 2.2 presents our (mild) assumptions about the floating point arithmetic, the input matrices our algorithms will accept, and the way the bisection point of an interval may be chosen. Section 2.3 lists the criteria a bisection algorithm must satisfy to be correct.

2.1 Preliminary Definitions

Algorithm 1 was recently extended to the larger class of symmetric acyclic matrices [8], i.e. those matrices whose graphs are acyclic (trees). The undirected graph G(T) of a symmetric n-by-n matrix T is defined to have n nodes and an edge (i, j), i > j, if and only if T_ij ≠ 0. A symmetric tridiagonal matrix is one example of a symmetric acyclic matrix; its graph is a chain. An "arrow matrix", which is nonzero only on the diagonal, in the last row and in the last column, is another example; its graph is a star. From now on, we will assume T is a symmetric acyclic matrix unless we state explicitly otherwise. Also, we will number the rows and columns of T in preorder, so that node 1 is the root of the tree and is accessed first;

node j is called a child of node i if T_ij ≠ 0 and node j has not yet been visited by the algorithm (see Algorithm 6 in section 6 for details). We let C denote the maximum number of children of any node in the acyclic graph G(T) (C is never larger than the degree of G(T)).

To describe the monotonicity of FloatingCount(x), we need to define monotonic arithmetic: an implementation of floating point arithmetic is monotonic if, whenever a, b, c and d are floating point numbers, ⊙ is any binary operation, and the floating point results fl(a ⊙ b) and fl(c ⊙ d) do not overflow, then a ⊙ b ≤ c ⊙ d implies fl(a ⊙ b) ≤ fl(c ⊙ d). This is satisfied by any arithmetic that rounds or truncates correctly. In Section 6, we will prove that the FloatingCount function (Floating_TreeCount) for a symmetric acyclic matrix is monotonic if the floating point arithmetic is monotonic.

We now define a jump-point of the function Count(x): ξ_i is the i-th jump-point of Count(x) if

    lim_{x→ξ_i⁻} Count(x) < i ≤ lim_{x→ξ_i⁺} Count(x).

Note that if ξ_i = ξ_j, then ξ_i is simultaneously the i-th and j-th jump-point. Analogously to the above definition, we define an i-th jump-point of a possibly nonmonotonic function FloatingCount(x) as a floating point number ξ″_i such that

    FloatingCount(nextbefore(ξ″_i)) < i ≤ FloatingCount(ξ″_i),

where nextbefore(ξ″_i) is the largest floating point number smaller than ξ″_i. For a nonmonotonic function FloatingCount(x), there may be more than one such jump-point.
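To make the definition concrete, here is a small Python helper of ours (not from the paper): nextbefore is built from math.nextafter, available in Python 3.9 and later, and floating_count stands for whichever implementation is under study.

    import math

    def nextbefore(x):
        """Largest floating point number smaller than x (Python 3.9+)."""
        return math.nextafter(x, -math.inf)

    def is_jump_point(floating_count, xi, i):
        """Test whether xi is an i-th jump-point of a (possibly
        nonmonotonic) floating_count, per the definition above."""
        return floating_count(nextbefore(xi)) < i <= floating_count(xi)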

2.2 Assumptions

In order to prove the correctness of our algorithms, we need to make some assumptions about the computer arithmetic, the inputs, the bisection algorithm and the function FloatingCount(). The following is a list of all the assumptions we will make; not all our results require all the assumptions, so we must be explicit about which assumptions we need. The first set of assumptions, Assumption 1, concerns the floating point arithmetic. Not all parts of Assumption 1 are necessary for all later results, so we will later refer to Assumptions 1A, 1B, etc. Assumption 2 is about the input matrix, and includes a mild restriction on its size, and an easily enforceable assumption on its scaling. Assumption 3 is about the method used to choose the "bisection" point of an interval; it is also easily enforced. Assumption 4 consists of two statements about the implementation of the function FloatingCount(), which can be proven for most current implementations provided the appropriate parts of Assumption 1, about the arithmetic, are true. We still call these two statements an assumption, rather than a theorem, because they are the most convenient building blocks for the ultimate correctness proofs.

Assumption 1

1A. Assumptions about the floating point arithmetic model. Barring overflow, the usual expression for roundoff is extended to include underflow as follows [7]:

    fl(a ⊙ b) = (a ⊙ b)(1 + δ) + η,    (2.1)

where ⊙ is a binary arithmetic operation, |δ| is bounded by the machine precision ε, |η| is bounded by a tiny number ω̄, typically the underflow threshold ω (the smallest normalized number which can safely participate in, or be a result of, any floating point operation)¹, and at most one of δ and η can be nonzero. In IEEE arithmetic, gradual underflow lets us further assert that ω̄ = εω, and that if ⊙ is addition or subtraction, then η must be zero. We denote the overflow threshold of the computer (the largest number which can safely participate in, or be a result of, any floating point operation) by Ω. In this paper, we will consider the following three variations on this basic floating point arithmetic model:

i. Model 1. fl(a ⊙ b) = (a ⊙ b)(1 + δ) + η as above, and overflows terminate execution.
ii. Model 2. IEEE arithmetic with ∞ and NaN arithmetic, and with gradual underflow.
iii. Model 3. IEEE arithmetic with ∞ and NaN arithmetic, but with underflow flushing to zero.

¹ These caveats about "safe participation in any floating point operation" take machines like the Cray into account, since they have "partial overflow".

1B. √ω ≤ ε ≤ 1 ≤ 1/ε ≤ √Ω. This mild assumption is satisfied by all commercial floating point arithmetics.

1C. Floating point arithmetic is monotonic. This is true of IEEE arithmetic (Models 2 and 3) but may not be true of Model 1.

1D. When we talk of parallel implementations, we will assume that all the processors have identical floating point arithmetic, so that the result of the same floating point operation is bitwise identical on all processors.

Assumption 2

2A. Assumption on the problem size n: nε ≤ 0.1.

2B. Assumptions on the scaling of the input matrix. Let B̄ ≡ min_{i≠j} T_ij^2 and M̄ ≡ max_{i,j} |T_ij|.

i. B̄ ≥ ω.
ii. M̄ ≤ √Ω.

These assumptions may be achieved by explicitly scaling the input matrix (multiplying it by an appropriate scalar), and by ignoring small off-diagonal elements with T_ij^2 < ω and so splitting the matrix into unreduced blocks [4]; see section 5.8 for details. By Weyl's Theorem [20], this may introduce a tiny error of amount no more than √ω in the computed eigenvalues.

2C. More assumptions on the scaling of the input matrix. These are used to get refined error bounds in Section 5.

i. M̄ ≥ ω/ε.
ii. M̄ ≥ 1/(εΩ).

Assumption 3

All the algorithms we will consider try to bound an eigenvalue within an interval. A fundamental operation in these algorithms is to compute a point which lies in an interval (α, β); we denote this point by inside(α, β). This point may be computed by simply bisecting the interval (binary chop) or by applying Newton's or Laguerre's iteration. We assume that fl(inside(α, β)) ∈ (α, β) for all attainable values of α and β. For example, if we use binary chop, inside(α, β) = (α + β)/2, and we will assume that fl((α + β)/2) ∈ (α, β) for all α < β such that β − α > 2 max(|α|, |β|)ε (i.e., such that there is at least one floating point number in (α, β)), where ε is the machine precision. This assumption always holds in IEEE arithmetic, and for any model of arithmetic which rounds correctly, i.e., rounds a result to the nearest floating point number. For a detailed treatment of how to compute (α + β)/2 correctly on various machines, see [17]. An easy way to enforce this assumption, given an arbitrary inside(α, β), is to replace it by

    min(max(inside(α, β), nextafter(α)), nextbefore(β)),

where nextafter(α) is the next floating point number greater than α, and nextbefore(β) is the next floating point number less than β.
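The enforcement just described is easy to write down. The following Python sketch is ours (not from the paper), using binary chop as the default inside and math.nextafter (Python 3.9+) for the neighbor functions; it assumes (α, β) contains at least one floating point number.

    import math

    def safe_inside(alpha, beta, inside=lambda a, b: (a + b) / 2):
        """Clamp inside(alpha, beta) into the open interval (alpha, beta),
        as suggested for enforcing Assumption 3."""
        lo = math.nextafter(alpha, math.inf)   # nextafter(alpha)
        hi = math.nextafter(beta, -math.inf)   # nextbefore(beta)
        return min(max(inside(alpha, beta), lo), hi)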

Assumption 4

4A. FloatingCount(x) does not abort.

4B. Let ξ″_i^(1), ξ″_i^(2), ..., ξ″_i^(k) be the i-th jump-points of FloatingCount(x). We assume that FloatingCount(x) satisfies the error bound

    |ξ″_i^(j) − λ_i| ≤ ζ_i  for all j = 1, ..., k,

for some ζ_i ≥ 0. In other words, we assume that FloatingCount(x) has a bounded region of possible nonmonotonicity, and ζ_i is the width of the possible nonmonotonicity around the eigenvalue λ_i. Different implementations of Count(x) result in different values of ζ_i (see Section 5). For some of the practical FloatingCount functions in use, we will prove Assumption 4 in Section 5.

2.3 When is a Bisection Algorithm Correct?

We now describe the functional behaviour required of any correct implementation of bisection. Let τ_i be a user-supplied upper bound on the desired error in the i-th computed eigenvalue; this means the user wants the computed eigenvalue λ′_i to differ from the true eigenvalue λ_i by no more than τ_i. Note that not all values of τ_i are attainable; the attainable values of τ_i depend on the FloatingCount function, the underlying computer arithmetic and the input matrix T. For example, when using Algorithm 1, τ_i can range from max_i |λ_i| down to O(ε) max_i |λ_i|, or perhaps smaller for special matrices [5]. Let U be a non-empty set of indices corresponding to the eigenvalues the user wants to compute, e.g., U = {1, ..., n} if the user wants to compute all the eigenvalues of a matrix of dimension n. The output of the algorithm should be a sorted list of the computed eigenvalues, i.e. a list (i, λ′_i) where each i ∈ U occurs exactly once, and λ′_i ≤ λ′_j when i ≤ j. In a parallel implementation, this list may be distributed over the processors in any way at the end of the computation. Thus, the algorithm must compute the eigenvalues and also present the output in sorted order. Beyond neatness, the reason we require the eigenvalues to be returned in sorted order is that it facilitates subsequent uses that require scanning through the eigenvalues in order, such as reorthogonalization of eigenvectors in inverse iteration.

In summary, when we say that an implementation of the bisection algorithm is correct, we assert that it terminates and that all of the following hold:

- Every desired eigenvalue is computed exactly once.
- The computed eigenvalues are correct to within the user-specified error tolerance, i.e. for all desired i > 0, |λ_i − λ′_i| ≤ τ_i + ζ_i (in case τ_i > ζ_i, the implementation can easily guarantee that |λ_i − λ′_i| ≤ τ_i). See section 2.2, Assumption 4B, for the definition of ζ_i.
- The computed eigenvalues are in sorted order.

We say that an implementation of the bisection algorithm is incorrect when any of the above fails to hold.

3 Outline of the Paper

In this section, we outline our results in four tables. Table 1 lists all the implementations of FloatingCount() we consider, and says where they are discussed in the paper. Table 2 lists all the implementations of bisection we consider, and says where they are discussed in the paper. These bisection algorithms all use an implementation of FloatingCount() internally. Table 3 summarizes the error analyses of the FloatingCount() implementations in Table 1. It reports which parts of Assumption 1, about the arithmetic, and Assumption 2, about the input matrix, are necessary to prove whether FloatingCount() satisfies Assumption 4, and whether or not FloatingCount() is monotonic. Detailed numerical error bounds for each implementation of FloatingCount() are reported in section 5, especially Tables 5 through 9. Table 4 says which assumptions are needed to guarantee the correctness of the overall bisection algorithms in Table 2. Basically, all algorithms require Assumption 3, about choosing the bisection point, and Assumption 4, which depends on FloatingCount as summarized in Table 3; all parallel bisection algorithms additionally require Assumption 1D, about parallel processors having identical arithmetic.

Table 1: Different implementations of FloatingCount()

    bisect       | algorithm used in the EISPACK routine; most floating point exceptions avoided by tests and branches | see section 5.2 and [21]
    IEEE         | IEEE standard floating point arithmetic used to accommodate possible exceptions; tridiagonals only  | see section 5.3 and [4, 16]
    sstebz       | algorithm used in the LAPACK routine; floating point exceptions avoided by tests and branches       | see section 5.4 and [1]
    Best Scaling | like sstebz, but prescales for optimal error bounds                                                  | see section 5.5 and [4, 16]

Table 2: Different implementations of bisection

    Ser_Bisec   | serial bisection algorithm that finds all the eigenvalues of T in a user-specified interval                                                              | see section 4.3
    Ser_AllEig  | serial bisection algorithm that finds all the eigenvalues of T                                                                                           | see section 4.3
    Par_AllEig1 | parallel bisection algorithm that finds all the eigenvalues of T, load balancing by dividing the Gerschgorin interval into p equal subintervals; needs monotonic arithmetic | see section 4.4
    Par_AllEig2 | similar to Par_AllEig1, but monotonic arithmetic unneeded                                                                                                 | see section 4.4
    Par_AllEig3 | parallel bisection algorithm that finds all the eigenvalues of T, load balancing by making each processor find an equal number of eigenvalues; monotonic arithmetic unneeded | see section 4.4

Table 3: Results of roundoff error analysis and monotonicity

    T is symmetric tridiagonal, (1A(ii) or 1A(iii)), 1B, 2A, 2B(ii) | for bisect's FloatingCount(x), Assumption 4 holds but FloatingCount(x) can be nonmonotonic      | see sections 4.1 and 5.2
    T is symmetric tridiagonal, (1A(ii) or 1A(iii)), 1B, 2A, 2B     | for the IEEE routine's FloatingCount(x), Assumption 4 holds and FloatingCount(x) is monotonic   | see sections 5.3 and 6
    T is symmetric acyclic, 1A, 1B, 1C, 2A, 2B(ii)                  | for sstebz's FloatingCount(x), Assumption 4 holds and FloatingCount(x) is monotonic             | see sections 5.4 and 6
    T is symmetric acyclic, 1A, 1B, 1C, 2A                          | for Best Scaling's FloatingCount(x), Assumption 4 holds and FloatingCount(x) is monotonic       | see sections 5.5 and 6

Table 4: Correctness results

    3, 4                                        | Algorithm Ser_Bisec is correct   | see section 7
    3, 4                                        | Algorithm Ser_AllEig is correct  | see section 4
    1D, 3, 4, and FloatingCount(x) is monotonic | Algorithm Par_AllEig1 is correct | see section 7.2
    1D, 3, 4                                    | Algorithm Par_AllEig2 is correct | see section 7.2
    1D, 3, 4                                    | Algorithm Par_AllEig3 is correct | see section 7.2

So, for example, Table 4 says that Algorithms Ser_Bisec, Ser_AllEig, Par_AllEig2 and Par_AllEig3 will be correct when used with any of the implementations of FloatingCount in Table 1, provided the corresponding conditions in Table 3 are met. On the other hand, Algorithm Par_AllEig1 cannot be used correctly with bisect's FloatingCount, since that FloatingCount is not monotonic.

4 Correctness of Bisection Algorithms

We now investigate the correctness of bisection algorithms and exhibit some existing "natural" implementations of bisection that are incorrect. Table 4 summarizes all the correctness results that we prove in this paper.

4.1 An Incorrect Serial Implementation of Bisection

We give an example of the failure of EISPACK's bisect routine in the face of a nonmonotonic FloatingCount(x). Suppose we use IEEE standard double precision floating point arithmetic with ε = 2⁻⁵² ≈ 2.2 × 10⁻¹⁶, and we want to find the eigenvalues of the following 2-by-2 matrix:

    A = ( 0  ε )
        ( ε  1 )

In exact arithmetic, A has eigenvalues near 1 and −ε² ≈ −4.93 × 10⁻³². But bisect reports that the interval [−10⁻³², 0) contains −1 eigenvalues. The reason for this is bisect's incorrect provision against division by zero (see Algorithm 3 in Section 5). In Section 6, by the proof of Theorem 6.1, we will show that this cannot happen for the LAPACK routine dstebz (sstebz), even for general symmetric acyclic matrices.
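This failure is easy to reproduce with a short calculation. The Python sketch below is ours, for illustration only (it is not the EISPACK code itself); it follows the count used by bisect, including its provision against division by zero as shown in Algorithm 3 of Section 5.2.

    import sys

    eps = sys.float_info.epsilon  # 2**-52

    def bisect_count(x, a, b):
        """bisect-style count for a symmetric tridiagonal matrix, with
        the EISPACK provision v = |b_{i-1}|/eps when the previous pivot
        is exactly zero."""
        cnt = 0
        d = 1.0
        for i in range(len(a)):
            bprev = b[i - 1] if i > 0 else 0.0
            v = abs(bprev) / eps if d == 0.0 else bprev ** 2 / d
            d = a[i] - x - v
            if d < 0:
                cnt += 1
        return cnt

    a, b = [0.0, 1.0], [eps]
    print(bisect_count(-1e-32, a, b))  # 1
    print(bisect_count(0.0, a, b))     # 0: so [-1e-32, 0) "contains" -1 eigenvalues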

4.2 Nonmonotonicity of the Parallel Prefix Algorithm

We now give another example of a nonmonotonic FloatingCount(x), arising when Count(x) is implemented using a fast parallel algorithm called parallel prefix [9]. Figure 1 shows the FloatingCount(x) of a 64-by-64 matrix of norm near 1 with 32 eigenvalues very close to 5 × 10⁻⁸, computed in the neighborhood of the eigenvalues both by the conventional bisection algorithm and by the parallel prefix algorithm; see [19, 12] for details.

    [Figure 1: Parallel prefix vs. conventional bisection. The plot shows the number of eigenvalues less than x near the cluster (parallel prefix dotted, conventional bisection solid); the horizontal axis is scaled by 10⁻⁷.]

4.3 A Correct Serial Implementation of the Bisection Algorithm

As we saw in Section 4.1, the EISPACK implementation of the bisection algorithm fails in the face of nonmonotonicity of the function FloatingCount(x). We now present an implementation which works correctly irrespective of whether FloatingCount(x) is monotonic or not.

All the intervals referred to in the following discussion will be half-open intervals of the form [α, β). We define a task to be a 5-tuple T = (α, β, n_α, n_β, O), where [α, β) is a non-empty interval, n_α and n_β are the counts associated with α and β respectively, and O is the set of indices corresponding to the eigenvalues being searched for in this interval. We obviously require O ⊆ I_{n_α}^{n_β}, where I_{n_α}^{n_β} = {n_α + 1, ..., n_β} (I_{n_α}^{n_β} = ∅ when n_α ≥ n_β). We do not insist that n_α = FloatingCount(α) and n_β = FloatingCount(β), only that n_α ≥ FloatingCount(α) and n_β ≤ FloatingCount(β). In most implementations O = I_{n_α}^{n_β}, and the index set O is not explicitly maintained by the implementation.

Algorithm Ser_Bisec (see Figure 2) is a correct serial implementation of bisection that finds all the eigenvalues specified by the given initial task (left, right, n_left, n_right, I_{n_left}^{n_right}), the i-th eigenvalue being found to the desired accuracy τ_i (note that we allow different tolerances to be specified for different eigenvalues, τ_i being the i-th component of the input tolerance vector τ). The differences between this implementation and the EISPACK implementation are the initial check to see whether n_left ≥ n_right, and the forcing of monotonicity on FloatingCount(x) by executing statement 10 at each iteration. Statement 10 has no effect if FloatingCount(x) is monotonic, whereas it forces n_mid to lie between n_α and n_β if FloatingCount(x) is nonmonotonic.

Theorem 4.1 Algorithm Ser_Bisec is correct if Assumptions 3 and 4 hold.

Proof. The theorem is proved in Section 7. □

    subroutine Ser_Bisec(n, T, left, right, n_left, n_right, τ)
    (* computes the eigenvalues of T in the interval [left, right] to the desired accuracy τ *)
    1:  if (n_left ≥ n_right or left > right) return;
    2:  if (FloatingCount(left) > n_left or FloatingCount(right) < n_right) return;
    3:  enqueue (left, right, n_left, n_right, I_{n_left}^{n_right}) to Worklist;
    4:  while (Worklist is not empty)
    5:      dequeue (α, β, n_α, n_β, I_{n_α}^{n_β}) from Worklist;
    6:      mid = inside(α, β);
    7:      if (β − α < min_{i=n_α+1,...,n_β} τ_i) then
    8:          print "eigenvalue min(max((α + β)/2, α), β) has multiplicity n_β − n_α";
    9:      else
    10:         n_mid = min(max(FloatingCount(mid), n_α), n_β);
    11:         if (n_mid > n_α) then
    12:             enqueue (α, mid, n_α, n_mid, I_{n_α}^{n_mid}) to Worklist;
    13:         end if
    14:         if (n_mid < n_β) then
    15:             enqueue (mid, β, n_mid, n_β, I_{n_mid}^{n_β}) to Worklist;
                end if
            end if
        end while
    end subroutine

    Figure 2: Algorithm Ser_Bisec
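For readers who prefer running code, here is a compact Python rendering of Figure 2 (our sketch, not part of the paper's software). floating_count and inside are supplied by the caller; tau is 0-indexed, so tau[i] is the tolerance for the (i+1)-st eigenvalue; and eigenvalues are collected as (index, value) pairs instead of being printed.

    from collections import deque

    def ser_bisec(floating_count, inside, left, right, n_left, n_right, tau):
        """Sketch of Algorithm Ser_Bisec (Figure 2). Returns (i, value)
        pairs, one per 0-based eigenvalue index i in n_left..n_right-1;
        clamping n_mid forces monotonicity on floating_count."""
        results = []
        if n_left >= n_right or left > right:
            return results
        if floating_count(left) > n_left or floating_count(right) < n_right:
            return results
        work = deque([(left, right, n_left, n_right)])
        while work:
            alpha, beta, na, nb = work.popleft()
            if beta - alpha < min(tau[na:nb]):
                lam = min(max((alpha + beta) / 2, alpha), beta)
                results.extend((i, lam) for i in range(na, nb))
            else:
                mid = inside(alpha, beta)
                n_mid = min(max(floating_count(mid), na), nb)  # statement 10
                if n_mid > na:
                    work.append((alpha, mid, na, n_mid))
                if n_mid < nb:
                    work.append((mid, beta, n_mid, nb))
        return results

Sorting the returned pairs by index then yields the sorted output required by Section 2.3.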

    subroutine Ser_AllEig(n, T, τ)
    (* computes all the eigenvalues of T *)
        (gl, gu) = Compute_Gerschgorin(n, T);
        call Ser_Bisec(n, T, gl, gu, 0, n, τ);
    end subroutine

    function Compute_Gerschgorin(n, T)
    (* returns the Gerschgorin interval (gl, gu) *)
    1:  gl = min_{i=1,...,n} (T_ii − Σ_{j≠i} |T_ij|);   (* Gerschgorin left bound *)
    2:  gu = max_{i=1,...,n} (T_ii + Σ_{j≠i} |T_ij|);   (* Gerschgorin right bound *)
    3:  bnorm = max(|gl|, |gu|);
    4:  gl = gl − bnorm · 2nε − ζ_0;  gu = gu + bnorm · 2nε + ζ_n;   (* see Table 9 *)
    5:  return (gl, gu);
    end function

    Figure 3: Algorithm Ser_AllEig computes all the eigenvalues of T

Note that in all our theorems about the correctness of the various bisection algorithms, unless stated otherwise, we do not require FloatingCount(x) to be monotonic.

Algorithm Ser_AllEig (see Figure 3) is designed to compute all the eigenvalues of T to the desired accuracy. Compute_Gerschgorin is a subroutine that computes the Gerschgorin interval of the symmetric acyclic matrix T. Note that in line 4 of the pseudocode, the Gerschgorin interval is widened to ensure that FloatingCount(gl) = 0 and FloatingCount(gu) = n, so that it is guaranteed to contain all the eigenvalues (this is proved in Section 5.7). Due to the correctness of Algorithm Ser_Bisec, we have the following corollary:

Corollary 4.1 Algorithm Ser_AllEig is correct if Assumptions 3 and 4 hold.
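A Python sketch of Compute_Gerschgorin for the tridiagonal case may be useful; it is ours, and the widening terms zeta0 and zetan correspond to the ζ_0 and ζ_n of line 4 (bounds for them appear in Table 9).

    import sys

    def compute_gerschgorin(a, b, zeta0=0.0, zetan=0.0):
        """Gerschgorin interval for the symmetric tridiagonal matrix with
        diagonal a and off-diagonal b, widened as in Figure 3, line 4."""
        eps = sys.float_info.epsilon
        n = len(a)

        def r(i):  # sum of |off-diagonal| entries in row i
            left = abs(b[i - 1]) if i > 0 else 0.0
            right = abs(b[i]) if i < n - 1 else 0.0
            return left + right

        gl = min(a[i] - r(i) for i in range(n))
        gu = max(a[i] + r(i) for i in range(n))
        bnorm = max(abs(gl), abs(gu))
        return (gl - bnorm * 2 * n * eps - zeta0,
                gu + bnorm * 2 * n * eps + zetan)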

4.4 Parallel Implementation of the Bisection Algorithm

We now discuss parallel implementations of the bisection algorithm. The bisection algorithm offers ample opportunities for parallelism, and many parallel implementations exist [6, 13, 18, 14]. We first discuss a natural parallel implementation which gives the false appearance of being correct. Then we give a correct parallel implementation that has been tested extensively on the Connection Machine CM-5 supercomputer.

4.4.1 A Simple, Natural and Incorrect Parallel Implementation

A natural way to divide the work arising in the bisection algorithm among p processors is to partition the initial Gerschgorin interval into p equal subintervals, and to assign to processor i the task of finding all the eigenvalues in the i-th subinterval. Algorithm Par_AllEig0 (see Figure 4) is a simple and natural parallel implementation based on this idea that attempts to find all the eigenvalues of T (the p processors are assumed to be numbered 0, 1, 2, ..., p − 1).

    subroutine Par_AllEig0(n, T, τ)
    (* computes all the eigenvalues of T in parallel *)
        i = My_Processor_Number();
        (gl, gu) = Compute_Gerschgorin(n, T);
        average_width = (gu − gl)/p;
        α(i) = gl + i · average_width;
        β(i) = α(i) + average_width;
        n_α(i) = FloatingCount(α(i));
        n_β(i) = FloatingCount(β(i));
        call Ser_Bisec(n, T, α(i), β(i), n_α(i), n_β(i), τ);
    end subroutine

    Figure 4: Algorithm Par_AllEig0 executed by processor i: an incorrect parallel implementation

However, Algorithm Par_AllEig0 fails to find the eigenvalue of a 1-by-1 identity matrix when implemented on a 32-processor CM-5! The reason is that when p = 32, average_width is so small that β(i) = α(i) for i = 0, ..., p − 1. Thus, none of the processors is able to find the only eigenvalue. (This would happen on any machine with IEEE arithmetic, not just the CM-5.)
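The effect is easy to demonstrate in IEEE double precision. The interval below is illustrative (the actual one would come from Compute_Gerschgorin), but the arithmetic is the same: when average_width is below half the floating point spacing near gl, consecutive grid points round to the same number.

    gl, gu = 1.0 - 1e-16, 1.0 + 1e-16  # an illustrative narrow Gerschgorin interval
    p = 32
    average_width = (gu - gl) / p
    alphas = [gl + i * average_width for i in range(p + 1)]
    # Some subintervals are empty: alpha(i) == beta(i) for some i,
    # so no processor's task can contain the eigenvalue.
    print(any(alphas[i] == alphas[i + 1] for i in range(p)))  # True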

The error in Algorithm Par_AllEig0 is in the way β(i) is computed. Algorithm Par_AllEig1 (see Figure 5) fixes the problem by computing β(i) as gl + (i + 1) · average_width. This results in the following theorem, which we prove in Section 7.

Theorem 4.2 Algorithm Par_AllEig1 is correct if Assumptions 1D, 3 and 4 hold and FloatingCount(x) is monotonic.

However, when FloatingCount(x) is nonmonotonic, Algorithm Par_AllEig1 is still incorrect! The error in the algorithm is that when FloatingCount(x) is nonmonotonic, a desired eigenvalue may be computed more than once. For example, suppose n = p = 3, n_α(0) = 0, n_β(0) = n_α(1) = 2, n_β(1) = n_α(2) = 1, and n_β(2) = 3. In this case, the second eigenvalue will be computed by both processor 0 and processor 2.

Algorithm Par_AllEig2 (see Figure 6) corrects this problem by making processor 0 form all the initial tasks that are input to Algorithm Ser_Bisec. These initial tasks are formed such that n_α(i) ≤ n_β(i) for i = 0, ..., p − 1, and are then communicated to the other processors. Note that these initial tasks may also be formed by computing n_α(i) and n_β(i) in parallel on each processor, making sure n_α(i) ≤ n_β(i), and then doing two max-scans, replacing n_α(i) by max_{j≤i} n_α(j) and n_β(i) by max_{j≤i} n_β(j). This would take log(p) steps on p processors. The function send(i, n_α) sends the number n_α to processor i, while receive(0, n_α(i)) results in n_α(i) being set to the number sent by processor 0. Thus, we have the following theorem:

Theorem 4.3 Algorithm Par_AllEig2 is correct if Assumptions 1D, 3 and 4 hold.

Proof. The theorem is proved in Section 7. □

    subroutine Par_AllEig1(n, T, τ)
    (* computes all the eigenvalues of T in parallel *)
        i = My_Processor_Number();
        (gl, gu) = Compute_Gerschgorin(n, T);
        average_width = (gu − gl)/p;
        α(i) = gl + i · average_width;
        β(i) = max(gl + (i + 1) · average_width, α(i));
        if (i = p − 1) β(i) = gu;
        n_α(i) = FloatingCount(α(i));
        n_β(i) = FloatingCount(β(i));
        call Ser_Bisec(n, T, α(i), β(i), n_α(i), n_β(i), τ);
    end subroutine

    Figure 5: Algorithm Par_AllEig1 executed by processor i: correct when FloatingCount(x) is monotonic

4.4.2 A Practical Correct Parallel Implementation

Although correct, Algorithm Par_AllEig2 is very sensitive to the eigenvalue distribution in the Gerschgorin interval, and does not achieve high speedups on massively parallel machines when the eigenvalues are not distributed uniformly. We now give a more practical parallel implementation, and prove its correctness. Algorithm Par_AllEig3 (see Figure 7) partitions the work among the p processors by making each processor find an almost equal number of eigenvalues (for ease of presentation, we assume that p divides n). This static partitioning of the work has been observed to give good performance on parallel machines like the CM-5 [13] for almost all eigenvalue distributions.

In Algorithm Par_AllEig3, processor i attempts to find eigenvalues i(n/p) + 1 through (i + 1)(n/p). The function Find_Init_Task invoked on processor i finds a floating point number β(i) such that FloatingCount(β(i)) = (i + 1)(n/p), unless λ_{(i+1)n/p} is part of a cluster of eigenvalues. In the latter case, Find_Init_Task finds a floating point number β(i) such that FloatingCount(β(i)) is bigger than (i + 1)(n/p) and λ_{(i+1)n/p}, ..., λ_{FloatingCount(β(i))} form a cluster relative to the user-specified error tolerance τ. If the cluster is so big that FloatingCount(β(i)) is larger than (i + 2)(n/p), processor i sets n_β(i) to (i + 1)(n/p) in order to ensure that each desired eigenvalue is computed just once. Each processor i computes β(i), communicates with processor i − 1 to receive α(i), and then calls Algorithm Ser_Bisec with the initial task (α(i), β(i), n_α(i), n_β(i), I_{n_α(i)}^{n_β(i)}) returned by the function Find_Init_Task. When the eigenvalues are well separated, each processor finds an equal number of eigenvalues.

Theorem 4.4 Algorithm Par_AllEig3 is correct if Assumptions 1D, 3 and 4 hold.

Proof. The theorem is proved in Section 7. □

Note that we used some communication between processors to guarantee the correctness of the parallel algorithms.

    subroutine Par_AllEig2(n, T, τ)
    (* computes all the eigenvalues of T in parallel *)
        i = My_Processor_Number();
        (gl, gu) = Compute_Gerschgorin(n, T);
        average_width = (gu − gl)/p;
        α(i) = gl + i · average_width;
        β(i) = gl + (i + 1) · average_width;
        if (i = p − 1) β(i) = gu;
        if (i = 0) then
            n_α(i) = 0; n_α = 0;
            for (j = 1; j < p; j = j + 1) do   (* does a max scan *)
                α = gl + j · average_width;
                n_α = max(FloatingCount(α), n_α);
                send(j, n_α);
            end for
        else
            receive(0, n_α(i));
        end if
        if (i ≠ 0) then send(i − 1, n_α(i)); end if
        if (i ≠ p − 1) then
            receive(i + 1, n_β(i));
        else
            n_β(i) = FloatingCount(gu);
        end if
        call Ser_Bisec(n, T, α(i), β(i), n_α(i), n_β(i), τ);
    end subroutine

    Figure 6: Algorithm Par_AllEig2 executed by processor i: a correct parallel implementation

    subroutine Par_AllEig3(n, T, τ)
    (* computes all the eigenvalues of T in parallel *)
        i = My_Processor_Number();
        (gl, gu) = Compute_Gerschgorin(n, T);
        (α(i), β(i), n_α(i), n_β(i)) = Find_Init_Task(n, T, gl, gu, 0, n, τ);
        call Ser_Bisec(n, T, α(i), β(i), n_α(i), n_β(i), τ);
    end subroutine

    function Find_Init_Task(n, T, α(i), β(i), n_α(i), n_β(i), τ)
    (* returns the initial task for processor i *)
        save = α(i); n_save = n_α(i);
        while ((n_β(i) ≠ (i + 1)n/p) and (β(i) − α(i) > min_{j=n_α(i)+1,...,n_β(i)} τ_j))
            mid = inside(α(i), β(i));
            n_mid = min(max(FloatingCount(mid), n_α(i)), n_β(i));
            if (n_mid ≤ (i + 1)n/p) then
                α(i) = mid; n_α(i) = n_mid;
            else
                β(i) = mid; n_β(i) = n_mid;
            end if
        end while
        if (n_β(i) > (i + 2)n/p) then
            n_β(i) = (i + 1)n/p;
        end if
        if (i ≠ p − 1) then send(i + 1, β(i)); send(i + 1, n_β(i)); end if
        if (i ≠ 0) then
            receive(i − 1, α(i)); receive(i − 1, n_α(i));
        else
            α(i) = save; n_α(i) = n_save;
        end if
        return (α(i), β(i), n_α(i), n_β(i));
    end function

    Figure 7: Algorithm Par_AllEig3 executed by processor i, where each processor finds an (almost) equal number of eigenvalues

It is much harder (and less elegant) to construct, and to prove the correctness of, similarly correct and efficient parallel algorithms that do not use any communication.

5 Roundoff Error Analysis

As we mentioned before, Algorithm 1 was recently extended to symmetric acyclic matrices. In [8] the following implementation of Count(x) for acyclic matrices was given. The algorithm refers to the tree G(T), where node 1 is chosen (arbitrarily) as the root of the tree, and node j is called a child of node i if T_ij ≠ 0 and node j has not yet been visited by the algorithm.

Algorithm 2: Count(x) returns the number of eigenvalues of the symmetric acyclic matrix T that are less than x.

    call TreeCount(1, x, d, s)
    Count = s

    procedure TreeCount(i, x, d, s)
    (* i and x are inputs, d and s are outputs *)
        s = 0
        sum = 0
        for all children j of i do
            call TreeCount(j, x, d′, s′)
            sum = sum + T_ij^2 / d′
            s = s + s′
        endfor
        d = (T_ii − x) − sum
        if d < 0 then s = s + 1
    end TreeCount
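Algorithm 2 transcribes naturally into a recursive Python sketch (ours, and ignoring over/underflow). The matrix is taken as a dense array whose nonzero pattern encodes the tree, with node 0 as the root; the visited flags implement "has not yet been visited".

    def tree_count(T, x):
        """Number of eigenvalues of the symmetric acyclic matrix T that
        are less than x, following Algorithm 2 (barring over/underflow)."""
        n = len(T)
        visited = [False] * n

        def visit(i):
            visited[i] = True
            s, acc = 0, 0.0
            for j in range(n):
                if j != i and T[i][j] != 0 and not visited[j]:  # j is a child of i
                    dj, sj = visit(j)
                    acc += T[i][j] ** 2 / dj
                    s += sj
            d = (T[i][i] - x) - acc
            return d, s + (1 if d < 0 else 0)

        return visit(0)[1]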

In [8] it is also shown that, barring over/underflow, the floating point version of Algorithm 2 has the same attractive backward error analysis as the floating point version of Algorithm 1: let FloatingCount(x, T) denote the value of Count(x, T) computed in floating point arithmetic. Then FloatingCount(x, T) = Count(x, T′), where T′ differs from T only slightly:

    |T_ij − T′_ij| ≤ f(C/2 + 2, ε) |T_ij| if i ≠ j, and T_ii = T′_ii,    (5.2)

where ε is the machine precision, C is the maximum number of children of any node in the graph G(T), and f(n, ε) is defined by

    f(n, ε) = (1 + ε)^n − 1.

By Assumption 2A (nε ≤ 0.1), we have [22]:

    f(n, ε) ≤ 1.06 nε.

(Strictly speaking, the proof of this bound is a slight modification of the one in [8], and requires that d be computed exactly as shown in TreeCount. The analysis in [8] makes no assumption about the order in which the sum for d is evaluated, whereas the bound (5.2) for TreeCount assumes the parentheses in the sum for d are respected. Not respecting the parentheses weakens the bounds just slightly, and complicates the discussion below, but does not change the overall conclusion.)

This tiny componentwise backward error permits us to compute the eigenvalues quite accurately, as we now discuss. Suppose the backward error in (5.2) can change eigenvalue λ_k by at most ζ_k. For example, Weyl's Theorem [20] implies that ζ_k ≤ ‖T − T′‖_2 ≤ 2 f(C/2 + 2, ε) ‖T‖_2, i.e. each eigenvalue is changed by an amount small compared to the largest eigenvalue. If T_ii = 0 for all i, then ζ_k ≤ ((1 − (C + 4)ε)^{1−n} − 1) |λ_k|, i.e. each eigenvalue is changed by an amount small relative to itself. See [16, 5, 10] for more such bounds.

Now suppose that at some point in the algorithm we have an interval [x, y), x < y, where

    i = FloatingCount(x, T) < FloatingCount(y, T) = j.    (5.3)

Let T′_x be the equivalent matrix for which FloatingCount(x, T) = Count(x, T′_x), and T′_y the equivalent matrix for which FloatingCount(y, T) = Count(y, T′_y). Thus x ≤ λ_{i+1}(T′_x) ≤ λ_{i+1}(T) + ζ_{i+1}, or x − ζ_{i+1} ≤ λ_{i+1}(T). Similarly, y > λ_j(T′_y) ≥ λ_j(T) − ζ_j, or λ_j(T) < y + ζ_j. Altogether,

    x − ζ_{i+1} ≤ λ_{i+1}(T) ≤ λ_j(T) < y + ζ_j.    (5.4)

If j = i + 1, we get the simpler result

    x − ζ_j ≤ λ_j(T) < y + ζ_j.    (5.5)

This means that by making x and y closer together, we can compute λ_j(T) with an accuracy of at best about ±ζ_j; this is the case when x and y are adjacent floating point numbers and j = i + 1 in (5.3). Thus, in principle, λ_j(T) can be computed nearly as accurately as the inherent uncertainty ζ_j permits.

Accounting for over/underflow is done as follows. We first discuss the way it is done in EISPACK's bisect routine [21], and then the superior method in LAPACK's sstebz routine [1, 16]. The difficulty arises because, if d′ is tiny or zero, the division T_ij^2 / d′ can overflow. In addition, T_ij^2 can itself over/underflow. We would like to account for this by modifying the algorithm to avoid over/underflow (not necessary if we have IEEE arithmetic), and slightly increasing the backward error bound (5.2). We denote the pivot d computed when visiting node i by d_i. The floating point operations performed while visiting node i are then

    d_i = fl((T_ii − x) − Σ_{all children j of i} T_ij^2 / d_j).    (5.6)

To analyze this formula, we will let subscripted ε's and η's denote independent quantities bounded in absolute value by ε and ω̄, respectively. We will also make standard substitutions like ∏_{i=1}^{n} (1 + ε_i) = (1 + ε̃)^n with |ε̃| ≤ ε, and (1 + ε_i)^{±1} η_j = η_j.

5.1 Model 1: Barring Overflow, Acyclic Matrix

Barring overflow, (5.6) and Assumption 2B(i) lead to

    d_i = {(T_ii − x)(1 + ε_{ia}) + η_{1i} − [Σ_{all children j of i} (T_ij^2 / d_j)(1 + ε̃_{ij})^{C+1} − (2C − 1) η_{2i}]} (1 + ε_{ib}) + η_{3i},

or

    d_i / (1 + ε_{ib}) = (T_ii − x)(1 + ε_{ia}) − Σ_{all children j of i} (T_ij^2 / d_j)(1 + ε̃_{ij})^{C+1} + 2C η′_{2i} + η′_{3i},

or

    d_i / (1 + ε_{ic})^2 = T_ii − x − Σ_{all children j of i} (T_ij^2 / d_j)(1 + ε̂_{ij})^{C+2} + (2C + 1) η_i.

Letting d̃_i = d_i / (1 + ε_{ic})^2, we finally obtain

    d̃_i = T_ii + (2C + 1) η_i − x − Σ_{all children j of i} (T_ij^2 / d̃_j)(1 + ε_{ij})^{C+4}.    (5.7)

Remark 5.1 Under Model 2, IEEE arithmetic with gradual underflow, the underflow error (2C + 1) η_i in the above equation can be replaced by C η_i, because addition and subtraction never underflow.

If there is no underflow during the computation of the d_i either, then (5.7) simplifies to

    d̃_i = T_ii − x − Σ_{all children j of i} (T_ij^2 / d̃_j)(1 + ε_{ij})^{C+4}.

This proves (5.2), since the d̃_i are the exact pivots corresponding to a matrix T′ satisfying (5.2), and sign(d̃_i) = sign(d_i).

Remark 5.2 We need to bar overflow, in principle, for a symmetric acyclic matrix with IEEE arithmetic, because if in (5.6) there are two children j_1 and j_2 of node i such that T_{ij_1}^2 / d_{j_1} overflows to +∞ and T_{ij_2}^2 / d_{j_2} overflows to −∞, then d_i will be NaN, not even well-defined.

5.2 Models 2 and 3: EISPACK's bisect Routine, Tridiagonal Matrix

EISPACK's bisect can overflow for symmetric tridiagonal or acyclic matrices with Model 1 arithmetic, and can return NaNs for symmetric acyclic matrices with IEEE arithmetic, since it makes no provision against overflow (see Remark 5.2). In this subsection, we assume T is a symmetric tridiagonal matrix, whose graph is just a chain, i.e. C = 1. To describe the error analysis for bisect, we need the following assumptions:

Assumption 1A(ii): Model 2. Full IEEE arithmetic with ∞ and NaN arithmetic, and with gradual underflow.

Assumption 1A(iii): Model 3. Full IEEE arithmetic with ∞ and NaN arithmetic, but with underflow flushing to zero.

Assumption 2B(ii): M̄ ≡ max_{i,j} |T_ij| ≤ √Ω.

Algorithm 3: EISPACK bisect. Count(x) returns the number of eigenvalues of a real symmetric tridiagonal matrix T that are less than x.

    Count = 0; d_0 = 1;
    for i = 1 to n
        if (d_{i−1} = 0) then
            v = |b_{i−1}| / ε
        else
            v = b_{i−1}^2 / d_{i−1}
        endif
        d_i = a_i − x − v
        if d_i < 0 then Count = Count + 1
    endfor

Under Models 2 and 3, our error expression (5.7) simplifies to

    d̃_i = a_i + 3η_i − x − (b_{i−1}^2 / d̃_{i−1})(1 + ε_{ij})^5,

where a_i = T_ii and b_{i−1} = T_{i−1,i}. However, bisect's provision against division by zero can drastically increase the backward error bound (5.2). When d_j = 0 for some j in (5.6), it is easy to see that what bisect does is equivalent to perturbing a_j by ε|b_j|. This backward error is clearly small in norm, i.e. at most ε‖T‖_2, and so by Weyl's Theorem it can perturb a computed eigenvalue by no more than ε‖T‖_2. If one is satisfied with absolute accuracy, this is sufficient. However, it can clearly destroy any componentwise relative accuracy, because ε|b_j| may be much larger than |a_j|.

Furthermore, suppose there is some k such that d_k overflows, i.e. |d_k| ≥ Ω. Since M̄ ≤ √Ω, it must be b_{k−1}^2 / d_{k−1} that overflows. So d̃_k is −sign(b_{k−1}^2 / d_{k−1}) · ∞, which has the same sign as the exact pivot corresponding to T′. But this will contribute an extra uncertainty to a_{k+1} of at most M̄^2/Ω, since |b_k^2 / d_k| ≤ M̄^2/Ω.

Therefore we get the following backward error for bisect:

    |T_ij − T′_ij| ≤ f(2.5, ε) |T_ij| if i ≠ j,

and

    |T_ii − T′_ii| ≤ ε‖T‖_2 + M̄^2/Ω + εω   (Model 2),
    |T_ii − T′_ii| ≤ ε‖T‖_2 + M̄^2/Ω + 3ω   (Model 3).

21 Therefore we get the following backward error for bisect: jt ij, T 0 ij jçfè2:5;"èjt ijj if i 6= j: and jt ii, T 0 iij ç"kt k 2 + ç M 2 æ + è "! Model 2 3! Model 3 : 5.3 Models 2 and 3: IEEE routine, Tridiagonal Matrix The following code can work only for symmetric tridiagonal matrices under Models 2 and 3 for the same reason as bisect: otherwise we could get T 2 ij 1 =d j1 +T 2 ij 2 =d j2 = 1,1 = NaN. So in this subsection, we again assume T is a symmetric tridiagonal matrix. By using IEEE arithmetic, we can eliminate all tests in the inner loop, and so make it faster on many architectures ë11ë. To describe the error analysis, we again make Assumptions 1Aèiiè, 1Aèiiiè and 2Bèiiè as in Section 5.2, and Assumption 2Bèiè, which is ç B ç mini6=j T 2 ij ç!. Algorithm 4: IEEE routine. Countèxè returns the number of eigenvalues of a real symmetric tridiagonal matrix T that are less than x. Count =0; d 0 =1; for i =1ton è* note that there is no provision against overæow and division by zero *è d i =èa i, xè, b 2 i,1 =d i,1 Count = Count + SignBit èd i è endfor By Assumption 2Bèiè, b 2 i never underæows. Therefore when some d i underæows, we do not have the headache of dealing with 0=0 which is NaN. Algorithm 4 is quite similar to bisect except there is no provision against division by zero, and the SignBitèæ0è function è= 0 or 1è is used to count eigenvalues. More precisely, if d i =0,d i+1 would be,1, so after two steps, Count will increase by 1. On the other hand, if d i =,0, d i+1 would be 1, therefore Count also increases by 1 after two steps, which is correct. Using an analysis analogous to the last section, if we use Model 2 ègradual underæowè, T 0 diæers from T by jt ij, T 0 ij jçfè2:5;"èjt ijj if i 6= j and jt ii, T 0 ii jç ç M 2 Using Model 3 èæush to zeroè, we have the slightly weaker results that æ + "!: jt ij, T 0 ijj çfè2:5;"èjt ij j if i 6= j and jt ii, T 0 iij ç ç M 2 æ +3!: Since ç M ç p æ, so ç M 2 =æ çm ç 1 p æ ç ": which tells us that ç M ç p æ is a good scaling choice. 21

5.4 Models 1, 2 and 3: LAPACK's sstebz Routine, Acyclic Matrix

In contrast to EISPACK's bisect and the IEEE routine, LAPACK's sstebz can work in principle for general symmetric acyclic matrices under all three models (although its current implementation only works for tridiagonal matrices). So in this subsection, T is a symmetric acyclic matrix. Let B = max_{i≠j} (1, T_ij^2) ≤ Ω, and let p̂ = 2C·B/Ω (p̂ is called pivmin in sstebz). In this subsection we need Assumptions 1A (Model 1, 2 or 3) and 2B(ii). Because of the Gerschgorin Disk Theorem, we can restrict our attention to those shifts x such that |x| ≤ (n + 1)√Ω.

Algorithm 5: Count(x) returns the number of eigenvalues of the symmetric acyclic matrix T that are less than x.

    call TreeCount(1, x, d_1, s_1)
    Count = s_1

    procedure TreeCount(i, x, d_i, s_i)
    (* i and x are inputs, d_i and s_i are outputs *)
        s_i = 0
        sum = 0
        for all children j of i do
            call TreeCount(j, x, d_j, s_j)
            sum = sum + T_ij^2 / d_j
            s_i = s_i + s_j
        endfor
        d_i = (T_ii − x) − sum
        if (|d_i| ≤ p̂) then d_i = −p̂
        if d_i < 0 then s_i = s_i + 1
    end TreeCount

It is clear that |d_i| ≥ p̂ for each node i, so

    |T_ii| + |x| + Σ_{all children j of i} |T_ij^2 / d_j| ≤ (n + 2)√Ω + C·B/p̂ = (n + 2)√Ω + C·B/(2C·B/Ω) = (n + 2)√Ω + Ω/2 ≤ Ω.

This tells us that sstebz never overflows, and that it works under all three models. For all these models, the assignment d_i = −p̂ when |d_i| is small can contribute an extra uncertainty to T_ii of no more than 2p̂. Thus we have the following backward error:

    |T_ij − T′_ij| ≤ f(C/2 + 2, ε) |T_ij| if i ≠ j,

and

    |T_ii − T′_ii| ≤ 2p̂ + (2C + 1)ω̄   (Model 1),
    |T_ii − T′_ii| ≤ 2p̂ + Cεω         (Model 2),
    |T_ii − T′_ii| ≤ 2p̂ + (2C + 1)ω   (Model 3).
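For the tridiagonal case (C = 1), the pivot clamping that protects sstebz is a one-line change to the count loop. Here is a hedged Python sketch of ours, with pivmin passed in as p̂ = 2C·B/Ω:

    def sstebz_count(x, a, b, pivmin):
        """Sketch of Algorithm 5 specialized to a tridiagonal matrix: any
        pivot with |d| <= pivmin is replaced by -pivmin, so the next
        division can neither overflow nor divide by zero."""
        cnt = 0
        d = 1.0
        for i in range(len(a)):
            b2 = b[i - 1] ** 2 if i > 0 else 0.0
            d = (a[i] - x) - b2 / d
            if abs(d) <= pivmin:
                d = -pivmin
            if d < 0:
                cnt += 1
        return cnt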

5.5 Models 1, 2 and 3: Best Prescaling Algorithm, Acyclic Matrix

Following Kahan [16], let τ = ω^{1/4} Ω^{−1/2} and M = τ·Ω = ω^{1/4} Ω^{1/2}. The following code assumes that the initial data has been scaled so that

    M̄ ≤ M / √(2C).

This code can be used to compute the eigenvalues of a general symmetric acyclic matrix, so in this subsection T is a symmetric acyclic matrix. To describe the error analysis, we only need Assumption 1A (Model 1, 2 or 3). Again because of the Gerschgorin Disk Theorem, the shifts are restricted to those x such that |x| ≤ (n + 1)M. The Best Scaling Algorithm is almost the same as sstebz, except that p̂ = √ω, so we do not repeat the code here. As for sstebz, |d_i| ≥ √ω for every node i, so

    |T_ii| + |x| + Σ_{all children j of i} |T_ij^2 / d_j| ≤ M/√(2C) + (n + 1)M + C·(M^2/(2C))/ω^{1/2} ≤ Ω/2 + Ω/2 = Ω,

which tells us that overflow never happens, and the code works under all the models we mentioned. For all the models, the backward error bound becomes

    |T_ij − T′_ij| ≤ f(C/2 + 2, ε) |T_ij| if i ≠ j,

and

    |T_ii − T′_ii| ≤ 2√ω + (2C + 1)ω̄   (Model 1),
    |T_ii − T′_ii| ≤ 2√ω + Cεω         (Model 2),
    |T_ii − T′_ii| ≤ 2√ω + (2C + 1)ω   (Model 3).

5.6 Error Bounds for the Eigenvalues

We need the following lemma to give error bounds for the computed eigenvalues.

Lemma 5.1 Assume T is an acyclic matrix and FloatingCount(x, T) = Count(x, T′), where T′ differs from T only slightly:

    |T_ij − T′_ij| ≤ ρ(ε) |T_ij| if i ≠ j, and |T_ii − T′_ii| ≤ Δ,

where ρ(ε) ≥ 0 is a function of ε and Δ ≥ 0. Then this backward error can change the eigenvalue λ_k by at most ζ_k, where

    ζ_k ≤ 2ρ(ε) ‖T‖_2 + Δ.    (5.8)

Proof. By Weyl's Theorem [20],

    ζ_k ≤ ‖T − T′‖_2 ≤ ‖ |T − T′| ‖_2 ≤ ‖ ρ(ε)|T − D| + ΔI ‖_2 ≤ ρ(ε) ‖ |T − D| ‖_2 + Δ,

and

    ‖ |T − D| ‖_2 = ‖T − D‖_2 ≤ ‖T‖_2 + ‖D‖_2 ≤ 2‖T‖_2,

where D = diag(T_ii) is the diagonal part of T. Therefore, ζ_k ≤ 2ρ(ε) ‖T‖_2 + Δ. □

According to this lemma, we now present tables of the backward errors ρ(ε) and Δ, and of the corresponding error bounds ζ_k for the eigenvalues, for the different algorithms under the different models (Tables 5 through 8).

Table 5: Backward error bounds ρ(ε), Δ for symmetric tridiagonal matrices

    bisect       |                         | f(2.5, ε), ε‖T‖_2 + M̄^2/Ω + εω | f(2.5, ε), ε‖T‖_2 + M̄^2/Ω + 3ω
    sstebz       | f(2.5, ε), 2p̂ + 3ω̄    | f(2.5, ε), 2p̂ + εω             | f(2.5, ε), 2p̂ + 3ω
    best scaling | f(2.5, ε), 2√ω + 3ω̄    | f(2.5, ε), 2√ω + εω             | f(2.5, ε), 2√ω + 3ω
    IEEE         |                         | f(2.5, ε), M̄^2/Ω + εω          | f(2.5, ε), M̄^2/Ω + 3ω

(Columns are Models 1, 2 and 3; the Model 1 entries for bisect and the IEEE routine are empty, since those routines require IEEE arithmetic.)

Table 6: Error bounds ζ_k for the eigenvalues of symmetric tridiagonal matrices

    bisect       |                                  | [2f(2.5, ε) + ε]‖T‖_2 + M̄^2/Ω + εω | [2f(2.5, ε) + ε]‖T‖_2 + M̄^2/Ω + 3ω
    sstebz       | 2f(2.5, ε)‖T‖_2 + 2p̂ + 3ω̄     | 2f(2.5, ε)‖T‖_2 + 2p̂ + εω          | 2f(2.5, ε)‖T‖_2 + 2p̂ + 3ω
    best scaling | 2f(2.5, ε)‖T‖_2 + 2√ω + 3ω̄     | 2f(2.5, ε)‖T‖_2 + 2√ω + εω          | 2f(2.5, ε)‖T‖_2 + 2√ω + 3ω
    IEEE         |                                  | 2f(2.5, ε)‖T‖_2 + M̄^2/Ω + εω       | 2f(2.5, ε)‖T‖_2 + M̄^2/Ω + 3ω

5.7 Correctness of the Gerschgorin Bound

In this subsection we prove the correctness of the Gerschgorin bound computed by the routine Compute_Gerschgorin (see Figure 3). Since

    gl = min_i (T_ii − Σ_{j≠i} |T_ij|)  and  gu = max_i (T_ii + Σ_{j≠i} |T_ij|),

we have bnorm = max(|gl|, |gu|) = ‖T‖_∞. Notice that

    fl(T_ii − Σ_{j≠i} |T_ij|) = T_ii (1 + δ_i)^{k_i} − Σ_{j≠i} |T_ij| (1 + δ_j)^{k_j},

with all k_i, k_j ≤ n. Therefore,

    |fl(gl) − gl| ≤ f(n, ε) ‖T‖_∞ ≤ 2nε ‖T‖_∞ = 2nε · bnorm.

Table 7: Backward error bounds ρ(ε), Δ for symmetric acyclic matrices

    sstebz       | f(C/2+2, ε), 2p̂ + (2C+1)ω̄    | f(C/2+2, ε), 2p̂ + Cεω   | f(C/2+2, ε), 2p̂ + (2C+1)ω
    best scaling | f(C/2+2, ε), 2√ω + (2C+1)ω̄    | f(C/2+2, ε), 2√ω + Cεω   | f(C/2+2, ε), 2√ω + (2C+1)ω

(Columns are Models 1, 2 and 3; bisect and the IEEE routine apply only to tridiagonal matrices, so their rows are empty.)

Table 8: Error bounds ζ_k for the eigenvalues of symmetric acyclic matrices

    sstebz       | 2f(C/2+2, ε)‖T‖_2 + 2p̂ + (2C+1)ω̄ | 2f(C/2+2, ε)‖T‖_2 + 2p̂ + Cεω | 2f(C/2+2, ε)‖T‖_2 + 2p̂ + (2C+1)ω
    best scaling | 2f(C/2+2, ε)‖T‖_2 + 2√ω + (2C+1)ω̄ | 2f(C/2+2, ε)‖T‖_2 + 2√ω + Cεω | 2f(C/2+2, ε)‖T‖_2 + 2√ω + (2C+1)ω

Table 9: Upper bounds on ζ_k for the different algorithms

    bisect       | tridiagonal | Assumption 2C(i)          | 11ε · bnorm
    sstebz       | acyclic     | Assumptions 2C(i), 2C(ii) | (8n + 6)ε · bnorm
    best scaling | acyclic     |                           | (4n + 8)ε · bnorm
    IEEE         | tridiagonal | Assumption 2C(i)          | 10ε · bnorm

Similarly, |fl(gu) − gu| ≤ 2nε · bnorm. This proves that if we let

    gl = gl − 2nε · bnorm − ζ_0  and  gu = gu + 2nε · bnorm + ζ_n,

then we can claim

    FloatingCount(gl) = 0  and  FloatingCount(gu) = n.

For the algorithms we mentioned in the previous subsections, we can obtain upper bounds on ζ_k under certain additional appropriate assumptions, which enable us to give more specific and explicit Gerschgorin bounds computed by the routine Compute_Gerschgorin (see Table 9). For instance, the error bound ζ_k of bisect for symmetric tridiagonal matrices is at most [2f(2.5, ε) + ε]‖T‖_2 + M̄^2/Ω + 3ω; with Assumption 2C(i), M̄ ≥ ω/ε, we have

    [2f(2.5, ε) + ε]‖T‖_2 + M̄^2/Ω + 3ω ≤ (2 · 2.5 · 1.06 ε + ε)‖T‖_2 + ε M̄ + 3ε M̄
                                        ≤ 7ε · bnorm + ε · bnorm + 3ε · bnorm = 11ε · bnorm.

According to Table 9, if we let

    gl = gl − (10n + 6)ε · bnorm  and  gu = gu + (10n + 6)ε · bnorm,    (5.9)

then we have FloatingCount(gl) = 0 and FloatingCount(gu) = n in all situations, which shows that the Gerschgorin bound (5.9) is correct for EISPACK's bisect, LAPACK's sstebz, the IEEE routine and the Best Prescaling Algorithm.

5.8 The Splitting Criterion

The splitting criterion asks whether an off-diagonal entry b_i is small enough in magnitude to be set to zero without making unacceptable perturbations in the eigenvalues. This is useful because setting b_i to zero splits T into two independent subproblems which can be solved faster.


More information

Parallel Scientific Computing

Parallel Scientific Computing IV-1 Parallel Scientific Computing Matrix-vector multiplication. Matrix-matrix multiplication. Direct method for solving a linear equation. Gaussian Elimination. Iterative method for solving a linear equation.

More information

the Unitary Polar Factor æ Ren-Cang Li P.O. Box 2008, Bldg 6012

the Unitary Polar Factor æ Ren-Cang Li P.O. Box 2008, Bldg 6012 Relative Perturbation Bounds for the Unitary Polar actor Ren-Cang Li Mathematical Science Section Oak Ridge National Laboratory P.O. Box 2008, Bldg 602 Oak Ridge, TN 3783-6367 èli@msr.epm.ornl.govè LAPACK

More information

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each

More information

Divide-and-Conquer Algorithms Part Two

Divide-and-Conquer Algorithms Part Two Divide-and-Conquer Algorithms Part Two Recap from Last Time Divide-and-Conquer Algorithms A divide-and-conquer algorithm is one that works as follows: (Divide) Split the input apart into multiple smaller

More information

5 + 9(10) + 3(100) + 0(1000) + 2(10000) =

5 + 9(10) + 3(100) + 0(1000) + 2(10000) = Chapter 5 Analyzing Algorithms So far we have been proving statements about databases, mathematics and arithmetic, or sequences of numbers. Though these types of statements are common in computer science,

More information

Chapter 6. Nonlinear Equations. 6.1 The Problem of Nonlinear Root-finding. 6.2 Rate of Convergence

Chapter 6. Nonlinear Equations. 6.1 The Problem of Nonlinear Root-finding. 6.2 Rate of Convergence Chapter 6 Nonlinear Equations 6. The Problem of Nonlinear Root-finding In this module we consider the problem of using numerical techniques to find the roots of nonlinear equations, f () =. Initially we

More information

Notes for CS542G (Iterative Solvers for Linear Systems)

Notes for CS542G (Iterative Solvers for Linear Systems) Notes for CS542G (Iterative Solvers for Linear Systems) Robert Bridson November 20, 2007 1 The Basics We re now looking at efficient ways to solve the linear system of equations Ax = b where in this course,

More information

What Every Programmer Should Know About Floating-Point Arithmetic DRAFT. Last updated: November 3, Abstract

What Every Programmer Should Know About Floating-Point Arithmetic DRAFT. Last updated: November 3, Abstract What Every Programmer Should Know About Floating-Point Arithmetic Last updated: November 3, 2014 Abstract The article provides simple answers to the common recurring questions of novice programmers about

More information

The geometric mean algorithm

The geometric mean algorithm The geometric mean algorithm Rui Ralha Centro de Matemática Universidade do Minho 4710-057 Braga, Portugal email: r ralha@math.uminho.pt Abstract Bisection (of a real interval) is a well known algorithm

More information

The Inductive Proof Template

The Inductive Proof Template CS103 Handout 24 Winter 2016 February 5, 2016 Guide to Inductive Proofs Induction gives a new way to prove results about natural numbers and discrete structures like games, puzzles, and graphs. All of

More information

FLOATING POINT ARITHMETHIC - ERROR ANALYSIS

FLOATING POINT ARITHMETHIC - ERROR ANALYSIS FLOATING POINT ARITHMETHIC - ERROR ANALYSIS Brief review of floating point arithmetic Model of floating point arithmetic Notation, backward and forward errors 3-1 Roundoff errors and floating-point arithmetic

More information

Matrix Assembly in FEA

Matrix Assembly in FEA Matrix Assembly in FEA 1 In Chapter 2, we spoke about how the global matrix equations are assembled in the finite element method. We now want to revisit that discussion and add some details. For example,

More information

QUIVERS AND LATTICES.

QUIVERS AND LATTICES. QUIVERS AND LATTICES. KEVIN MCGERTY We will discuss two classification results in quite different areas which turn out to have the same answer. This note is an slightly expanded version of the talk given

More information

An Explanation of the MRRR algorithm to compute eigenvectors of symmetric tridiagonal matrices

An Explanation of the MRRR algorithm to compute eigenvectors of symmetric tridiagonal matrices An Explanation of the MRRR algorithm to compute eigenvectors of symmetric tridiagonal matrices Beresford N. Parlett Departments of Mathematics and Computer Science Division, EECS dept. University of California,

More information

Math 411 Preliminaries

Math 411 Preliminaries Math 411 Preliminaries Provide a list of preliminary vocabulary and concepts Preliminary Basic Netwon s method, Taylor series expansion (for single and multiple variables), Eigenvalue, Eigenvector, Vector

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Gaussian Elimination for Linear Systems

Gaussian Elimination for Linear Systems Gaussian Elimination for Linear Systems Tsung-Ming Huang Department of Mathematics National Taiwan Normal University October 3, 2011 1/56 Outline 1 Elementary matrices 2 LR-factorization 3 Gaussian elimination

More information

problem of detection naturally arises in technical diagnostics, where one is interested in detecting cracks, corrosion, or any other defect in a sampl

problem of detection naturally arises in technical diagnostics, where one is interested in detecting cracks, corrosion, or any other defect in a sampl In: Structural and Multidisciplinary Optimization, N. Olhoæ and G. I. N. Rozvany eds, Pergamon, 1995, 543í548. BOUNDS FOR DETECTABILITY OF MATERIAL'S DAMAGE BY NOISY ELECTRICAL MEASUREMENTS Elena CHERKAEVA

More information

Notes for Lecture Notes 2

Notes for Lecture Notes 2 Stanford University CS254: Computational Complexity Notes 2 Luca Trevisan January 11, 2012 Notes for Lecture Notes 2 In this lecture we define NP, we state the P versus NP problem, we prove that its formulation

More information

The Phase Transition 55. have a peculiar gravitation in which the probability of merging is

The Phase Transition 55. have a peculiar gravitation in which the probability of merging is The Phase Transition 55 from to + d there is a probability c 1 c 2 d that they will merge. Components have a peculiar gravitation in which the probability of merging is proportional to their sizes. With

More information

Numerical Analysis Lecture Notes

Numerical Analysis Lecture Notes Numerical Analysis Lecture Notes Peter J Olver 8 Numerical Computation of Eigenvalues In this part, we discuss some practical methods for computing eigenvalues and eigenvectors of matrices Needless to

More information

QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS

QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS DIANNE P. O LEARY AND STEPHEN S. BULLOCK Dedicated to Alan George on the occasion of his 60th birthday Abstract. Any matrix A of dimension m n (m n)

More information

easy to make g by æ èp,1è=q where æ generates Zp. æ We can use a secure prime modulus p such that èp, 1è=2q is also prime or each prime factor of èp,

easy to make g by æ èp,1è=q where æ generates Zp. æ We can use a secure prime modulus p such that èp, 1è=2q is also prime or each prime factor of èp, Additional Notes to ëultimate Solution to Authentication via Memorable Password" May 1, 2000 version Taekyoung Kwon æ May 25, 2000 Abstract This short letter adds informative discussions to our previous

More information

Lecture 12 : Graph Laplacians and Cheeger s Inequality

Lecture 12 : Graph Laplacians and Cheeger s Inequality CPS290: Algorithmic Foundations of Data Science March 7, 2017 Lecture 12 : Graph Laplacians and Cheeger s Inequality Lecturer: Kamesh Munagala Scribe: Kamesh Munagala Graph Laplacian Maybe the most beautiful

More information

Proof Techniques (Review of Math 271)

Proof Techniques (Review of Math 271) Chapter 2 Proof Techniques (Review of Math 271) 2.1 Overview This chapter reviews proof techniques that were probably introduced in Math 271 and that may also have been used in a different way in Phil

More information

1. Introduction to commutative rings and fields

1. Introduction to commutative rings and fields 1. Introduction to commutative rings and fields Very informally speaking, a commutative ring is a set in which we can add, subtract and multiply elements so that the usual laws hold. A field is a commutative

More information

b + O(n d ) where a 1, b > 1, then O(n d log n) if a = b d d ) if a < b d O(n log b a ) if a > b d

b + O(n d ) where a 1, b > 1, then O(n d log n) if a = b d d ) if a < b d O(n log b a ) if a > b d CS161, Lecture 4 Median, Selection, and the Substitution Method Scribe: Albert Chen and Juliana Cook (2015), Sam Kim (2016), Gregory Valiant (2017) Date: January 23, 2017 1 Introduction Last lecture, we

More information

Dense LU factorization and its error analysis

Dense LU factorization and its error analysis Dense LU factorization and its error analysis Laura Grigori INRIA and LJLL, UPMC February 2016 Plan Basis of floating point arithmetic and stability analysis Notation, results, proofs taken from [N.J.Higham,

More information

1 Finding and fixing floating point problems

1 Finding and fixing floating point problems Notes for 2016-09-09 1 Finding and fixing floating point problems Floating point arithmetic is not the same as real arithmetic. Even simple properties like associativity or distributivity of addition and

More information

k-protected VERTICES IN BINARY SEARCH TREES

k-protected VERTICES IN BINARY SEARCH TREES k-protected VERTICES IN BINARY SEARCH TREES MIKLÓS BÓNA Abstract. We show that for every k, the probability that a randomly selected vertex of a random binary search tree on n nodes is at distance k from

More information

James Demmel 1. Courant Institute. New York, NY Kreçsimir Veseliçc. Lehrgebiet Mathematische Physik. Fernuniversitíat Hagen

James Demmel 1. Courant Institute. New York, NY Kreçsimir Veseliçc. Lehrgebiet Mathematische Physik. Fernuniversitíat Hagen Jacobi's method is more accurate than QR James Demmel 1 Courant Institute New York, NY 10012 Kresimir Veselic Lehrgebiet Mathematische Physik Fernuniversitíat Hagen D-5800 Hagen, West Germany Abstract

More information

Communication avoiding parallel algorithms for dense matrix factorizations

Communication avoiding parallel algorithms for dense matrix factorizations Communication avoiding parallel dense matrix factorizations 1/ 44 Communication avoiding parallel algorithms for dense matrix factorizations Edgar Solomonik Department of EECS, UC Berkeley October 2013

More information

A Polynomial-Time Algorithm for Pliable Index Coding

A Polynomial-Time Algorithm for Pliable Index Coding 1 A Polynomial-Time Algorithm for Pliable Index Coding Linqi Song and Christina Fragouli arxiv:1610.06845v [cs.it] 9 Aug 017 Abstract In pliable index coding, we consider a server with m messages and n

More information

1 Number Systems and Errors 1

1 Number Systems and Errors 1 Contents 1 Number Systems and Errors 1 1.1 Introduction................................ 1 1.2 Number Representation and Base of Numbers............. 1 1.2.1 Normalized Floating-point Representation...........

More information

1 Computational Problems

1 Computational Problems Stanford University CS254: Computational Complexity Handout 2 Luca Trevisan March 31, 2010 Last revised 4/29/2010 In this lecture we define NP, we state the P versus NP problem, we prove that its formulation

More information

Lecture 10: Finite Differences for ODEs & Nonlinear Equations

Lecture 10: Finite Differences for ODEs & Nonlinear Equations Lecture 10: Finite Differences for ODEs & Nonlinear Equations J.K. Ryan@tudelft.nl WI3097TU Delft Institute of Applied Mathematics Delft University of Technology 21 November 2012 () Finite Differences

More information

On Two Class-Constrained Versions of the Multiple Knapsack Problem

On Two Class-Constrained Versions of the Multiple Knapsack Problem On Two Class-Constrained Versions of the Multiple Knapsack Problem Hadas Shachnai Tami Tamir Department of Computer Science The Technion, Haifa 32000, Israel Abstract We study two variants of the classic

More information

Linear Algebra. The analysis of many models in the social sciences reduces to the study of systems of equations.

Linear Algebra. The analysis of many models in the social sciences reduces to the study of systems of equations. POLI 7 - Mathematical and Statistical Foundations Prof S Saiegh Fall Lecture Notes - Class 4 October 4, Linear Algebra The analysis of many models in the social sciences reduces to the study of systems

More information

DIMACS Technical Report March Game Seki 1

DIMACS Technical Report March Game Seki 1 DIMACS Technical Report 2007-05 March 2007 Game Seki 1 by Diogo V. Andrade RUTCOR, Rutgers University 640 Bartholomew Road Piscataway, NJ 08854-8003 dandrade@rutcor.rutgers.edu Vladimir A. Gurvich RUTCOR,

More information

Lecture 4: Linear Algebra 1

Lecture 4: Linear Algebra 1 Lecture 4: Linear Algebra 1 Sourendu Gupta TIFR Graduate School Computational Physics 1 February 12, 2010 c : Sourendu Gupta (TIFR) Lecture 4: Linear Algebra 1 CP 1 1 / 26 Outline 1 Linear problems Motivation

More information

FLOATING POINT ARITHMETHIC - ERROR ANALYSIS

FLOATING POINT ARITHMETHIC - ERROR ANALYSIS FLOATING POINT ARITHMETHIC - ERROR ANALYSIS Brief review of floating point arithmetic Model of floating point arithmetic Notation, backward and forward errors Roundoff errors and floating-point arithmetic

More information

Linear Algebra Review

Linear Algebra Review Chapter 1 Linear Algebra Review It is assumed that you have had a course in linear algebra, and are familiar with matrix multiplication, eigenvectors, etc. I will review some of these terms here, but quite

More information

Algebraic Methods in Combinatorics

Algebraic Methods in Combinatorics Algebraic Methods in Combinatorics Po-Shen Loh 27 June 2008 1 Warm-up 1. (A result of Bourbaki on finite geometries, from Răzvan) Let X be a finite set, and let F be a family of distinct proper subsets

More information

Continuity of Bçezier patches. Jana Pçlnikovça, Jaroslav Plaçcek, Juraj ç Sofranko. Faculty of Mathematics and Physics. Comenius University

Continuity of Bçezier patches. Jana Pçlnikovça, Jaroslav Plaçcek, Juraj ç Sofranko. Faculty of Mathematics and Physics. Comenius University Continuity of Bezier patches Jana Plnikova, Jaroslav Placek, Juraj Sofranko Faculty of Mathematics and Physics Comenius University Bratislava Abstract The paper is concerned about the question of smooth

More information

DR.RUPNATHJI( DR.RUPAK NATH )

DR.RUPNATHJI( DR.RUPAK NATH ) Contents 1 Sets 1 2 The Real Numbers 9 3 Sequences 29 4 Series 59 5 Functions 81 6 Power Series 105 7 The elementary functions 111 Chapter 1 Sets It is very convenient to introduce some notation and terminology

More information

Chapter 12 Block LU Factorization

Chapter 12 Block LU Factorization Chapter 12 Block LU Factorization Block algorithms are advantageous for at least two important reasons. First, they work with blocks of data having b 2 elements, performing O(b 3 ) operations. The O(b)

More information

1. Introduction to commutative rings and fields

1. Introduction to commutative rings and fields 1. Introduction to commutative rings and fields Very informally speaking, a commutative ring is a set in which we can add, subtract and multiply elements so that the usual laws hold. A field is a commutative

More information

Chapter 11. Min Cut Min Cut Problem Definition Some Definitions. By Sariel Har-Peled, December 10, Version: 1.

Chapter 11. Min Cut Min Cut Problem Definition Some Definitions. By Sariel Har-Peled, December 10, Version: 1. Chapter 11 Min Cut By Sariel Har-Peled, December 10, 013 1 Version: 1.0 I built on the sand And it tumbled down, I built on a rock And it tumbled down. Now when I build, I shall begin With the smoke from

More information

Computational Complexity. This lecture. Notes. Lecture 02 - Basic Complexity Analysis. Tom Kelsey & Susmit Sarkar. Notes

Computational Complexity. This lecture. Notes. Lecture 02 - Basic Complexity Analysis. Tom Kelsey & Susmit Sarkar. Notes Computational Complexity Lecture 02 - Basic Complexity Analysis Tom Kelsey & Susmit Sarkar School of Computer Science University of St Andrews http://www.cs.st-andrews.ac.uk/~tom/ twk@st-andrews.ac.uk

More information

A strongly polynomial algorithm for linear systems having a binary solution

A strongly polynomial algorithm for linear systems having a binary solution A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th

More information

6. Iterative Methods for Linear Systems. The stepwise approach to the solution...

6. Iterative Methods for Linear Systems. The stepwise approach to the solution... 6 Iterative Methods for Linear Systems The stepwise approach to the solution Miriam Mehl: 6 Iterative Methods for Linear Systems The stepwise approach to the solution, January 18, 2013 1 61 Large Sparse

More information

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) Contents 1 Vector Spaces 1 1.1 The Formal Denition of a Vector Space.................................. 1 1.2 Subspaces...................................................

More information

Introduction to Algorithms

Introduction to Algorithms Lecture 1 Introduction to Algorithms 1.1 Overview The purpose of this lecture is to give a brief overview of the topic of Algorithms and the kind of thinking it involves: why we focus on the subjects that

More information

Divisible Load Scheduling

Divisible Load Scheduling Divisible Load Scheduling Henri Casanova 1,2 1 Associate Professor Department of Information and Computer Science University of Hawai i at Manoa, U.S.A. 2 Visiting Associate Professor National Institute

More information

CMPUT 675: Approximation Algorithms Fall 2014

CMPUT 675: Approximation Algorithms Fall 2014 CMPUT 675: Approximation Algorithms Fall 204 Lecture 25 (Nov 3 & 5): Group Steiner Tree Lecturer: Zachary Friggstad Scribe: Zachary Friggstad 25. Group Steiner Tree In this problem, we are given a graph

More information

The total current injected in the system must be zero, therefore the sum of vector entries B j j is zero, è1;:::1èb j j =: Excluding j by means of è1è

The total current injected in the system must be zero, therefore the sum of vector entries B j j is zero, è1;:::1èb j j =: Excluding j by means of è1è 1 Optimization of networks 1.1 Description of electrical networks Consider an electrical network of N nodes n 1 ;:::n N and M links that join them together. Each linkl pq = l k joints the notes numbered

More information

Block-tridiagonal matrices

Block-tridiagonal matrices Block-tridiagonal matrices. p.1/31 Block-tridiagonal matrices - where do these arise? - as a result of a particular mesh-point ordering - as a part of a factorization procedure, for example when we compute

More information

Citation Osaka Journal of Mathematics. 43(2)

Citation Osaka Journal of Mathematics. 43(2) TitleIrreducible representations of the Author(s) Kosuda, Masashi Citation Osaka Journal of Mathematics. 43(2) Issue 2006-06 Date Text Version publisher URL http://hdl.handle.net/094/0396 DOI Rights Osaka

More information

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for 1 Algorithms Notes for 2016-10-31 There are several flavors of symmetric eigenvalue solvers for which there is no equivalent (stable) nonsymmetric solver. We discuss four algorithmic ideas: the workhorse

More information

pairs. Such a system is a reinforcement learning system. In this paper we consider the case where we have a distribution of rewarded pairs of input an

pairs. Such a system is a reinforcement learning system. In this paper we consider the case where we have a distribution of rewarded pairs of input an Learning Canonical Correlations Hans Knutsson Magnus Borga Tomas Landelius knutte@isy.liu.se magnus@isy.liu.se tc@isy.liu.se Computer Vision Laboratory Department of Electrical Engineering Linkíoping University,

More information

Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE

Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 55, NO. 9, SEPTEMBER 2010 1987 Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE Abstract

More information

Practicality of Large Scale Fast Matrix Multiplication

Practicality of Large Scale Fast Matrix Multiplication Practicality of Large Scale Fast Matrix Multiplication Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz and Oded Schwartz UC Berkeley IWASEP June 5, 2012 Napa Valley, CA Research supported by

More information

Block Lanczos Tridiagonalization of Complex Symmetric Matrices

Block Lanczos Tridiagonalization of Complex Symmetric Matrices Block Lanczos Tridiagonalization of Complex Symmetric Matrices Sanzheng Qiao, Guohong Liu, Wei Xu Department of Computing and Software, McMaster University, Hamilton, Ontario L8S 4L7 ABSTRACT The classic

More information

On the Effectiveness of Symmetry Breaking

On the Effectiveness of Symmetry Breaking On the Effectiveness of Symmetry Breaking Russell Miller 1, Reed Solomon 2, and Rebecca M Steiner 3 1 Queens College and the Graduate Center of the City University of New York Flushing NY 11367 2 University

More information

Lecture 4 : Introduction to Low-density Parity-check Codes

Lecture 4 : Introduction to Low-density Parity-check Codes Lecture 4 : Introduction to Low-density Parity-check Codes LDPC codes are a class of linear block codes with implementable decoders, which provide near-capacity performance. History: 1. LDPC codes were

More information

Rectangular Systems and Echelon Forms

Rectangular Systems and Echelon Forms CHAPTER 2 Rectangular Systems and Echelon Forms 2.1 ROW ECHELON FORM AND RANK We are now ready to analyze more general linear systems consisting of m linear equations involving n unknowns a 11 x 1 + a

More information

Spectral Graph Theory Lecture 2. The Laplacian. Daniel A. Spielman September 4, x T M x. ψ i = arg min

Spectral Graph Theory Lecture 2. The Laplacian. Daniel A. Spielman September 4, x T M x. ψ i = arg min Spectral Graph Theory Lecture 2 The Laplacian Daniel A. Spielman September 4, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened in class. The notes written before

More information

ECS 231 Computer Arithmetic 1 / 27

ECS 231 Computer Arithmetic 1 / 27 ECS 231 Computer Arithmetic 1 / 27 Outline 1 Floating-point numbers and representations 2 Floating-point arithmetic 3 Floating-point error analysis 4 Further reading 2 / 27 Outline 1 Floating-point numbers

More information

Learning Decision Trees

Learning Decision Trees Learning Decision Trees CS194-10 Fall 2011 Lecture 8 CS194-10 Fall 2011 Lecture 8 1 Outline Decision tree models Tree construction Tree pruning Continuous input features CS194-10 Fall 2011 Lecture 8 2

More information

A NOTE ON STRATEGY ELIMINATION IN BIMATRIX GAMES

A NOTE ON STRATEGY ELIMINATION IN BIMATRIX GAMES A NOTE ON STRATEGY ELIMINATION IN BIMATRIX GAMES Donald E. KNUTH Department of Computer Science. Standford University. Stanford2 CA 94305. USA Christos H. PAPADIMITRIOU Department of Computer Scrence and

More information

Vector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition)

Vector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition) Vector Space Basics (Remark: these notes are highly formal and may be a useful reference to some students however I am also posting Ray Heitmann's notes to Canvas for students interested in a direct computational

More information

The min cost flow problem Course notes for Optimization Spring 2007

The min cost flow problem Course notes for Optimization Spring 2007 The min cost flow problem Course notes for Optimization Spring 2007 Peter Bro Miltersen February 7, 2007 Version 3.0 1 Definition of the min cost flow problem We shall consider a generalization of the

More information

Numerical Analysis and Computing

Numerical Analysis and Computing Numerical Analysis and Computing Lecture Notes #02 Calculus Review; Computer Artihmetic and Finite Precision; and Convergence; Joe Mahaffy, mahaffy@math.sdsu.edu Department of Mathematics Dynamical Systems

More information

A PRIMER ON SESQUILINEAR FORMS

A PRIMER ON SESQUILINEAR FORMS A PRIMER ON SESQUILINEAR FORMS BRIAN OSSERMAN This is an alternative presentation of most of the material from 8., 8.2, 8.3, 8.4, 8.5 and 8.8 of Artin s book. Any terminology (such as sesquilinear form

More information

SMT 2013 Power Round Solutions February 2, 2013

SMT 2013 Power Round Solutions February 2, 2013 Introduction This Power Round is an exploration of numerical semigroups, mathematical structures which appear very naturally out of answers to simple questions. For example, suppose McDonald s sells Chicken

More information

Linear Algebra. Min Yan

Linear Algebra. Min Yan Linear Algebra Min Yan January 2, 2018 2 Contents 1 Vector Space 7 1.1 Definition................................. 7 1.1.1 Axioms of Vector Space..................... 7 1.1.2 Consequence of Axiom......................

More information

Lecture 1. Introduction. The importance, ubiquity, and complexity of embedded systems are growing

Lecture 1. Introduction. The importance, ubiquity, and complexity of embedded systems are growing Lecture 1 Introduction Karl Henrik Johansson The importance, ubiquity, and complexity of embedded systems are growing tremendously thanks to the revolution in digital technology. This has created a need

More information

6.842 Randomness and Computation March 3, Lecture 8

6.842 Randomness and Computation March 3, Lecture 8 6.84 Randomness and Computation March 3, 04 Lecture 8 Lecturer: Ronitt Rubinfeld Scribe: Daniel Grier Useful Linear Algebra Let v = (v, v,..., v n ) be a non-zero n-dimensional row vector and P an n n

More information

New Minimal Weight Representations for Left-to-Right Window Methods

New Minimal Weight Representations for Left-to-Right Window Methods New Minimal Weight Representations for Left-to-Right Window Methods James A. Muir 1 and Douglas R. Stinson 2 1 Department of Combinatorics and Optimization 2 School of Computer Science University of Waterloo

More information