Symmetric rank-2k update on GPUs and/or multi-cores
Assignment 2, 5DV050, Spring 2012
Due on May 4 (soft) or May 11 (hard)

1 Background and motivation

Quoting from Beresford N. Parlett's classic book The Symmetric Eigenvalue Problem:

    Vibrations are everywhere [...] and so too are the eigenvalues (or frequencies) associated with them. The concert-goer unconsciously analyzes the quivering of her eardrum, the spectroscopist identifies the constituents of a gas by looking at eigenvalues, and in California the State Building Department requires that the natural frequencies of many buildings should lie outside the earthquake band. Indeed, as mathematical models invade more and more disciplines we can anticipate a demand for eigenvalue calculations in an ever richer variety of contexts.

    -- Beresford N. Parlett

Eigenvalues are found everywhere, and they can reveal important properties such as the resonance frequencies of a physical system. They can be used to find commonalities in a set of data, e.g., images or documents, that enable effective lossy data compression or feature extraction. The applications of eigenvalues are numerous and span most disciplines.

Regardless of the interpretation of the eigenvalues in a specific application, many eigenvalue problems eventually boil down to the fundamental problem of determining the eigenvalues and the associated eigenvectors of a square (real or complex) matrix A. Sometimes all the eigenvalues and eigenvectors are needed, but many times only a subset of them is of interest.

Actually computing eigenvalues λ and eigenvectors x, hereafter referred to as eigenpairs {λ, x}, is, however, not a simple task. Using finite precision arithmetic, round-off errors might propagate and accumulate. If the problem itself is very sensitive to perturbations (the technical term is ill-conditioned) and/or if the numerical algorithm is not stable in the sense that small perturbations due to, e.g., round-off errors accumulate too much, then the computed eigenpairs might have little to do with the actual eigenpairs.

Some of the primary goals of the research into algorithms for eigenvalue computations are the following:

1. Construct stable and efficient algorithms for matrices having various combinations of properties, e.g., real, complex, symmetric, etc.

2. Analyze the strengths and limitations of the algorithms in the presence of round-off errors.
3. Implement the algorithms on various parallel architectures and maximize the performance without sacrificing the numerical stability of the algorithms.

In the present assignment, we will look into the third goal and get a glimpse of the computational and parallel aspects of eigenvalue computations. We hope you find it interesting!

Sections 1 to 4 give an introduction to symmetric eigenvalue problems and finally define the operation that you will implement: the symmetric rank-2k update. Sections 5 and 6 explain the operation in more detail and present a reference implementation as well as the flop count for the operation. In Section 7, a recursive blocked algorithm is presented, and finally the assignment is explained in more detail in Section 8.

In principle, you can skip everything up to (but not including) Section 5 and still complete the assignment without problems. However, the first sections provide important context that puts the operation in some perspective and partially explains its relevance in the greater scheme of things. We assume some prior experience with linear algebra, in particular the definitions of matrix multiplication and matrix transpose. This assignment does not, however, require advanced knowledge of linear algebra, and everything you need can be learned rather quickly. Please contact us if you have trouble understanding the assignment specification and we will help you as best we can.

1.1 Real symmetric matrices

An important class of matrices are the real and symmetric ones. A symmetric matrix A of size n-by-n has the property that the entry in the i-th row and j-th column, i.e., [A]_{ij} or A(i,j) or a_{ij} depending on how you want to denote it, is equal to the entry in the j-th row and i-th column, i.e., [A]_{ji}. In geometric terms, the matrix is symmetric with respect to a reflection along the northwest-southeast diagonal, the so-called main diagonal of the matrix. Formally, a matrix A is symmetric if and only if

    [A]_{ij} = [A]_{ji}    (1)

for all i, j = 1, 2, ..., n. In matrix notation, a matrix is symmetric if and only if

    A^T = A,    (2)

where A^T is the transpose of A defined by its entries

    [A^T]_{ij} := [A]_{ji}.    (3)

From a computational point of view, one can often exploit symmetry to save half of the storage and half of the arithmetic operations, since only the upper or lower triangular part of the matrix needs to be represented explicitly. In this assignment, we will look at an algorithm that saves half of the operations, but we will ignore the potential for saving memory.

1.2 The real symmetric eigenvalue problem

The real symmetric eigenvalue problem consists of finding all eigenvalues λ and the corresponding eigenvectors x of a real symmetric matrix A of size n-by-n. Each of the n eigenpairs {λ, x} of A satisfies a linear equation of the form

    Ax = λx.    (4)
A trivial solution to (4) is x = 0, but this is hardly interesting. Therefore, we require that the eigenvector x be non-zero.

The geometric interpretation of this is as follows. The vector x is a vector in n-space represented in some basis. The matrix A represents a linear transformation (e.g., rotation, reflection, scaling, etc.) on this space, and the matrix-vector product Ax applies A to x and produces the image of x under the linear transformation encoded by A. What (4) says is that this image is equal to a simple scaling of x itself, and so x and its image are parallel to each other.

For example, one of the eigenpairs of the 3-by-3 real symmetric matrix

    A := [ 1 0 1 ]
         [ 0 1 0 ]
         [ 1 0 1 ]

is {λ, x} where λ := 2 and x := (1, 0, 1)^T. While it is difficult to compute the eigenpairs in general, it is straightforward to verify that a given λ and x form an eigenpair. For the example above we get

    Ax = (2, 0, 2)^T = 2 (1, 0, 1)^T = λx,

which verifies that {λ, x} is indeed an eigenpair.
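The verification above is also easy to do in code. The following is a minimal standalone C check of Ax = λx for the example eigenpair; it computes the residual max_i |(Ax)_i - λx_i|, which should be exactly zero here. (Computing eigenpairs in general requires the algorithms discussed below; this only verifies a given candidate pair.)

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* The symmetric example matrix and its claimed eigenpair {2, (1,0,1)^T}. */
    const double A[3][3] = {{1, 0, 1},
                            {0, 1, 0},
                            {1, 0, 1}};
    const double x[3] = {1, 0, 1};
    const double lambda = 2.0;

    double err = 0.0;
    for (int i = 0; i < 3; i++) {
        double Ax_i = 0.0;                    /* (Ax)_i */
        for (int j = 0; j < 3; j++)
            Ax_i += A[i][j] * x[j];
        err = fmax(err, fabs(Ax_i - lambda * x[i]));
    }
    printf("max |Ax - lambda*x| = %g\n", err); /* prints 0 */
    return 0;
}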
2 Algorithm for solving the real symmetric eigenvalue problem

When A is a large and dense matrix (n in the hundreds or thousands and most entries of A non-zero), the stable computation of all eigenpairs of A is typically broken up into three phases. First, the matrix A is reduced to a condensed form with far fewer non-zero entries, in such a way that the eigenvalues of the condensed matrix are the same as those of A and the eigenvectors of the condensed matrix are related to those of A in a simple way. Next, the eigenpairs of the condensed matrix are computed, and, finally, its eigenvectors are transformed into the eigenvectors of A. Specifically, the computation of all eigenpairs of a real symmetric matrix A consists of three steps:

1. Reduce A to an equivalent (the technical term is similar) tri-diagonal matrix T.

2. Compute the eigenpairs of T.

3. Transform the eigenvectors of T into the eigenvectors of A.

Numerically, the second step is the most challenging and relies on iterative algorithms that may or may not converge. Many advanced techniques are used to improve the accuracy and speed of these algorithms. Computationally, however, the second step is almost negligible in terms of computation time, and the third step is expensive but more or less straightforward to implement with high efficiency. Therefore, the most challenging step from a computational point of view is the initial reduction to tri-diagonal form.

3 Tri-diagonal reduction

Since the tri-diagonal reduction is both a challenging and an expensive part of the symmetric eigenvalue solver, we will only consider this part of the algorithm in this assignment. However, the best tri-diagonal reduction algorithm used today is a bit too complicated to implement in this course.

A tri-diagonal matrix has non-zero entries only on the main diagonal and directly below it (the first sub-diagonal) and above it (the first super-diagonal), as illustrated below:

    [ × × 0 0 0 ]
    [ × × × 0 0 ]
    [ 0 × × × 0 ]
    [ 0 0 × × × ]
    [ 0 0 0 × × ]

It follows that a tri-diagonal matrix of size n-by-n has at most 3n - 2 non-zero entries, and if the matrix is also symmetric, then the number of (non-redundant) non-zero entries reduces to 2n - 1.

The goal of tri-diagonal reduction is to start with a dense symmetric matrix A and end up with an equivalent matrix T that is still symmetric but also tri-diagonal. In order to preserve the eigenvalues of A, we are only allowed to perform what are called similarity transformations of A. A similarity transformation takes the form

    A ← P^{-1} A P,    (5)

where P is an n-by-n invertible matrix. To see that the eigenvalues are preserved by a similarity transformation, note that if {λ, x} is an eigenpair of A, then

    Ax = λx  implies  (P^{-1} A P)(P^{-1} x) = λ (P^{-1} x),

which implies that {λ, P^{-1} x} is an eigenpair of the transformed (similar) matrix B := P^{-1} A P. The geometric interpretation of a similarity transformation is that both A and the transformed matrix B represent the same linear transformation but in different coordinate systems. The matrix P, then, encodes the (linear) coordinate transformation.

In general, the transformation matrix P can be very ill-conditioned, and as a consequence the computed tri-diagonal form of A can have very different eigenvalues compared to A. Moreover, a general similarity transformation does not necessarily preserve the symmetry property. To improve the numerical stability of the tri-diagonal reduction algorithm and preserve symmetry, we restrict the set of allowed transformations to the nicest ones: the so-called orthogonal matrices. Each column of an orthogonal matrix has unit length and the columns are pairwise orthogonal (perpendicular). It follows that the inverse, Q^{-1}, of an orthogonal matrix Q is equal to its transpose, Q^T, i.e., Q^T Q = Q Q^T = I. An orthogonal similarity transformation, i.e., (5) with an orthogonal P, takes the form

    A ← Q^T A Q,    (6)

where Q (instead of P) is an orthogonal matrix.
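That a transformation of the form (6) preserves symmetry follows from a one-line computation using only A^T = A and the basic transpose rules (AB)^T = B^T A^T and (Q^T)^T = Q:

    (Q^T A Q)^T = Q^T A^T (Q^T)^T = Q^T A Q.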
4 Tri-diagonal reduction algorithm

The tri-diagonal reduction algorithm relies on a fundamental type of orthogonal matrices known as Householder reflections, named after the numerical linear algebra pioneer Alston Scott Householder. A Householder reflection of order n is defined by a vector v of length n that is scaled such that the matrix

    Q := I - v v^T    (7)

becomes orthogonal. Using specially crafted reflections, it is possible to systematically reduce A to tri-diagonal form by introducing zeros one column/row at a time. Without going into the details (they are not important for the sake of this assignment), we can construct a vector v_j defining a reflection Q_j such that when the matrix is updated according to the formula

    A ← Q_j^T A Q_j,    (8)

then the j-th column/row of the matrix is in tri-diagonal form. By systematically reducing the columns/rows from left/top to right/bottom, we eventually end up with a full tri-diagonal matrix. The matrix below illustrates the zero/non-zero (sparsity) pattern of A after the update (8) with j = 3 and n = 6:

    Q_3^T A Q_3 = [ × × 0 0 0 0 ]
                  [ × × × 0 0 0 ]
                  [ 0 × × × 0 0 ]
                  [ 0 0 × × × × ]
                  [ 0 0 0 × × × ]
                  [ 0 0 0 × × × ]

Note that the matrix is tri-diagonal in its top left 3-by-3 sub-matrix and fully dense in its bottom right (n - 3)-by-(n - 3) sub-matrix.

The special structure of the reflection Q must be exploited in order to implement the update (8) efficiently. As written, (8) requires two matrix multiplications of order n for a total of 4n^3 floating point operations (in addition to the cost of constructing Q, which is negligible in comparison). However, if we exploit the structure and apply the update as in

    A ← Q_j^T A Q_j = (I - v v^T) A (I - v v^T)
                    = (I - v v^T)(A - A v v^T)
                    = A - A v v^T - v v^T A + v v^T A v v^T
                    = A - w v^T - v w^T,    (9)

where we have introduced the auxiliary vector

    w := A v - (1/2) v v^T A v,    (10)

then the operation count of (8) is reduced to Θ(n^2).

Again without going into the details, a significant portion of the work in the blocked tri-diagonal reduction algorithm lies in applying a generalized (blocked) version of the update (9) that takes the form

    A ← A - W V^T - V W^T.    (11)
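Before moving on, here is a minimal C sketch of the unblocked update (9)-(10), of which (11) is the blocked generalization. It assumes full (both triangles) column-major storage; the function name is ours, and v is assumed pre-scaled so that v^T v = 2, which is exactly the scaling that makes Q in (7) orthogonal.

#include <stdlib.h>

/* Apply A <- A - w v^T - v w^T with w = A v - (1/2) v (v^T A v),
   i.e., the two-sided Householder update (9)-(10), in Theta(n^2) work.
   A is n-by-n symmetric, full column-major storage, leading dimension lda. */
void householder_update(int n, float *A, int lda, const float *v)
{
    float *w = malloc(n * sizeof *w);
    float vAv = 0.0f;                          /* v^T A v */
    for (int i = 0; i < n; i++) {
        float s = 0.0f;                        /* (A v)_i */
        for (int j = 0; j < n; j++)
            s += A[i + (size_t)j*lda] * v[j];
        w[i] = s;
        vAv += v[i] * s;
    }
    for (int i = 0; i < n; i++)                /* w = A v - (1/2)(v^T A v) v */
        w[i] -= 0.5f * vAv * v[i];
    for (int j = 0; j < n; j++)                /* rank-2 update, Theta(n^2) */
        for (int i = 0; i < n; i++)
            A[i + (size_t)j*lda] -= w[i]*v[j] + v[i]*w[j];
    free(w);
}

In the blocked algorithm, k such pairs (v, w) are accumulated into the matrices V and W before the matrix is touched, and (11) then applies all k rank-2 updates at once.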
We call the update (11) a symmetric rank-2k update. The name stems from the facts that A is symmetric and that each term (W V^T and V W^T) has rank (at most) k, so both terms together have rank (at most) 2k and form a symmetric matrix.

5 Symmetric rank-2k update

We have finally covered enough background material to put the topic of this assignment into some context. You do not have to understand a word of what has been written above unless you want to get the bigger picture and understand the relevance and origins of the symmetric rank-2k update.

Your task is to implement and evaluate a parallel version of the symmetric rank-2k update on a GPU and/or a set of multi-core CPUs with shared memory. The symmetric rank-2k update is part of the BLAS interface/library, which contains most of the fundamental linear algebra operations, and goes by the name of xsyr2k, where x should be replaced by one of S, D, C, and Z for single precision real, double precision real, single precision complex, and double precision complex, respectively. We limit ourselves to the single precision real case and therefore from this point on we assume x = S.

The corresponding BLAS subroutine SSYR2K is fairly general and implements the following variants of the update:

    C ← α A B^T + α B A^T + β C,    (12)
    C ← α A^T B + α B^T A + β C.    (13)

Both α and β are (real) scalars. The matrix C is square and symmetric of size n-by-n. Only the upper or the lower triangular part of C is touched, depending on an argument passed to the function. In variant (12), both A and B are n-by-k matrices, and in variant (13) they are both k-by-n. We limit ourselves to variant (12).

The prototype of SSYR2K in the FORTRAN binding of the BLAS library reads as follows:

    subroutine ssyr2k(uplo,trans,n,k,alpha,a,lda,b,ldb,beta,c,ldc)

The arguments are described in detail in the ssyr2k.f source file of the reference BLAS implementation [2], but for convenience we also briefly describe them below.

uplo   String that determines whether to touch the upper ('U') or the lower ('L') triangular part of C.
trans  String that chooses either variant (12) ('N') or variant (13) ('T').
n      The order of the matrix C and either the number of rows or the number of columns of A and B, depending on trans.
k      Either the number of columns or the number of rows of A and B, depending on trans.
alpha  The scalar α.
A      The matrix A stored in column-major format.
lda    The column stride of A (the so-called leading dimension).
B      The matrix B stored in column-major format.
ldb    The column stride of B.
beta   The scalar β.
C      The matrix C stored in column-major format.
ldc    The column stride of C.

The operation is illustrated pictorially in Figure 1 for the case alpha = beta = 1, trans = 'N', and uplo = 'U'. On the entry level, the update takes the form

    [C]_{ij} ← β [C]_{ij} + α Σ_{s=1}^{k} [A]_{is} [B]_{js} + α Σ_{s=1}^{k} [B]_{is} [A]_{js}.    (14)

However, we prefer the more compact and less cluttered form (12).

    Figure 1: Graphical depiction of the xsyr2k operation with uplo='U' and trans='N' (the upper triangle of C is overwritten by C + A B^T + B A^T).

6 The reference implementation

The reference implementation of SSYR2K [2] is structured as follows. We assume that α ≠ 0, since otherwise the operation degenerates into a simple scaling of C. We consider only the case uplo='U' and trans='N', as illustrated in Figure 1, since the other three cases are very similar.

The upper triangular part of the matrix C is updated one column at a time from left to right. The column index j runs from 1 to n in an outer loop. The column is first scaled in place by the factor β (unless β = 1), as illustrated by the pseudo-code below:

    do i = 1 to j
        C(i,j) = beta * C(i,j)
    end do

Next, the j-th column of each of the two terms in (12) is applied to the j-th column of C in k steps, as illustrated by the following pseudo-code:

    do s = 1 to k
        t1 = alpha * B(j,s)
        t2 = alpha * A(j,s)
        do i = 1 to j
            C(i,j) = C(i,j) + A(i,s) * t1
            C(i,j) = C(i,j) + B(i,s) * t2
        end do
    end do
We can understand the snippet above by looking at (14) for one value of s at a time. In this context, a column of C is updated by adding a scaled column of A and a scaled column of B. The scaling factors correspond to the variables t1 and t2 above. If we glue the two pieces above together with the outer loop over columns, we end up with (a simplified version of) the reference implementation expressed in pseudo-code:

    do j = 1 to n
        do i = 1 to j
            C(i,j) = beta * C(i,j)
        end do
        do s = 1 to k
            t1 = alpha * B(j,s)
            t2 = alpha * A(j,s)
            do i = 1 to j
                C(i,j) = C(i,j) + A(i,s) * t1
                C(i,j) = C(i,j) + B(i,s) * t2
            end do
        end do
    end do

Let us count the number of floating point operations (flops) performed by the pseudo-code above as a function f(n,k) of the parameters n and k. The initial scaling of C accounts for

    Σ_{j=1}^{n} Σ_{i=1}^{j} 1 = n^2/2 + n/2 flops.    (15)

The computations of t1 and t2 account for

    Σ_{j=1}^{n} Σ_{s=1}^{k} 2 = 2kn flops.    (16)

Finally, the inner loop accounts for

    Σ_{j=1}^{n} Σ_{s=1}^{k} Σ_{i=1}^{j} 4 = 2kn^2 + 2kn flops.    (17)

In total, the flop count of the xsyr2k operation is

    f(n,k) = (2k + 1/2) n^2 + 4kn + n/2 ≈ 2kn^2.    (18)

Note that the flop count grows rapidly with k, much faster than the amount of data (A, B, and C). For instance, with n = 4096 and k = 64, the update performs roughly 2kn^2 ≈ 2.1 · 10^9 flops while touching only about 9 · 10^6 matrix entries (the upper triangle of C plus A and B), i.e., on the order of a couple of hundred flops per entry. Therefore, we expect that the operation can be implemented with a high computation-to-communication ratio (operational intensity) and can thereby overcome the limited main memory bandwidth and the slow PCIe bus that connects the host to the GPU. In the next section, we will devise a blocked algorithm that relies heavily on matrix multiplication and should therefore be possible to implement efficiently on GPUs and multi-cores alike.
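For reference, the pseudo-code above translates directly into C. The following is a minimal sketch (0-based indexing, column-major storage with leading dimensions, uplo='U' and trans='N' only); the function name is ours:

#include <stddef.h>

/* C <- beta*C + alpha*(A*B^T + B*A^T), touching only the upper triangle.
   A and B are n-by-k, C is n-by-n; all column-major with leading
   dimensions lda, ldb, ldc. Direct transcription of the pseudo-code. */
void ssyr2k_ref(int n, int k, float alpha,
                const float *A, int lda,
                const float *B, int ldb,
                float beta, float *C, int ldc)
{
    for (int j = 0; j < n; j++) {
        for (int i = 0; i <= j; i++)           /* scale column j of C */
            C[i + (size_t)j*ldc] *= beta;
        for (int s = 0; s < k; s++) {          /* accumulate the k terms */
            float t1 = alpha * B[j + (size_t)s*ldb];
            float t2 = alpha * A[j + (size_t)s*lda];
            for (int i = 0; i <= j; i++) {
                C[i + (size_t)j*ldc] += A[i + (size_t)s*lda] * t1;
                C[i + (size_t)j*ldc] += B[i + (size_t)s*ldb] * t2;
            }
        }
    }
}

An optimized library computes the same result; with CBLAS the equivalent call would be cblas_ssyr2k(CblasColMajor, CblasUpper, CblasNoTrans, n, k, alpha, A, lda, B, ldb, beta, C, ldc) (assuming a CBLAS installation), which is handy for correctness-checking your own kernels.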
7 Recursive blocked algorithm

The reference implementation does not use the cache hierarchy effectively. In this section, we develop a recursive blocked algorithm that automatically adapts to all levels of the cache hierarchy.

Let us partition the matrix C according to

    C =: [ C_{11}    C_{12} ]
         [ C_{12}^T  C_{22} ]    (19)

such that C_{11} and C_{22} are square symmetric sub-matrices of order n_1 and n_2, respectively. Here n_1 and n_2 can be chosen arbitrarily, but we prefer to choose either n_1 = b for some fixed block size b, which results in a traditional blocked algorithm, or n_1 = n_2 = n/2 for a divide-and-conquer style recursive blocked algorithm. Next, let us partition A and B into two row blocks each, conformally with C. The first row block consists of the first n_1 rows and the other row block consists of the remaining n_2 rows. Using the block partitionings of A, B, and C, the update (12) can now be written as

    [ C_{11}    C_{12} ]     [ C_{11}    C_{12} ]     [ A_1 ]                      [ B_1 ]
    [ C_{12}^T  C_{22} ] ← β [ C_{12}^T  C_{22} ] + α [ A_2 ] [ B_1^T  B_2^T ] + α [ B_2 ] [ A_1^T  A_2^T ].    (20)

On the block level, the update (20) decomposes into three (actually four, but one is redundant) block updates:

    C_{11} ← β C_{11} + α A_1 B_1^T + α B_1 A_1^T,    (21)
    C_{22} ← β C_{22} + α A_2 B_2^T + α B_2 A_2^T,    (22)
    C_{12} ← β C_{12} + α A_1 B_2^T + α B_1 A_2^T.    (23)

If one looks at the updates and the properties of the blocks carefully enough, one will notice that the first two block updates are also instances of the symmetric rank-2k update, only smaller, while the third block update is a sequence of two general matrix multiplication updates, or xgemm operations. By applying the recursive template (20) to the two sub-operations (21) and (22), we obtain a recursive formulation of the xsyr2k operation. The flop count of this recursive formulation is essentially the same as for the entry-wise formulation analyzed in the previous section, i.e., ≈ 2kn^2.

A strength of the recursive formulation is that at each level of the recursion it exposes two general matrix multiplication updates, or xgemm operations, which are regular operations that are well suited for high-performance implementation on both GPUs and multi-cores. This section has illustrated how rewriting an operation using recursion can expose highly desirable xgemm operations. It turns out that this technique can be applied to all of the fundamental matrix operations, as was first demonstrated by Kågström, van Loan, and Ling with their GEMM-based BLAS project [3].
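The divide-and-conquer variant maps directly to code. Below is a minimal sequential C sketch (uplo='U', trans='N', 0-based column-major indexing); the cutoff BASE, the helper names, and the naive sgemm_nt kernel are illustrative placeholders for the tuned kernels you will write:

#include <stddef.h>

#define BASE 64  /* illustrative recursion cutoff; tune experimentally */

/* Base case: reference triple loop on the upper triangle (cf. Section 6). */
static void ssyr2k_base(int n, int k, float alpha,
                        const float *A, int lda, const float *B, int ldb,
                        float beta, float *C, int ldc)
{
    for (int j = 0; j < n; j++) {
        for (int i = 0; i <= j; i++)
            C[i + (size_t)j*ldc] *= beta;
        for (int s = 0; s < k; s++) {
            float t1 = alpha * B[j + (size_t)s*ldb];
            float t2 = alpha * A[j + (size_t)s*lda];
            for (int i = 0; i <= j; i++)
                C[i + (size_t)j*ldc] += A[i + (size_t)s*lda]*t1
                                      + B[i + (size_t)s*ldb]*t2;
        }
    }
}

/* Naive C <- beta*C + alpha * X * Y^T, with X m-by-k and Y n-by-k. */
static void sgemm_nt(int m, int n, int k, float alpha,
                     const float *X, int ldx, const float *Y, int ldy,
                     float beta, float *C, int ldc)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            float sum = 0.0f;
            for (int s = 0; s < k; s++)
                sum += X[i + (size_t)s*ldx] * Y[j + (size_t)s*ldy];
            C[i + (size_t)j*ldc] = beta*C[i + (size_t)j*ldc] + alpha*sum;
        }
}

/* Recursive blocked SSYR2K following (19)-(23) with n1 = n2 = n/2. */
void ssyr2k_rec(int n, int k, float alpha,
                const float *A, int lda, const float *B, int ldb,
                float beta, float *C, int ldc)
{
    if (n <= BASE) {
        ssyr2k_base(n, k, alpha, A, lda, B, ldb, beta, C, ldc);
        return;
    }
    int n1 = n/2, n2 = n - n1;
    /* (21) and (22): smaller symmetric rank-2k updates (recursion). */
    ssyr2k_rec(n1, k, alpha, A,      lda, B,      ldb, beta,
               C,                        ldc);
    ssyr2k_rec(n2, k, alpha, A + n1, lda, B + n1, ldb, beta,
               C + n1 + (size_t)n1*ldc,  ldc);
    /* (23): C12 <- beta*C12 + alpha*A1*B2^T + alpha*B1*A2^T,
       i.e., two general matrix multiplies (the SGEMM-friendly part). */
    sgemm_nt(n1, n2, k, alpha, A, lda, B + n1, ldb, beta, C + (size_t)n1*ldc, ldc);
    sgemm_nt(n1, n2, k, alpha, B, ldb, A + n1, lda, 1.0f, C + (size_t)n1*ldc, ldc);
}

Note that the three block updates (21)-(23) read and write disjoint parts of C, so they are independent and may be executed concurrently; in a tuned implementation, the sgemm_nt calls are where most of the parallelism and the cache or shared-memory blocking belong.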
8 The assignment

Read this section carefully. The requirements below are minimal requirements and they are deliberately a bit fuzzy. Use your own judgment to select suitable experiments and suitable ways of presenting/analyzing your results.

You should, given a piece of skeleton code, implement the SSYR2K operation variant with trans='N' and uplo='U' on either a GPU, a set of traditional multi-core processors, or a combination of the two. Base your implementation on the recursive blocked algorithm in its divide-and-conquer variant. The performance evaluation logic is already present in the skeleton code, so you can concentrate on developing the computational kernels. For the sake of this assignment, you should implement the SGEMM kernel(s) yourself instead of relying on any BLAS library. A minimal requirement is that you use the cache (CPU) and shared memory (GPU) resources effectively.

For your reference, a sequential implementation relying on the BLAS has been provided in the skeleton code. You are encouraged to implement a second version of your parallel code(s) that calls the BLAS routines and to compare its performance to that of your own code.

Perform and report carefully chosen experiments. Use information about the cache sizes (CPU) and the graphics memory size (GPU) to choose an appropriate range of parameters (n and k and block sizes). Hint: A reasonable choice in the context of tri-diagonal reduction would be n in the few thousands and k at least 32 or so, but perhaps as large as a few hundred.

Some questions to think about:

- What is/are the optimal block size/sizes?
- How does k = 1, 2, ... affect the performance? In the context of tri-diagonal reduction, k is a parameter that can be tweaked to improve performance.
- Is the code sensitive to NUMA effects?
- Can a tiled data layout improve performance?
- When should the recursion be aborted?
- Does it help to regularize the computation by computing the (small) SSYR2K operations in the recursive base case using SGEMM instead of SSYR2K? The SGEMM operation is presumably easier to optimize.

References

[1] The LAPACK interface specification and reference implementation.
[2] The BLAS interface specification and reference implementation.
[3] The GEMM-based BLAS by Kågström, van Loan, and Ling.