Solving Updated Systems of Linear Equations in Parallel

P. Blaznik (a) and J. Tasic (b)

(a) Jozef Stefan Institute, Computer Systems Department, Jamova 9, 1111 Ljubljana, Slovenia. Email: polona.blaznik@ijs.si
(b) Faculty of Electrical Engineering and Computer Science, Ljubljana, Slovenia

Technical Report CSD951, August 1995

Abstract

In this paper, updating algorithms for solving linear systems of equations are presented using a systolic array model. First, a parallel algorithm for computing the inverse of a rank-one modified matrix using the Sherman-Morrison formula is proposed. This algorithm is then extended to solving updated systems of linear equations on a linear systolic array. Finally, the generalisation to updates of higher rank is shown.

Keywords: Matrix updating, Linear systems, Systolic arrays

1 Introduction

In many signal processing applications, we need to solve a sequence of linear systems in which each successive matrix is closely related to the previous matrix. For example, we have to solve a recursive process where the matrix is modified by low-rank, typically rank-one, updates at each iteration, i.e.,

    A_k = A_{k-1} + u_{k-1} v_{k-1}^T.

Clearly, we would like to be able to solve the system A_k x_k = b by modifying A_{k-1} and x_{k-1} without computing a complete refactorisation of A_k, which would be too costly.

This work has been supported by the Ministry of Science and Technology of the Republic of Slovenia under Grant Number J188. This report will be published in the Proc. of the Parallel Numerics 95 Workshop, Sorrento, Italy, September 1995.

We choose systolic arrays (Kung and Leiserson, 1978) as a parallel computing model to describe our algorithms. Despite the fact that systolic arrays have so far not really made their way into many practical applications, a systolic description still reveals the fundamental parallelism which is available in an algorithm. Therefore, it provides useful information when the algorithm has to be implemented on an available parallel architecture.

2 Updating techniques

Techniques for updating matrix factorisations play an important role in modern linear algebra and optimisation. We often need to solve a sequence of linear systems in which each successive matrix is closely related to the previous matrix. By using A^{-1} and the modification ΔA, systems of the form (A + ΔA)x = b can be solved in order n^2 flops, rather than the order n^3 flops required by a complete refactorisation.

In this section, we first restrict ourselves to the rank-one modification. First, a systolic version of the Sherman-Morrison formula for computing the inverse of the modified matrix using the inverse of the original matrix is described. Then, we discuss its application in solving linear systems of equations with one or more right-hand sides. Finally, we present the systolic array for the rank-two modified inverse and for solving rank-two modified systems of equations as possible generalisations.

2.1 Rank-one modification

When Ā is equal to A plus a rank-one matrix uv^T,

    Ā = A + uv^T,

we say Ā is a rank-one modification of A. Standard operations, such as column and row replacement, are special cases of rank-one modification. Let A be an n × n nonsingular matrix, and let u and v be n-vectors. We want to find the inverse matrix of the rank-one modified matrix Ā = A + uv^T. The matrix A + uv^T is nonsingular if and only if 1 + v^T A^{-1} u ≠ 0. Its inverse is then

    (A + uv^T)^{-1} = A^{-1} - (1 / (1 + v^T A^{-1} u)) A^{-1} u v^T A^{-1}.   (1)

This is the well-known Sherman-Morrison formula for computing the inverse of a rank-one modified matrix (Gill et al., 1991).
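Formula (1) is easy to check numerically. The following NumPy sketch is ours, not part of the original report; the matrix size and all names are illustrative, and the diagonal shift merely keeps the test matrix well conditioned.

import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
u = rng.standard_normal(n)
v = rng.standard_normal(n)

A_inv = np.linalg.inv(A)
denom = 1.0 + v @ A_inv @ u                       # must be nonzero for (1) to apply
A_bar_inv = A_inv - np.outer(A_inv @ u, v @ A_inv) / denom

# compare with a direct inversion of the modified matrix
print(np.allclose(A_bar_inv, np.linalg.inv(A + np.outer(u, v))))   # True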

2.2 Systolic algorithm – SASM

To derive a systolic array for the evaluation of the Sherman-Morrison formula (SASM), we would like to make use of already known systolic designs that solve some basic matrix problems. Let us define the following matrix transformation on the compound matrix given below (Megson, 1991):

    [ A_11, A_12 ; A_21, A_22 ] → [ M A_11, M A_12 ; A_21, A_22 ] → [ M A_11, M A_12 ; A_21 + N M A_11, A_22 + N M A_12 ],   (2)

where M is selected so that the matrix M A_11 is triangular, and N is chosen to annihilate A_21. Applying the Faddeev algorithm (Faddeev and Faddeeva, 1963), M and N can be easily constructed using elementary row operations on the compound matrix. It follows from A_21 + N M A_11 = 0 that N = -A_21 A_11^{-1} M^{-1}, so that the bottom-right partition A_22 + N M A_12 is given by A_22 - A_21 A_11^{-1} A_12.

Now we reformulate (1), using (2), as a sequence of the following transformations:

    [ I, A^{-1}u ; -v^T, 1 ] → [ I, A^{-1}u ; 0, 1 + v^T A^{-1}u ],   (3)

    [ I, A^{-1} ; -v^T, 0 ] → [ I, A^{-1} ; 0, v^T A^{-1} ],   (4)

    [ 1 + v^T A^{-1}u, v^T A^{-1} ; A^{-1}u, A^{-1} ] → [ 1 + v^T A^{-1}u, v^T A^{-1} ; 0, A^{-1} - (1 / (1 + v^T A^{-1}u)) A^{-1}uv^T A^{-1} ].   (5)

Equations (3)–(5) describe Gaussian elimination steps where the multipliers, the components of v^T, are known in advance. Therefore, no explicit computation of multipliers is required in the array, and we do not need the part of the array concerned with the computation and pipelining of multipliers. Hence, a rectangular array of n × (n + 1) inner product cells is sufficient (Figure 3).
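As a concreteness check (our NumPy sketch, not the report's), the following code performs the elimination of transformation (2) on a compound matrix and verifies that, applied to the compound matrix of (5), the bottom-right partition that emerges is exactly the Sherman-Morrison inverse; all names are illustrative.

import numpy as np

def bottom_right_after_elimination(A11, A12, A21, A22):
    # Transformation (2): eliminate the A21 block of [[A11, A12], [A21, A22]]
    # by row operations; the bottom-right block becomes A22 - A21 A11^{-1} A12.
    p = A11.shape[0]
    C = np.block([[A11, A12], [A21, A22]])
    for k in range(p):
        for i in range(p, C.shape[0]):
            m = C[i, k] / C[k, k]        # multiplier, the role played by N M
            C[i, :] -= m * C[k, :]
    return C[p:, p:]

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)
u = rng.standard_normal((n, 1))
v = rng.standard_normal((n, 1))
A_inv = np.linalg.inv(A)
alpha = np.array([[1.0 + (v.T @ A_inv @ u).item()]])   # 1 + v^T A^{-1} u

# compound matrix of (5); eliminating its bottom-left block yields the modified inverse
result = bottom_right_after_elimination(alpha, v.T @ A_inv, A_inv @ u, A_inv)
print(np.allclose(result, np.linalg.inv(A + u @ v.T)))   # True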

Before describing the systolic array, we introduce the following representation:

    [ v^T, 1 ; I, 0 ] · [ A^{-1}u, A^{-1} ; 1, 0 ] = [ 1 + v^T A^{-1}u, v^T A^{-1} ; A^{-1}u, A^{-1} ],   (6)

where the bottom row of the second factor denotes the row vector (1, 0, ..., 0). It is evident that the computation of (6) can be done on an n × (n + 1) rectangular array. The cells are IPS (inner product step) processors (Figure 1), accepting a multiplier from the left and updating the elements moving vertically. Each cell has two modes of operation, a load state and a normal computation state. During the load state, the matrices A^{-1}u and A^{-1} are input in a row-ordered format, suitably delayed to allow synchronisation with the multipliers input on the left boundary. During that phase, the two matrices are loaded one element per cell, and become stationary. The next stage can be described in two phases. First, the vector [1, 0, ..., 0] is input on the top boundary of the array, and v^T on the left boundary. The components of v^T are used as multipliers to compute 1 + v^T A^{-1}u and v^T A^{-1}. These data are non-stationary and leave the array on the south boundary. Second, the null matrix is input on the top boundary and the matrix I on the left. This forces the computation of A^{-1}u and A^{-1}. All the phases can be overlapped, so that the total computation time is T = (n + 1) + (n + 1) + n = 3n + 2 inner product steps.

x := 0.0
for i = 1 to total.time
  if init                          -- load state
    x.out := x + m.in * x.in
    x := x.in                      -- the stationary element is loaded
  else                             -- normal computation state
    x.out := x.in + m.in * x       -- update the element moving vertically

Fig. 1. The PE definition for an IPS cell of the rectangular array (m.in enters from the left, x.in from the top, x.out leaves to the south).

Once the transformation (6) is known, we use Gaussian elimination to evaluate the transformation (5). Because 1 + v^T A^{-1}u is a scalar value, only a single column elimination is required. A linear array of n + 1 cells, corresponding to one row of the triangular array for LU decomposition (Wan and Evans, 1993), is sufficient. Again, the cells have two modes of operation, a load state and a normal computation state. One cell is a divider cell, whose function is described in Figure 2; the others perform the same operations as the cells of the rectangular part of the array (Figure 1). The delay through this extra array is a single elimination step.

x :=. for i=1 to total.time if init if x.in <>. m.out := (x/x.in) else m.out :=. x := x.in else if x <>. m.out := (x.in/x) else m.out :=. x:in Fig.. The PE denition for a divider cell. x m:in I v T I A 1 1 A 1 u Fig.. Systolic array SASM for ShermanMorrison formula (n = ). To sum up, the rankone modication of the matrix inverse using the Sherman Morrison formula can be computed on an (n + 1) (n + 1) mesh of cells (Figure.) in n + inner product steps.. Solving the updated linear systems Solving the updated systems of linear equations is a more important application than nding the inverse of the modied matrix. In this section, we will show how to use the equation (1) implicitly in solving the updated systems of equations without computing the inverse of the modied matrix.

Let A be an n × n nonsingular matrix, and b a vector of dimension n. Let us assume we know the solution x of

    Ax = b.

We want to find the solution x̄ of the rank-one modified system

    (A + uv^T) x̄ = b.

Now using the Sherman-Morrison formula, it follows that

    x̄ = (A + uv^T)^{-1} b
      = (A^{-1} - (1 / (1 + v^T A^{-1}u)) A^{-1}uv^T A^{-1}) b
      = A^{-1}b - (1 / (1 + v^T A^{-1}u)) A^{-1}uv^T A^{-1}b
      = x - (w / (1 + v^T w)) v^T x,   (7)

where w is the solution of the system Aw = u. To derive the systolic array, we follow a similar procedure as before. We define the following Gaussian transformations:

    [ I, w ; -v^T, 1 ] → [ I, w ; 0, 1 + v^T w ],   (8)

    [ I, x ; -v^T, 0 ] → [ I, x ; 0, v^T x ],   (9)

    [ 1 + v^T w, v^T x ; w, x ] → [ 1 + v^T w, v^T x ; 0, x - (w / (1 + v^T w)) v^T x ].   (10)

The evaluation of (8)–(9) can be performed on an n × 2 rectangular array of IPS cells (Figure 1). Since 1 + v^T w is a scalar, we need to eliminate only one column. Therefore, the systolic array in Figure 4, with 2(n + 1) cells, gives us the result in 3n + 3 inner product steps. Recall that, in general, solving linear equations using the Faddeev array (Blaznik, 1995) takes 5n + 1 inner product steps on an array of n(n + 1)/2 + n cells. On some specific arrays this can be done faster, but it is still not competitive with the array in Figure 4.
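In software terms, equation (7) turns a rank-one update into one auxiliary solve with the original matrix plus O(n) vector work. A minimal NumPy sketch of this (ours, with illustrative names; it assumes 1 + v^T w ≠ 0):

import numpy as np

def update_solution(x, w, v):
    # Equation (7): x_bar = x - w (v^T x) / (1 + v^T w), where A w = u.
    return x - w * (v @ x) / (1.0 + v @ w)

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
u = rng.standard_normal(n)
v = rng.standard_normal(n)

x = np.linalg.solve(A, b)            # known solution of A x = b
w = np.linalg.solve(A, u)            # auxiliary solve A w = u
x_bar = update_solution(x, w, v)
print(np.allclose(x_bar, np.linalg.solve(A + np.outer(u, v), b)))   # True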

2.3.1 Numerical example

Fig. 4. Systolic array for updating linear systems.

The algorithm was simulated using Occam. A small numerical example is given below. Given the linear system Ax = b with known solution x and known inverse A^{-1}, the solution x̄ of the modified system Ā x̄ = b, where Ā

differs from A in the (1,2) element, can be obtained from equation (7) by choosing u^T = (1, 0, ..., 0) and v^T with its only nonzero entry in the second position, and forming Ā = A + uv^T, which yields the updated solution x̄.

2.3.2 Successive updates

Our next aim is to perform successive updates. For example, if the changes in the matrix always occur in the same row, we can proceed as follows. On the k-th step of the computation, we want to find the solution x^(k) of the system

    A^(k) x^(k) = b.

We know the solution x^(k-1) of the system A^(k-1) x^(k-1) = b, and the relation between A^(k-1) and A^(k),

    A^(k) = A^(k-1) + u v^(k)T.

Using (7), we can write

    x^(k) = x^(k-1) - (1 / (1 + v^(k)T (A^(k-1))^{-1} u)) (A^(k-1))^{-1} u v^(k)T x^(k-1).

On every step k, we therefore need the previous solution x^(k-1), the value of (A^(k-1))^{-1} u, and the value of v^(k)T. The array in Figure 5 can handle the successive rank-one updates. It is important that all data arrive in the appropriate order. To ensure this, we use the so-called switch cells introduced by D.J. Evans and C.R. Wan in (Evans and Wan, 1993). They function as a data interface which rearranges the results of the previous computation into the right order (Blaznik, 1995). The proposed data interface is shown in Figure 6. The array needs the original data and the output from the previous iteration. The desired input is selected, as shown in Figure 6, according to the processing phase of the cell. The second column of IPS cells is used for the evaluation of (A^(k))^{-1} u on the k-th step. The result is then fed back to the top of the array to be used in the next computation.
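The recursion is cheap to realise in software as well: alongside x^(k) one maintains z^(k) = (A^(k))^{-1} u, updating both with the same Sherman-Morrison step, which mirrors the feedback of (A^(k))^{-1} u in the array described above. A NumPy sketch (ours; names illustrative, and each 1 + v^(k)T z is assumed nonzero):

import numpy as np

def successive_updates(A0_inv, b, u, vs):
    # Solutions x(k) of A(k) x(k) = b, where A(k) = A(k-1) + u v(k)^T.
    x = A0_inv @ b                    # x(0)
    z = A0_inv @ u                    # z(0) = A(0)^{-1} u
    history = [x]
    for v in vs:
        denom = 1.0 + v @ z
        x = x - z * (v @ x) / denom   # equation (7) with w = z
        z = z / denom                 # Sherman-Morrison applied to A(k)^{-1} u
        history.append(x)
    return history

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)
b, u = rng.standard_normal(n), rng.standard_normal(n)
vs = [0.1 * rng.standard_normal(n) for _ in range(3)]

xs = successive_updates(np.linalg.inv(A), b, u, vs)
A3 = A + np.outer(u, sum(vs))         # every update hits the same u
print(np.allclose(xs[-1], np.linalg.solve(A3, b)))   # True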

Fig. 5. Systolic array for successive updates of linear systems.

for i = 1 to n+1
  x.out := x.in
for j = 1 to no.of.updates - 1
  for i = 1 to n
    x.out := z.in
  for i = 1 to n+1
    x.out := x.in, sink z.in

Fig. 6. Data interface.

2.4 Rank-two modification

The idea of rank-one modification can be extended further to rank-m modification. The result is the Sherman-Morrison-Woodbury formula (Gill et al., 1991)

    (A + UV^T)^{-1} = A^{-1} - A^{-1}U (I + V^T A^{-1}U)^{-1} V^T A^{-1},   (11)

where U and V are n × m matrices. It is obvious that when m = 1, this reduces to the Sherman-Morrison formula (1), with I + V^T A^{-1}U a scalar quantity. We want to derive a systolic version of the Sherman-Morrison-Woodbury formula for the rank-two modification, i.e., m = 2. The transformations (3)–(5) are in this case the following:

    [ I_n, A^{-1}U ; -V^T, I ] → [ I_n, A^{-1}U ; 0, I + V^T A^{-1}U ],   (12)

    [ I_n, A^{-1} ; -V^T, 0 ] → [ I_n, A^{-1} ; 0, V^T A^{-1} ],   (13)

where U, V are n × 2 matrices. Then

    [ I + V^T A^{-1}U, V^T A^{-1} ; A^{-1}U, A^{-1} ] → [ M(I + V^T A^{-1}U), M V^T A^{-1} ; 0, A^{-1} - A^{-1}U (I + V^T A^{-1}U)^{-1} V^T A^{-1} ],   (14)

where the matrix M is chosen so that M(I + V^T A^{-1}U) is an upper triangular matrix.

The computation of equations (12)–(14) can be done by one transformation of a compound matrix of size (n + 2) × (n + 2):

    [ V^T, I ; I_n, 0 ] · [ A^{-1}U, A^{-1} ; I, 0 ] = [ I + V^T A^{-1}U, V^T A^{-1} ; A^{-1}U, A^{-1} ].   (15)

These computations can be performed on an n × (n + 2) rectangular array. The cells are inner product step processors, accepting multipliers from the left and updating the elements moving vertically (Figure 1). Since I + V^T A^{-1}U is a 2 × 2 matrix, two column eliminations are required to evaluate the transformation (14). Thus, we use two rows of the triangular array for LU decomposition. We need two divider cells (their function is described in Figure 2), with the other cells being IPS cells. The systolic array in Figure 7 computes the rank-two modified inverse of an n × n matrix A in 3n + 4 inner product steps.
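A compact numerical illustration of (11) for m = 2 (our NumPy sketch, not the report's; names illustrative). Note that only the small m × m system I + V^T A^{-1}U has to be solved:

import numpy as np

def woodbury_inverse(A_inv, U, V):
    # Equation (11): (A + U V^T)^{-1} = A^{-1} - A^{-1}U (I + V^T A^{-1}U)^{-1} V^T A^{-1}
    m = U.shape[1]
    AU = A_inv @ U
    small = np.eye(m) + V.T @ AU      # the m x m matrix I + V^T A^{-1} U
    return A_inv - AU @ np.linalg.solve(small, V.T @ A_inv)

rng = np.random.default_rng(4)
n, m = 5, 2                           # rank-two modification
A = rng.standard_normal((n, n)) + n * np.eye(n)
U = rng.standard_normal((n, m))
V = rng.standard_normal((n, m))

W = woodbury_inverse(np.linalg.inv(A), U, V)
print(np.allclose(W, np.linalg.inv(A + U @ V.T)))   # True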

Fig. 7. Systolic array for rank-two modification of the matrix inverse.

2.4.1 Solving rank-two modified linear systems

Let A be an n × n nonsingular matrix, and b a vector of dimension n. Let us assume we know the solution x of

    Ax = b.

For solving the rank-two modified system of linear equations

    Ā x̄ = (A + UV^T) x̄ = b,

we need to evaluate the following transformations:

    [ I_n, A^{-1}U ; -V^T, I ] → [ I_n, A^{-1}U ; 0, I + V^T A^{-1}U ],   (16)

    [ I_n, x ; -V^T, 0 ] → [ I_n, x ; 0, V^T x ],   (17)

where U, V are n × 2 matrices.

Then

    [ I + V^T A^{-1}U, V^T x ; A^{-1}U, x ] → [ M(I + V^T A^{-1}U), M V^T x ; 0, x - A^{-1}U (I + V^T A^{-1}U)^{-1} V^T x ],   (18)

where the matrix M is chosen so that M(I + V^T A^{-1}U) is an upper triangular matrix. The evaluation of equations (16)–(18) can be performed on an array of 3n + 5 processors in 3n + 4 inner product steps (Blaznik, 1995).

3 Conclusions

In this paper, we have presented some parallel updating techniques for solving rank-one modified linear systems of equations. We have proposed a systolic algorithm for solving rank-one modified systems of linear equations. We have also described its generalisation to solving rank-two modified systems. The algorithms were simulated on the Sequent Balance multiprocessor system.

References

[Bla95] P. Blaznik. Parallel Updating Methods in Multidimensional Filtering. PhD thesis, University of Ljubljana, 1995.

[EW93] D.J. Evans and C.R. Wan. Systolic array for Schur complement computation. Intern. J. Computer Math., 48:15–11, 1993.

[FF63] D.K. Faddeev and V.N. Faddeeva. Computational Methods of Linear Algebra. W.H. Freeman and Company, 1963.

[GMW91] P.E. Gill, W. Murray, and M.H. Wright. Numerical Linear Algebra and Optimization, volume 1. Addison-Wesley, 1991.

[KL78] H.T. Kung and C.E. Leiserson. Systolic arrays for VLSI. In I.S. Duff and G.W. Stewart, editors, Proc. Sparse Matrix Symposium, pages 256–282. SIAM, 1978.

[Meg91] G.M. Megson. Systolic rank updating and the solution of nonlinear equations. In Proc. 5th International Parallel Processing Symposium, pages –5. IEEE Press, 1991.

[WE93] C.R. Wan and D.J. Evans. Systolic array architecture for linear and inverse matrix systems. Parallel Computing, 19, 1993.