Solving Updated Systems of Linear Equations in Parallel

P. Blaznik (a) and J. Tasic (b)

(a) Jozef Stefan Institute, Computer Systems Department, Jamova 9, 1111 Ljubljana, Slovenia. Email: polona.blaznik@ijs.si
(b) Faculty of Electrical Engineering and Computer Science, Ljubljana, Slovenia

Technical Report CSD951, August 1995
Abstract

In this paper, updating algorithms for solving linear systems of equations are presented using a systolic array model. First, a parallel algorithm for computing the inverse of a rank-one modified matrix using the Sherman-Morrison formula is proposed. This algorithm is then extended to solving updated systems of linear equations on a linear systolic array. Finally, the generalisation to updates of higher rank is shown.

Keywords: Matrix updating, Linear systems, Systolic arrays

1 Introduction

In many signal processing applications, we need to solve a sequence of linear systems in which each successive matrix is closely related to the previous matrix. For example, we may have to solve a recursive process in which the matrix is modified by low-rank, typically rank-one, updates at each iteration, i.e.,

    A_k = A_{k-1} + u_{k-1} v_{k-1}^T.

Clearly, we would like to be able to solve the system A_k x_k = b by modifying A_{k-1} and x_{k-1}, without computing a complete refactorisation of A_k, which is too costly.

This work has been supported by the Ministry of Science and Technology of the Republic of Slovenia under Grant Number J188. This report will be published in the Proc. of the Parallel Numerics 95 Workshop, Sorrento, Italy, September 9, 1995.
We choose systolic arrays (H.T. Kung and C. Leiserson, 1978) as a parallel computing model to describe our algorithms. Although systolic arrays have so far not made a large impact on practical applications, a systolic description still reveals the fundamental parallelism available in an algorithm. It therefore provides useful information when the algorithm has to be implemented on an available parallel architecture.

2 Updating techniques

Techniques for updating matrix factorisations play an important role in modern linear algebra and optimisation. We often need to solve a sequence of linear systems in which each successive matrix is closely related to the previous matrix. By using A and ΔA, systems of the form (A + ΔA)x = b can be solved in order n^2 flops, rather than order n^3 flops.

In this section, we first restrict ourselves to the rank-one modification. First, a systolic version of the Sherman-Morrison formula for computing the inverse of the modified matrix from the inverse of the original matrix is described. Then, we discuss its application to solving linear systems of equations with one or more right-hand sides. Finally, we present the systolic array for the rank-two modified inverse and for solving rank-two modified systems of equations as possible generalisations.

2.1 Rank-one modification

When Ā equals A plus a rank-one matrix uv^T,

    Ā = A + uv^T,

we say Ā is a rank-one modification of A. Standard operations, such as column and row replacement, are special cases of rank-one modification.

Let A be an n x n nonsingular matrix, and let u and v be n-vectors. We want to find the inverse of the rank-one modified matrix Ā = A + uv^T. The matrix A + uv^T is nonsingular if and only if 1 + v^T A^{-1} u ≠ 0. Its inverse is then

    (A + uv^T)^{-1} = A^{-1} - (1 / (1 + v^T A^{-1} u)) A^{-1} u v^T A^{-1}.   (1)

This is the well-known Sherman-Morrison formula for computing the inverse of the rank-one modified matrix (Gill et al., 1991).
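As a quick numerical check of formula (1), the following sketch (plain Python with NumPy; the function name and test data are ours, not part of the original report) compares the Sherman-Morrison update of A^{-1} against a direct inversion of A + uv^T:

```python
import numpy as np

def sherman_morrison_inverse(A_inv, u, v):
    """Rank-one update of an inverse: (A + u v^T)^{-1} via formula (1)."""
    denom = 1.0 + v @ A_inv @ u          # 1 + v^T A^{-1} u, must be nonzero
    correction = np.outer(A_inv @ u, v @ A_inv) / denom
    return A_inv - correction

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
u = rng.standard_normal(n)
v = rng.standard_normal(n)

A_inv = np.linalg.inv(A)
updated = sherman_morrison_inverse(A_inv, u, v)
direct = np.linalg.inv(A + np.outer(u, v))
print(np.allclose(updated, direct))   # True
```

The update costs O(n^2) operations (two matrix-vector products and an outer product), which is the serial cost the systolic array parallelises.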
2.2 Systolic algorithm - SASM

To derive a systolic array for the evaluation of the Sherman-Morrison formula (SASM), we would like to make use of already known systolic designs that solve some basic matrix problems. Let us define the following matrix transformation on the compound matrix given below (Megson, 1991):

    [ A11  A12 ]      [ M A11  M A12 ]      [ M A11           M A12          ]
    [ A21  A22 ]  ->  [ A21    A22   ]  ->  [ A21 + N M A11   A22 + N M A12  ],   (2)

where M is selected so that the matrix M A11 is triangular, and N is chosen to annihilate A21. Applying the Faddeev algorithm (Faddeev and Faddeeva, 1963), M and N can be easily constructed using elementary row operations on the compound matrix. It follows that A21 = -N M A11, and thus N = -A21 A11^{-1} M^{-1}, so that the bottom-right partition A22 + N M A12 is given by A22 - A21 A11^{-1} A12.

Now we reformulate (1), using (2), as a sequence of the following transformations:

    [ I     A^{-1}u ]      [ I  A^{-1}u          ]
    [ -v^T  1       ]  ->  [ 0  1 + v^T A^{-1}u  ],   (3)

    [ I     A^{-1} ]      [ I  A^{-1}     ]
    [ -v^T  0      ]  ->  [ 0  v^T A^{-1} ],   (4)

    [ 1 + v^T A^{-1}u  v^T A^{-1} ]      [ 1 + v^T A^{-1}u  v^T A^{-1}                                        ]
    [ A^{-1}u          A^{-1}     ]  ->  [ 0                A^{-1} - (1/(1 + v^T A^{-1}u)) A^{-1}u v^T A^{-1} ].   (5)

Equations (3)-(5) describe Gaussian elimination steps in which the multipliers v^T are known in advance. Therefore, no explicit computation of multipliers is required in the array, and we do not need the part of the array concerned with the computation and pipelining of multipliers. Hence, a rectangular array of n x (n + 1) inner product cells is sufficient (Figure 3).
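The compound-matrix transformation (2) is block Gaussian elimination: once A21 is annihilated, the bottom-right block holds the Schur complement A22 - A21 A11^{-1} A12. A small NumPy sketch of this effect (an illustration of the algebra only, not the systolic implementation; block sizes and names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2
A11 = rng.standard_normal((n, n)) + n * np.eye(n)  # nonsingular top-left block
A12 = rng.standard_normal((n, m))
A21 = rng.standard_normal((m, n))
A22 = rng.standard_normal((m, m))

# Stack the compound matrix and eliminate the bottom-left block with
# row operations: bottom rows := bottom rows - A21 A11^{-1} * top rows.
C = np.block([[A11, A12], [A21, A22]])
N = A21 @ np.linalg.inv(A11)          # multipliers annihilating A21
C[n:, :] -= N @ C[:n, :]

schur = A22 - A21 @ np.linalg.inv(A11) @ A12
print(np.allclose(C[n:, n:], schur))  # True
print(np.allclose(C[n:, :n], 0.0))    # A21 block annihilated
```

Choosing the blocks as in (3)-(5) makes this Schur complement equal, in turn, to 1 + v^T A^{-1}u, v^T A^{-1}, and the updated inverse itself.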
Before describing the systolic array, we introduce the following representation:

    [ I     A^{-1}u  A^{-1} ]      [ I  A^{-1}u           A^{-1}     ]
    [ -v^T  1        0      ]  ->  [ 0  1 + v^T A^{-1}u   v^T A^{-1} ].   (6)

It is evident that the computation of (6) can be done on an n x (n + 1) rectangular array. The cells are IPS (inner product step) processors (Figure 1), accepting a multiplier from the left and updating the elements moving vertically. Each cell has two modes of operation, a load state and a normal computation state. During the load state, the matrices A^{-1}u and A^{-1} are input in a row-ordered format, suitably delayed to allow synchronisation with the multipliers input on the left boundary. During that phase, the two matrices are loaded one element per cell, and become stationary.

The next stage can be described in two phases. First, the vector [1 0 ... 0] is input on the top boundary of the array, and v^T on the left boundary. The components of v^T are used as multipliers to compute 1 + v^T A^{-1}u and v^T A^{-1}. These data are non-stationary, and leave the array on the south boundary. Second, the null matrix is input on the top boundary and the matrix I on the left. This forces the computation of A^{-1}u and A^{-1}. All the phases can be overlapped, so that the total computation time is T = (n + 1) + (n + 1) + n = 3n + 2 inner product steps.

    x := 0.0
    for i = 1 to total.time
      if init
        x.out := x + m.in * x.in
        x := x.in
      else
        x.out := x.in + m.in * x

Fig. 1. The PE definition for an IPS cell of the rectangular array (ports: m.in, x.in, x.out; x holds the stationary element).

Once the transformation (6) is known, we use Gaussian elimination to evaluate the transformation (5). Because 1 + v^T A^{-1}u is a scalar value, only a single column elimination is required. A linear array of n + 1 cells, corresponding to one row of the triangular array for LU decomposition (Wan and Evans, 1993), is sufficient. Again, the cells have two modes of operation, a load state and a normal computation state.
One cell is a divider cell, whose function is described in Figure 2; the others perform the same operations as the cells of the rectangular part of the array (Figure 1). The delay through this extra array is a single elimination step.
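In dataflow terms, phase one of the rectangular array computes [1, 0, ..., 0] + v^T [A^{-1}u | A^{-1}] = [1 + v^T A^{-1}u | v^T A^{-1}], and phase two streams the stationary data out unchanged. A NumPy sketch of this arithmetic (an illustration of the dataflow, not the cell-level Occam code; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)
u = rng.standard_normal(n)
v = rng.standard_normal(n)
A_inv = np.linalg.inv(A)

# Stationary array contents: n rows, n+1 columns = [A^{-1}u | A^{-1}].
stationary = np.hstack([(A_inv @ u)[:, None], A_inv])

# Phase 1: top input [1, 0, ..., 0], multipliers v^T from the left.
top_in = np.eye(1, n + 1)[0]                 # the vector [1, 0, ..., 0]
south_out = top_in + v @ stationary          # accumulate m.in * x down each column

# Phase 2: null top input, identity multipliers: stationary data emerges.
phase2_out = np.zeros((n, n + 1)) + np.eye(n) @ stationary

print(np.isclose(south_out[0], 1 + v @ A_inv @ u))  # True
print(np.allclose(south_out[1:], v @ A_inv))        # True
```

Each column of the array accumulates one inner product, which is why the cell of Figure 1 is a plain inner-product-step processor.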
x :=. for i=1 to total.time if init if x.in <>. m.out := (x/x.in) else m.out :=. x := x.in else if x <>. m.out := (x.in/x) else m.out :=. x:in Fig.. The PE denition for a divider cell. x m:in I v T I A 1 1 A 1 u Fig.. Systolic array SASM for ShermanMorrison formula (n = ). To sum up, the rankone modication of the matrix inverse using the Sherman Morrison formula can be computed on an (n + 1) (n + 1) mesh of cells (Figure.) in n + inner product steps.. Solving the updated linear systems Solving the updated systems of linear equations is a more important application than nding the inverse of the modied matrix. In this section, we will show how to use the equation (1) implicitly in solving the updated systems of equations without computing the inverse of the modied matrix.
Let A be an n x n nonsingular matrix, and b a vector of dimension n. Let us assume we know the solution x of

    Ax = b.

We want to find the solution x̄ of the rank-one modified system

    (A + uv^T) x̄ = b.

Using the Sherman-Morrison formula, it follows that

    x̄ = (A + uv^T)^{-1} b
      = (A^{-1} - (1 / (1 + v^T A^{-1}u)) A^{-1}u v^T A^{-1}) b
      = A^{-1}b - (1 / (1 + v^T A^{-1}u)) A^{-1}u v^T A^{-1}b
      = x - (w / (1 + v^T w)) v^T x,   (7)

where w is the solution of the system Aw = u.

To derive the systolic array, we follow a similar procedure as before. We define the following Gaussian transformations:

    [ I     w ]      [ I  w         ]
    [ -v^T  1 ]  ->  [ 0  1 + v^T w ],   (8)

    [ I     x ]      [ I  x     ]
    [ -v^T  0 ]  ->  [ 0  v^T x ],   (9)

    [ 1 + v^T w  v^T x ]      [ 1 + v^T w  v^T x                       ]
    [ w          x     ]  ->  [ 0          x - (w / (1 + v^T w)) v^T x ].   (10)

The evaluation of (8)-(9) can be performed on an n x 2 rectangular array of IPS cells (Figure 1). Since 1 + v^T w is a scalar, we need to eliminate only one column. Therefore, the systolic array in Figure 4, of 2(n + 1) cells, gives us the result in n + 5 inner product steps.

Recall that, in general, solving linear equations using the Faddeev array (Blaznik, 1995) takes 5n + 1 inner product steps on an array of n(n + 1)/2 + n cells. On some specific arrays this can be done faster, but it is still not competitive with the array in Figure 4.
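Equation (7) needs only one auxiliary solve Aw = u; no inverse is ever formed. A minimal sketch (NumPy; the helper name and test data are ours):

```python
import numpy as np

def solve_updated(A, x, u, v):
    """Solution of (A + u v^T) x_bar = b, given x with A x = b (equation (7))."""
    w = np.linalg.solve(A, u)                 # w = A^{-1} u
    return x - w * (v @ x) / (1.0 + v @ w)

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
u = rng.standard_normal(n)
v = rng.standard_normal(n)

x = np.linalg.solve(A, b)
x_bar = solve_updated(A, x, u, v)
print(np.allclose((A + np.outer(u, v)) @ x_bar, b))  # True
```

If a factorisation of A is already available, the solve for w costs only O(n^2), matching the motivation given in Section 2.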
2.3.1 Numerical example

Fig. 4. Systolic array for updating linear systems.

The algorithm was simulated using Occam on a small numerical example: given a linear system Ax = b with known inverse A^{-1} and known solution x, the solution x̄ of the updated system Āx̄ = b, where Ā differs from A in a single element, was obtained from equation (7) by choosing u and v such that Ā = A + uv^T.

2.3.2 Successive updates

Our next aim is to perform successive updates. For example, if the changes in the matrix always occur in the same row, we can proceed as follows. On the k-th step of the computation, we want to find the solution x^(k) of the system

    A^(k) x^(k) = b.

We know the solution x^(k-1) of the system A^(k-1) x^(k-1) = b, and the relation between A^(k-1) and A^(k),

    A^(k) = A^(k-1) + u v^(k)T.

Using (7), we can write

    x^(k) = x^(k-1) - (1 / (1 + v^(k)T (A^(k-1))^{-1} u)) (A^(k-1))^{-1} u v^(k)T x^(k-1).

On every step k, we therefore need the previous solution x^(k-1), the value of (A^(k-1))^{-1} u, and the value of v^(k)T. The array in Figure 5 can handle the successive rank-one updates. It is important that all data arrive in the appropriate order. To ensure this, we use the so-called switch cells introduced by D.J. Evans and C.R. Wan in (Evans and Wan, 1993). They function as a data interface that rearranges the results of the previous computation into the right order (Blaznik, 1995). The proposed data interface is shown in Figure 6. The array needs the original data and the output from the previous iteration; the desired input is selected, as shown in Figure 6, according to the processing phase of the cell. The second column of IPS cells is used for the evaluation of (A^(k))^{-1} u on the k-th step. The result is then fed back to the top of the array to be used in the next computation.
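The recursion above can be sketched serially as follows (NumPy; names are ours). Since u is fixed, the quantity h = (A^(k-1))^{-1}u that the array feeds back can itself be updated in closed form, h := h / (1 + v_k^T h), instead of being recomputed:

```python
import numpy as np

def successive_updates(A, b, u, vs):
    """Solve A^(k) x = b for A^(k) = A^(k-1) + u v_k^T, reusing previous results.

    Maintains x = (A^(k))^{-1} b and h = (A^(k))^{-1} u across steps, mirroring
    the feedback of (A^(k))^{-1} u to the top of the systolic array.
    """
    x = np.linalg.solve(A, b)
    h = np.linalg.solve(A, u)
    for vk in vs:
        denom = 1.0 + vk @ h
        x = x - h * (vk @ x) / denom       # equation (7) at step k
        h = h / denom                      # Sherman-Morrison applied to h itself
    return x

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
u = rng.standard_normal(n)
vs = [0.1 * rng.standard_normal(n) for _ in range(3)]

x = successive_updates(A, b, u, vs)
A_final = A + sum(np.outer(u, vk) for vk in vs)
print(np.allclose(A_final @ x, b))  # True
```

The closed-form update of h follows from applying (1) to A^(k-1) + u v_k^T and multiplying by u on the right.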
Fig. 5. Systolic array for successive updates of linear systems.

    for i = 1 to n+1
      x.out := x.in
    for j = 1 to no.of.updates - 1
      for i = 1 to n
        x.out := z.in
      for i = 1 to n+1
        x.out := x.in, sink z.in

Fig. 6. Data interface (ports: x.in, z.in, x.out).

2.4 Rank-two modification

The idea of rank-one modification can be extended further to rank-m modification. The result is the Sherman-Morrison-Woodbury formula (Gill et al., 1991)

    (A + UV^T)^{-1} = A^{-1} - A^{-1}U (I + V^T A^{-1}U)^{-1} V^T A^{-1},   (11)

where U and V are n x m matrices. It is obvious that for m = 1 this reduces to the Sherman-Morrison formula (1), with I + V^T A^{-1}U a scalar quantity. We want to derive a systolic version of the Sherman-Morrison-Woodbury formula for the rank-two modification, i.e., m = 2. The transformations (3)-(5) are in this case the
following:

    [ I_n   A^{-1}U ]      [ I_n  A^{-1}U          ]
    [ -V^T  I       ]  ->  [ 0    I + V^T A^{-1}U  ],   (12)

    [ I_n   A^{-1} ]      [ I_n  A^{-1}     ]
    [ -V^T  0      ]  ->  [ 0    V^T A^{-1} ],   (13)

where U and V are n x 2 matrices. Then

    [ I + V^T A^{-1}U  V^T A^{-1} ]      [ M(I + V^T A^{-1}U)  M V^T A^{-1}                                     ]
    [ A^{-1}U          A^{-1}     ]  ->  [ 0                   A^{-1} - A^{-1}U (I + V^T A^{-1}U)^{-1} V^T A^{-1} ],   (14)

where the matrix M is chosen so that M(I + V^T A^{-1}U) is an upper triangular matrix. The computation of equations (12)-(13) can be done by one transformation of a matrix of size (n + 2) x (2n + 2):

    [ I_n   A^{-1}U  A^{-1} ]      [ I_n  A^{-1}U           A^{-1}     ]
    [ -V^T  I        0      ]  ->  [ 0    I + V^T A^{-1}U   V^T A^{-1} ].   (15)

These computations can be performed on an n x (n + 2) rectangular array. The cells are inner product step processors, accepting multipliers from the left and updating the elements moving vertically (Figure 1). Since I + V^T A^{-1}U is a 2 x 2 matrix, two column eliminations are required to evaluate the transformation (14). Thus, we use two rows of the triangular array for LU decomposition. We need two divider cells (their function is described in Figure 2), with the other cells being IPS cells. The systolic array in Figure 7 computes the rank-two modified inverse of an n x n matrix A in 3n + 6 inner product steps.
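The Woodbury formula (11) can be checked numerically in the same way as (1); only an m x m "capacitance" system (I + V^T A^{-1}U) has to be solved. A sketch (NumPy; names are ours):

```python
import numpy as np

def woodbury_inverse(A_inv, U, V):
    """(A + U V^T)^{-1} via the Sherman-Morrison-Woodbury formula (11)."""
    m = U.shape[1]
    capacitance = np.eye(m) + V.T @ A_inv @ U   # I + V^T A^{-1} U, m x m
    return A_inv - A_inv @ U @ np.linalg.solve(capacitance, V.T @ A_inv)

rng = np.random.default_rng(5)
n, m = 5, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)
U = rng.standard_normal((n, m))
V = rng.standard_normal((n, m))

A_inv = np.linalg.inv(A)
updated = woodbury_inverse(A_inv, U, V)
print(np.allclose(updated, np.linalg.inv(A + U @ V.T)))  # True
```

For m = 2 the capacitance solve is the 2 x 2 elimination performed by the two divider rows of the array.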
Fig. 7. Systolic array for rank-two modification of the matrix inverse.

2.4.1 Solving rank-two modified linear systems

Let A be an n x n nonsingular matrix, and b a vector of dimension n. Let us assume we know the solution x of Ax = b. For solving the rank-two modified system of linear equations

    Āx̄ = (A + UV^T) x̄ = b,

we need to evaluate the following transformations:

    [ I_n   A^{-1}U ]      [ I_n  A^{-1}U          ]
    [ -V^T  I       ]  ->  [ 0    I + V^T A^{-1}U  ],   (16)

    [ I_n   x ]      [ I_n  x      ]
    [ -V^T  0 ]  ->  [ 0    V^T x  ],   (17)

where U and V are n x 2 matrices.
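What these transformations build up to is the rank-two analogue of (7): x̄ = x - W (I + V^T W)^{-1} V^T x, with W = A^{-1}U. A sketch of that computation (NumPy; names are ours, and the array itself obtains the same result without forming any inverse of A):

```python
import numpy as np

def solve_rank2_updated(A, x, U, V):
    """Solution of (A + U V^T) x_bar = b, given x with A x = b."""
    W = np.linalg.solve(A, U)                    # W = A^{-1} U, one solve per column
    m = U.shape[1]
    cap = np.eye(m) + V.T @ W                    # I + V^T A^{-1} U (2 x 2 here)
    return x - W @ np.linalg.solve(cap, V.T @ x)

rng = np.random.default_rng(6)
n, m = 5, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
U = rng.standard_normal((n, m))
V = rng.standard_normal((n, m))

x = np.linalg.solve(A, b)
x_bar = solve_rank2_updated(A, x, U, V)
print(np.allclose((A + U @ V.T) @ x_bar, b))  # True
```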
Then

    [ I + V^T A^{-1}U  V^T x ]      [ M(I + V^T A^{-1}U)  M V^T x                                  ]
    [ A^{-1}U          x     ]  ->  [ 0                   x - A^{-1}U (I + V^T A^{-1}U)^{-1} V^T x ],   (18)

where the matrix M is chosen so that M(I + V^T A^{-1}U) is an upper triangular matrix. The evaluation of equations (16)-(18) can be performed on an array of 3n + 5 processors in n + 8 inner product steps (Blaznik, 1995).

3 Conclusions

In this paper, we have presented some parallel updating techniques for solving modified linear systems of equations. We have proposed a systolic algorithm for solving rank-one modified systems of linear equations, and we have described its generalisation to solving rank-two modified systems. The algorithms were simulated on the Sequent Balance multiprocessor system.

References

[Bla95] P. Blaznik. Parallel Updating Methods in Multidimensional Filtering. PhD thesis, University of Ljubljana, 1995.

[EW93] D.J. Evans and C.R. Wan. Systolic array for Schur complement computation. Intern. J. Computer Math., 1993.

[FF63] D.K. Faddeev and V.N. Faddeeva. Computational Methods of Linear Algebra. W.H. Freeman and Company, 1963.

[GMW91] P.E. Gill, W. Murray, and M.H. Wright. Numerical Linear Algebra and Optimization, volume 1. Addison-Wesley, 1991.

[KL78] H.T. Kung and C.E. Leiserson. Systolic arrays (for VLSI). In I.S. Duff and G.W. Stewart, editors, Sparse Matrix Proceedings. SIAM, 1978.

[Meg91] G.M. Megson. Systolic rank updating and the solution of nonlinear equations. In Proc. 5th International Parallel Processing Symposium. IEEE Press, 1991.
[WE93] C. Wan and D.J. Evans. Systolic array architecture for linear and inverse matrix systems. Parallel Computing, 19, 1993.