Efficient Parallelizations of Hermite and Smith Normal Form Algorithms


Gerold Jäger (a), Clemens Wagner (b)

(a) Computer Science Institute, University of Halle-Wittenberg, Halle (Saale), Germany
(b) denkwerk, Vogelsanger Straße 66, Köln, Germany

Abstract

The Hermite and the Smith normal form are important forms of matrices in linear algebra, with many applications in group theory and number theory. As the entries of the matrix and of its corresponding transformation matrices can explode during the computation, it is a very difficult problem to compute the Hermite and Smith normal form of large dense matrices. The main problems of the computation are the large execution times and the memory requirements, which might exceed the memory of one processor. To avoid these problems, we develop parallelizations of Hermite and Smith normal form algorithms. These are the first parallelizations of algorithms for computing the normal forms together with the corresponding transformation matrices, both over the rings $\mathbb{Z}$ and $F[x]$. We show that our parallel versions have good efficiency, i.e., by doubling the number of processes, the execution time is nearly halved. Furthermore, they succeed in computing normal forms of large dense example matrices over the rings $\mathbb{Q}[x]$, $\mathbb{F}_3[x]$, and $\mathbb{F}_5[x]$.

Key words: Hermite normal form, Smith normal form, parallelization.

1. Introduction

A matrix in $R^{m,n}$, where $R$ is a commutative, integral and Euclidean ring with 1, with rank $n$ is in Hermite normal form (HNF) if it is a lower triangular matrix in which all elements are smaller than the diagonal element of the same row. The definition can easily be generalized to rank $r < n$. It follows from Hermite [19] that an arbitrary matrix in $R^{m,n}$ can be transformed into its uniquely determined HNF by unimodular column operations. A matrix in $R^{m,n}$ with rank $r$ is in Smith normal form (SNF) if it is a diagonal matrix in which each of the first $r$ diagonal elements divides the next diagonal element and the remaining diagonal elements are zero. It follows from Smith [31] that an arbitrary matrix in $R^{m,n}$ can be transformed into its uniquely determined SNF by unimodular row and column operations.

Email addresses: jaegerg@informatik.uni-halle.de (Gerold Jäger), clemens.wagner@denkwerk.com (Clemens Wagner)

Thus the SNF is a generalization of the HNF for both row and column operations. As the Smith and Hermite normal forms are the basic building blocks for solving linear equations over the integers, they are at one more level of complexity than linear algebra (where elimination is done over the reals). Furthermore, the Hermite and Smith normal forms play an important role in the theory of finite Abelian groups, the theory of finitely generated modules over principal ideal rings, system theory, number theory, and integer programming. For many applications, for example linear equations over the integers, the transformation matrices describing the unimodular operations are important as well.

There are many algorithms for computing the HNF [1, 2, 6, 14, 24] and the SNF [2, 13, 15], most of them only for the ring $\mathbb{Z}$. Some of these algorithms are probabilistic ([10] for the SNF and $R = \mathbb{Z}$, [34] for the SNF and $R = \mathbb{Q}[x]$). Deterministic algorithms for $R = \mathbb{Z}$ often use modular techniques ([23] for the HNF, [16, 32] for the SNF, [11], [30, Chapter 8.4], [33] for the HNF and SNF). Most of these modular algorithms are unable to compute the corresponding transformation matrices. Unfortunately, most algorithms lead to coefficient explosion, i.e., during the computation entries of the matrix and of the corresponding transformation matrices occur which are very large, even exponential [8, 17]. For high-dimensional matrices with large entries this leads to large execution times and memory problems, i.e., the memory of one process is not large enough for the normal form computation of large matrices. These problems can be remedied by parallelization, which makes it possible to handle considerably larger matrices. Much effort has been devoted to parallel matrix and linear algebra computations [3, 7, 9, 18, 26, 27, 29, 36]. In [21] a parallel HNF algorithm and in [21, 22, 37] parallel probabilistic SNF algorithms are introduced for the ring $F[x]$, but without experimental results, and in [28] a parallel SNF algorithm is described which only works for characteristic matrices.

The purpose of this paper is to present efficient parallelizations of Hermite and Smith normal form computations with empirical evidence. In particular, we parallelize the well-known HNF and SNF algorithms of Kannan, Bachem [25], generalized to rectangular matrices with arbitrary rank, and the SNF algorithm of Hartley, Hawkes [12]. These are three of the most important algorithms which work for both the ring $\mathbb{Z}$ and the ring $F[x]$ and which are able to compute the corresponding transformation matrices.

An important problem in the parallelization of normal form computations is how to distribute a large matrix uniformly over many processes. Our main idea for this problem comes from the following observation, which holds for all HNF and SNF algorithms considered: when an elimination step is done by a series of column (row) operations, the operations depend only on one particular row (column). Thus it is reasonable to use the well-known row (column) distribution of matrices [9, 29]. Specifically, we use a row distribution for column operations and a column distribution for row operations, where row (column) distribution means distributing the matrix among the processes so that each whole row (column) goes to a single process. When an elimination step involves entries of a particular row (column), that row (column) is broadcast to all processes, so that all of them can determine in parallel which column (row) operations are to be done on the matrix. Then they update their local data by performing these column (row) operations on all of their rows (columns). As the SNF algorithms use both row and column operations, an auxiliary algorithm is used which transforms a row distributed matrix into a column distributed one and vice versa. This procedure is an implementation of parallel matrix transposition [4, 5, 35].

We estimate the parallel operations of the three algorithms and observe that the complexity of the parallel HNF algorithm is much better than that of both parallel SNF algorithms, and that the parallel Hartley-Hawkes SNF algorithm has a better complexity than the parallel Kannan-Bachem SNF algorithm. We implement the algorithms and test them on large dense matrices over the rings $\mathbb{Q}[x]$, $\mathbb{F}_3[x]$, and $\mathbb{F}_5[x]$. The experiments show that the parallel HNF algorithm and the parallel Kannan-Bachem SNF algorithm give very similar results. Comparing the SNF algorithms, the parallel Kannan-Bachem SNF algorithm leads to better results for the ring $\mathbb{Q}[x]$, and the parallel Hartley-Hawkes SNF algorithm to better results for the rings $\mathbb{F}_3[x]$ and $\mathbb{F}_5[x]$. For medium-sized matrices we observe that the algorithms have good efficiency, even for 64 processes. The algorithms are also able to compute the HNF and SNF with the corresponding transformation matrices of large matrices in reasonable time. Because of the memory requirements, the program packages MAGMA and MAPLE are not able to do most of these computations.

2. Preliminaries

Let $R$ be a commutative, integral ring with 1 and $R^* \subseteq R$ the set of units of $R$. Let $R$ be Euclidean, i.e., there is a mapping $\varphi: R \setminus \{0\} \to \mathbb{N}_0$ such that for $a \in R$, $b \in R \setminus \{0\}$ there exist $q, r \in R$ with $a = qb + r$ and ($r = 0$ or $\varphi(r) < \varphi(b)$), where we define $\psi(a, b) := r$. Further, let $R' \subseteq R$ be a system of representatives of $R$, i.e., for each $a \in R \setminus \{0\}$ unique $e \in R^*$ and $b \in R'$ exist with $a = e \cdot b$, where we define $\beta(a) := e^{-1}$. In this paper we only consider two examples:

a) The set $\mathbb{Z}$ of integers. We choose $\varphi := |\cdot|$ and $R' := \mathbb{N}_0$. For $a \in \mathbb{R}$ let $\lfloor a \rfloor$ be the largest integer $\le a$. With the above notation we define for $a \in \mathbb{Z}$, $b \in \mathbb{Z} \setminus \{0\}$: $\psi(a, b) := r = a - \lfloor a/b \rfloor \cdot b$, and for $a \in \mathbb{Z} \setminus \{0\}$ we have $\beta(a) := \operatorname{sgn}(a)$. For $A \in \mathbb{Z}^{m,n}$ let $\lVert A \rVert := \max_{1 \le i \le m,\, 1 \le j \le n} \{ |A_{i,j}| \}$.

b) The polynomial ring $F[x]$ over a field $F$. We choose $\varphi := \deg$ and $R' := \{\text{monic polynomials in } F[x]\}$. With the above notation, for $a \in F[x]$, $b \in F[x] \setminus \{0\}$: $\psi(a, b) := r$, where $r$ is uniquely determined by polynomial division of $a$ by $b$. Further, for $a \in F[x] \setminus \{0\}$ we have $\beta(a) := 1/a_k$ for $a = \sum_{i=0}^{k} a_i x^i$ with $a_k \neq 0$. For $A \in F[x]^{m,n}$ let $\lVert A \rVert_{\deg} := \max_{1 \le i \le m,\, 1 \le j \le n} \{ \deg(A_{i,j}) \}$.

Definition 2.1.
a) The matrix $E_n = (E_{i,j})_{1 \le i,j \le n} \in R^{n,n}$ is defined by $E_{i,j} = 1$ if $i = j$ and $E_{i,j} = 0$ otherwise.
b) $\mathrm{GL}_n(R)$ is the group of matrices in $R^{n,n}$ whose determinant is a unit in the ring $R$. These matrices are called unimodular matrices.
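The conventions of example a) can be made concrete in a few lines of C++. This is a minimal sketch of ours (the names floor_div, psi, and beta are not from the paper; overflow handling is omitted, as the actual implementation works with arbitrary-precision integers):

```cpp
// Euclidean data for R = Z (example a): psi(a, b) is the representative
// remainder r = a - floor(a/b)*b, and beta(a) = sgn(a) is the unit that
// normalizes a into the system of representatives N_0.

long floor_div(long a, long b) {          // largest integer <= a/b
    long q = a / b, r = a % b;            // C++ division truncates toward 0
    return (r != 0 && ((r < 0) != (b < 0))) ? q - 1 : q;
}

long psi(long a, long b) {                // remainder with 0 <= r < |b|
    return a - floor_div(a, b) * b;       // e.g. psi(-7, 3) == 2
}

int beta(long a) {                        // sgn(a): beta(-5) == -1
    return (a > 0) - (a < 0);
}
```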

Definition 2.2. A matrix $A = (A_{i,j})_{1 \le i \le m,\, 1 \le j \le n} \in R^{m,n}$ with rank $r$ is in Hermite normal form (HNF) if the following conditions hold:
a) There exist $i_1, \ldots, i_r$ with $1 \le i_1 < \cdots < i_r \le m$ and $A_{i_j,j} \in R' \setminus \{0\}$ for $1 \le j \le r$ (the $A_{i_j,j}$ are called pseudo diagonal elements).
b) $A_{i,j} = 0$ for $1 \le i \le i_j - 1$, $1 \le j \le r$.
c) The columns $r+1, \ldots, n$ are zero.
d) $A_{i_j,l} = \psi(A_{i_j,l}, A_{i_j,j})$ for $1 \le l < j \le r$.
The matrix $A$ is in left Hermite normal form (LHNF) if its transpose $A^T$ is in HNF.

Theorem 2.3. [19] Let $A \in R^{m,n}$. Then a matrix $V \in \mathrm{GL}_n(R)$ exists such that $H = AV$ is in HNF. The matrix $H$ is uniquely determined. The matrix $V$ is called the corresponding transformation matrix for the HNF.

Definition 2.4. A matrix $A = (A_{i,j})_{1 \le i \le m,\, 1 \le j \le n} \in R^{m,n}$ with rank $r$ is in Smith normal form (SNF) if the following conditions hold:
a) $A$ is a diagonal matrix.
b) $A_{i,i} \in R' \setminus \{0\}$ for $1 \le i \le r$.
c) $A_{i,i} \mid A_{i+1,i+1}$ for $1 \le i \le r - 1$.
d) $A_{i,i} = 0$ for $r + 1 \le i \le \min\{m, n\}$.

Theorem 2.5. [31] Let $A \in R^{m,n}$. Then matrices $U \in \mathrm{GL}_m(R)$ and $V \in \mathrm{GL}_n(R)$ exist such that $C = UAV$ is in SNF. The matrix $C$ is uniquely determined. The matrices $U$, $V$ are called the corresponding left hand and right hand transformation matrices for the SNF.
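As a small illustration (our example, not from the paper): over $R = \mathbb{Z}$ with $R' = \mathbb{N}_0$, the matrix

$$A = \begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix}$$

is in HNF, with pseudo diagonal elements $2$ and $3$ and $A_{2,1} = 1 = \psi(1, 3)$. Its SNF is $\operatorname{diag}(1, 6)$: the gcd of all entries is $1$, and $1 \cdot 6 = |\det A| = 6$.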

3. HNF and SNF algorithms

The following algorithms compute the HNF and SNF of an arbitrary matrix in $R^{m,n}$ together with the corresponding transformation matrices. All algorithms are also formulated for the transformation matrices $U$ and $V$. Normally, each algorithm starts with $U = E_m$ and $V = E_n$, but if an algorithm is a subroutine of another algorithm, the settings of $U \in \mathrm{GL}_m(R)$ and $V \in \mathrm{GL}_n(R)$ come from the main algorithm.

3.1. HNF algorithm column by column

The following algorithm ROW-ONE-GCD $(A, V, i, j, l)$ works on the $i$-th row of a matrix $A$. It performs a unimodular transformation on two arbitrary entries $A_{i,j}$ and $A_{i,l}$ of this row, so that after the transformation the second entry is zero. More precisely, $A^{\text{new}}_{i,j} = \gcd(A^{\text{old}}_{i,j}, A^{\text{old}}_{i,l}) \in R'$ and $A^{\text{new}}_{i,l} = 0$. In parallel, the corresponding transformation matrix $V$ can be computed.

Gcd computation of two elements of the same row

INPUT $A = [a_1, \ldots, a_n] \in R^{m,n}$, $V = [v_1, \ldots, v_n] \in \mathrm{GL}_n(R)$ and $i, j, l$ with $1 \le i \le m$, $1 \le j < l \le n$
1 IF $A_{i,j} \neq 0 \lor A_{i,l} \neq 0$
2 THEN Compute $d := \gcd(A_{i,j}, A_{i,l})$ and $u, v$ with $d = u A_{i,j} + v A_{i,l}$
3   $[a_j, a_l] = [a_j, a_l] \begin{pmatrix} u & -A_{i,l}/d \\ v & A_{i,j}/d \end{pmatrix}$
4   $[v_j, v_l] = [v_j, v_l] \begin{pmatrix} u & -A_{i,l}/d \\ v & A_{i,j}/d \end{pmatrix}$
OUTPUT $(A, V) = $ ROW-ONE-GCD $(A, V, i, j, l)$

In step 2, for two ring elements $x, y$ we compute $\gcd(x, y)$ and $u, v$ with $\gcd(x, y) = ux + vy$ using the extended Euclidean algorithm. In steps 3 and 4, a unimodular matrix is multiplied from the right to the original matrix and to the transformation matrix $V$, so that the $j$-th and $l$-th columns are changed in such a way that the conditions for $A_{i,j}$ and $A_{i,l}$ are fulfilled. We denote the analogous algorithm for the gcd computation of two elements of the same column by COL-ONE-GCD $(A, U, i, j, l)$.

With these procedures we can formulate the HNF algorithm which we will parallelize. It is based on the HNF algorithm of Kannan, Bachem [25], who proposed for a square matrix $A \in R^{n,n}$ of full rank to compute the HNF of the $(t \times t)$ leading submatrix recursively for $t = 1, \ldots, n$. This algorithm is also able to compute the corresponding transformation matrix, and because of its simple structure it is ideal for parallelization. A natural generalization of this algorithm to rectangular matrices with arbitrary rank is to recursively compute the HNF of the first $t$ columns for $t = 1, \ldots, n$.

HNF computation column by column

INPUT $A = [a_1, \ldots, a_n] \in R^{m,n}$, $V = [v_1, \ldots, v_n] \in \mathrm{GL}_n(R)$
1  FOR $t = 1, \ldots, n$  (compute HNF of the first $t$ columns)
2    $r = 0$
3    FOR $s = 1, \ldots, m$
4      IF $A_{s,r+1} \neq 0 \lor A_{s,t} \neq 0$
5      THEN $r = r + 1$
6        $i_r = s$
7        IF $t = r$
8        THEN IF $A_{s,t} \notin R'$
9          THEN $e = \beta(A_{s,t})$; $a_t = e \cdot a_t$
10           $v_t = e \cdot v_t$
11       ELSE ROW-ONE-GCD $(A, V, s, r, t)$
12       FOR $l = 1, \ldots, r - 1$
13         $a_l = a_l - q_l\, a_r$ with $q_l := (A_{s,l} - \psi(A_{s,l}, A_{s,r})) / A_{s,r}$
14         $v_l = v_l - q_l\, v_r$
15       IF $t = r$
16       THEN GOTO step 1 with the next $t$
OUTPUT $(A, V) = $ HNF $(A, V)$ with rank $r$

In steps 4 to 11 the current pseudo diagonal element $A_{s,r}$ is computed. After steps 12 to 14 the elements of the $s$-th row fulfill condition d) of the HNF definition. If $t = r$ holds after step 14, the current pseudo diagonal element $A_{s,r}$ has been found and we can go to the next $t$ of the FOR loop of step 1.
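The core of ROW-ONE-GCD for $R = \mathbb{Z}$ can be sketched in a few lines of C++ (our illustration; the matrix is stored column-wise as in the listing, the identical update of $V$ is omitted, and there is no overflow handling):

```cpp
#include <vector>
#include <tuple>

// Extended Euclidean algorithm: returns (d, u, v) with d = gcd(x, y) = u*x + v*y.
// Note: for negative inputs d may carry a sign; a final beta() normalization
// as in the paper would put the result into N_0.
std::tuple<long, long, long> ext_gcd(long x, long y) {
    if (y == 0) return {x, 1, 0};
    auto [d, u, v] = ext_gcd(y, x % y);
    return {d, v, u - (x / y) * v};
}

// ROW-ONE-GCD core: combine columns j and l by a unimodular 2x2 matrix so
// that afterwards A[i][j] holds the gcd and A[i][l] is zero.
void row_one_gcd(std::vector<std::vector<long>>& col,
                 std::size_t i, std::size_t j, std::size_t l) {
    long aij = col[j][i], ail = col[l][i];
    if (aij == 0 && ail == 0) return;            // step 1 of the listing
    auto [d, u, v] = ext_gcd(aij, ail);
    for (std::size_t s = 0; s < col[j].size(); ++s) {
        long cj = col[j][s], cl = col[l][s];
        col[j][s] = u * cj + v * cl;                   // new column j: gcd column
        col[l][s] = (-ail / d) * cj + (aij / d) * cl;  // new column l: zero at row i
    }
}
```

The $2 \times 2$ matrix has determinant $(u A_{i,j} + v A_{i,l})/d = 1$, so the operation is unimodular.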

[Figure 1: Order of reducing the elements left from the diagonal elements: (a) Standard, (b) Chou-Collins.]

Obviously, there is an analogous algorithm for the computation of the LHNF row by row. Chou, Collins [6] found an essential theoretical and practical improvement for the HNF computation by reducing the elements left from the diagonal elements (steps 12 and 13 of Algorithm 3.1) in a different order (see Fig. 1). As the columns which are added to reduce an element have only reduced non-diagonal elements at this stage, the algorithm leads to less coefficient explosion. Unfortunately, the Chou-Collins idea cannot be combined with HNF Algorithm 3.1 for efficient parallelization, as the communication operations between the processes become too large (see Section 4.2).

3.2. Algorithm DIAGTOSMITH

In the SNF algorithms we use an elementary algorithm DIAGTOSMITH [30] which computes the SNF of a matrix in diagonal form.

INPUT $A \in R^{m,n}$ in diagonal form, $U = [u_1, \ldots, u_m]^T \in \mathrm{GL}_m(R)$, $V = [v_1, \ldots, v_n] \in \mathrm{GL}_n(R)$
1  FOR $k = 1, \ldots, \min\{m,n\} - 1$
2    FOR $l = \min\{m,n\} - 1, \ldots, k$
3      IF $A_{l,l} \nmid A_{l+1,l+1}$
4      THEN $g = A_{l,l} \cdot A_{l+1,l+1}$
5        $A_{l,l} = \gcd(A_{l,l}, A_{l+1,l+1})$
6        $A_{l+1,l+1} = g / A_{l,l}$
7        Compute $d := \gcd(a, b)$ and $u, v$ with $d = u a + v b$, where $a$ and $b$ denote the values of $A_{l,l}$ and $A_{l+1,l+1}$ before step 5
8        $[u_l, u_{l+1}]^T = \begin{pmatrix} u & v \\ -b/d & a/d \end{pmatrix} [u_l, u_{l+1}]^T$
9        $[v_l, v_{l+1}] = [v_l, v_{l+1}] \begin{pmatrix} 1 & -v b/d \\ 1 & u a/d \end{pmatrix}$
10 FOR $l = 1, \ldots, \min\{m,n\}$
11   IF $A_{l,l} \neq 0$
12   THEN $e = \beta(A_{l,l})$; $A_{l,l} = e \cdot A_{l,l}$
13     $v_l = e \cdot v_l$
OUTPUT $(A, U, V) = $ DIAGTOSMITH $(A, U, V)$

In steps 4 to 6 two neighboring diagonal elements $A_{l,l}$ and $A_{l+1,l+1}$ are replaced by their gcd and their lcm, so that after these steps $A_{l,l} \mid A_{l+1,l+1}$ holds. Steps 7 to 9 are justified by the following equation [30], with $a = A_{l,l}$ and $b = A_{l+1,l+1}$ as in step 7:

$$\begin{pmatrix} u & v \\ -\frac{b}{d} & \frac{a}{d} \end{pmatrix} \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix} \begin{pmatrix} 1 & -\frac{v b}{d} \\ 1 & \frac{u a}{d} \end{pmatrix} = \begin{pmatrix} d & 0 \\ 0 & \operatorname{lcm}(a, b) \end{pmatrix} \qquad (1)$$

These steps are repeated until conditions c) and d) of the SNF definition are fulfilled. After steps 10 to 13, condition b) of the SNF definition is also fulfilled. The algorithm needs not more than $\min\{m,n\}^2$ gcd computations.

3.3. Kannan-Bachem SNF algorithm

INPUT $A \in R^{m,n}$, $U \in \mathrm{GL}_m(R)$, $V \in \mathrm{GL}_n(R)$
1 WHILE ($A$ is not in diagonal form)
2   $(A, V) = $ HNF $(A, V)$
3   $(A, U) = $ LHNF $(A, U)$
4 $(A, U, V) = $ DIAGTOSMITH $(A, U, V)$
OUTPUT $(A, U, V) = $ KB-SNF $(A, U, V)$

The algorithm of Kannan and Bachem [25] alternately computes the HNF and the LHNF in steps 2 and 3 until the matrix is in diagonal form. In step 4 the algorithm DIAGTOSMITH is applied.

3.4. Hartley-Hawkes SNF algorithm

For the algorithm of Hartley and Hawkes [12, p. 112] many variants are known, see [2, 13, 15]. They differ in the implementation of the following procedures ROWGCD and COLGCD and in some additional row and column swaps. For $i, j$ with $1 \le i \le m$, $1 \le j \le n$, ROWGCD $(A, V, i, j)$ transforms $A$ so that $A^{\text{new}}_{i,j} = \gcd(A^{\text{old}}_{i,j}, A^{\text{old}}_{i,j+1}, \ldots, A^{\text{old}}_{i,n}) \in R'$ and $A^{\text{new}}_{i,j+1} = \cdots = A^{\text{new}}_{i,n} = 0$. This is done by repeatedly subtracting a multiple of one column from another column. The corresponding transformation matrix $V \in \mathrm{GL}_n(R)$ is changed by repeating all column operations applied to $A$ also on $V$. The procedure COLGCD $(A, U, i, j)$ is defined analogously with the roles of rows and columns exchanged.

INPUT $A \in R^{m,n}$, $U \in \mathrm{GL}_m(R)$, $V \in \mathrm{GL}_n(R)$
1 $l = 1$
2 WHILE $l \le \min\{m, n\}$
3   IF NOT $(A_{l,l+1:n}) = 0$
4   THEN $A = $ ROWGCD $(A, V, l, l)$
5   IF $(A_{l+1:m,l}) = 0$
6   THEN $l = l + 1$
7   ELSE $A = $ COLGCD $(A, U, l, l)$
8 $(A, U, V) = $ DIAGTOSMITH $(A, U, V)$
OUTPUT $(A, U, V) = $ HH-SNF $(A, U, V)$

For $l = 1, \ldots, \min\{m, n\}$ the algorithm of Hartley, Hawkes alternately uses the procedures ROWGCD $(A, V, l, l)$ and COLGCD $(A, U, l, l)$ in steps 3 to 7 until the first $l$ rows and columns have diagonal form. In step 8 the algorithm DIAGTOSMITH is used again.
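One possible ROWGCD implementation for $R = \mathbb{Z}$ is a sketch of ours, reusing the row_one_gcd helper from the sketch in Section 3.1 (actual variants differ, as noted above; the update of $V$ is again omitted):

```cpp
// ROWGCD(A, i, j) sketch: fold each column l > j into column j, so that
// afterwards A[i][j] = gcd(A[i][j], ..., A[i][n-1]) and A[i][l] = 0 for l > j.
// 'col' stores the matrix column-wise; each call performs one unimodular
// 2x2 column operation.
void rowgcd(std::vector<std::vector<long>>& col, std::size_t i, std::size_t j) {
    for (std::size_t l = j + 1; l < col.size(); ++l)
        row_one_gcd(col, i, j, l);
}
```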

4. Parallelization of the HNF and SNF algorithms

4.1. Idea of parallelization

A parallel program is a set of independent processes with data being interchanged between the processes. We write BROADCAST $x$ if a process sends a variable $x$ to all other processes, BROADCAST-RECEIVE $x$ FROM $z$ if a process receives a variable $x$ from the process with number $z$, which that process has sent with BROADCAST, SEND $x$ TO $z$ if a process sends a variable $x$ to the process with number $z$, and SEND-RECEIVE $x$ FROM $z$ if a process receives a variable $x$ from the process with number $z$, which that process has sent with SEND.

The matrix whose HNF and SNF shall be computed has to be distributed over the different processes as uniformly as possible. It is straightforward to assign different rows or different columns of a matrix to one process [9, 29]. For algorithms in which mainly column operations are used, a column distribution is not reasonable: for column additions with multiplicity, the columns involved in a computation mostly belong to different processes, so that for each such computation at least one column element would have to be sent. Thus we distribute rows if column operations are used, and columns if row operations are used. As the SNF algorithms use both kinds of operations, we have to switch between both distributions (see the procedures PAR-ROWTOCOL and PAR-COLTOROW in Section 4.4).

Quinn [29] considers two approaches for row distribution based on block data decomposition. As the experiments in Section 5.3 show that both approaches lead to rather similar execution times for our algorithms, we use the simpler one. Let the matrix $\bar{A} \in R^{m,n}$ be distributed over $q$ processes and let the $z$-th process hold $k_{\text{row}}(z)$ rows. Every process $z$ has as input a matrix $A \in R^{k_{\text{row}}(z),n}$ with $\sum_{z=0}^{q-1} k_{\text{row}}(z) = m$. For every process the order of the rows equals the original order. At any time we can obtain the complete matrix by putting these matrices together. We choose the following uniform distribution: the process with number $z$ receives the rows $z + 1$, $z + 1 + q$, ..., $z + 1 + \lfloor (m - z - 1)/q \rfloor \cdot q$ (see Fig. 2(a) for an example). The most important point of this distribution is that each process receives rows from different parts of the whole matrix (compare Section 5.3).

For $1 \le l \le m$, let ROW-TASK $(l)$ return the number of the process where the $l$-th row resides (for example, in Fig. 2(a): ROW-TASK $(7) = 2$). For $1 \le l \le m$, let ROW-NUM $(l)$ return the position that the original $l$-th row has, or would have, in the local list of rows of a process, even if row $l$ is not present in that list (for example, in Fig. 2(a): ROW-NUM $(7) = 3$ on processes 0, 1 and ROW-NUM $(7) = 2$ on processes 2, 3).
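Under this cyclic distribution both index functions have closed forms. A minimal sketch of ours (1-based row indices as in the paper, processes numbered $0, \ldots, q-1$):

```cpp
// Cyclic row distribution of m rows over q processes: process z holds rows
// z+1, z+1+q, z+1+2q, ...

// ROW-TASK(l): the process owning row l.  Example (m=11, q=4): row_task(7,4) == 2.
int row_task(int l, int q) { return (l - 1) % q; }

// ROW-NUM(l) on process z: position row l has, or would have, in the local
// row list of process z.  Examples: row_num(7,0,4) == 3, row_num(7,2,4) == 2.
int row_num(int l, int z, int q) {
    int t = l - z - 2;                                    // may be negative
    int fl = (t >= 0) ? t / q : -((-t + q - 1) / q);      // floor(t / q)
    return fl + 2;
}
```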

[Figure 2: Example of a matrix with 11 rows and 9 columns for 4 processes: (a) row distribution, (b) column distribution, (c) broadcast of row 6.]

Analogously, we define the column distribution, where the $z$-th process holds $k_{\text{col}}(z)$ columns and every process $z$ has as input a matrix $A \in R^{m,k_{\text{col}}(z)}$ with $\sum_{z=0}^{q-1} k_{\text{col}}(z) = n$, and the functions COL-TASK and COL-NUM (see Fig. 2(b) for an example). Since only column operations are performed on $V$, we choose a row distribution for $V$. Analogously, a column distribution is chosen for $U$.

4.2. Parallel HNF algorithm column by column

In the following we only consider the HNF computation of a row distributed matrix. Obviously, the LHNF computation of a column distributed matrix works analogously. Considering the FOR loop of step 3 of HNF Algorithm 3.1 with index $s$, we observe that the column operations only depend on the $s$-th row. Thus it is a good idea to send the $s$-th row to all processes, so that each process can execute its column operations.

INPUT Number of processes $q$, number $z$ of the own process, $A = [a^1, \ldots, a^{k_{\text{row}}(z)}]^T = [a_1, \ldots, a_n] \in R^{k_{\text{row}}(z),n}$ with $\sum_{z=0}^{q-1} k_{\text{row}}(z) = m$, $V = [v_1, \ldots, v_n] \in R^{k_{\text{col}}(z),n}$ with $\sum_{z=0}^{q-1} k_{\text{col}}(z) = n$
1  FOR $t = 1, \ldots, n$  (compute HNF of the first $t$ columns)
2    $r = 0$
3    FOR $s = 1, \ldots, m$
4      $y = $ ROW-TASK $(s)$
5      $h = $ ROW-NUM $(s)$
6      IF $y = z$
7      THEN BROADCAST vector $a^h$
8      ELSE BROADCAST-RECEIVE vector $g$ FROM $y$
9        Insert $g$ as $h$-th row vector of $A$
10     IF $A_{h,r+1} \neq 0 \lor A_{h,t} \neq 0$
11     THEN $r = r + 1$
12       $i_r = s$
13       IF $t = r$
14       THEN IF $A_{h,t} \notin R'$
15         THEN $e = \beta(A_{h,t})$; $a_t = e \cdot a_t$
16           $v_t = e \cdot v_t$
17       ELSE ROW-ONE-GCD $(A, V, h, r, t)$
18       FOR $l = 1, \ldots, r - 1$
19         $a_l = a_l - q_l\, a_r$ with $q_l := (A_{h,l} - \psi(A_{h,l}, A_{h,r})) / A_{h,r}$
20         $v_l = v_l - q_l\, v_r$
21       IF $t = r$
22       THEN IF NOT $y = z$
23         THEN Remove the $h$-th row vector of $A$
24         GOTO step 1 with the next $t$
25     IF NOT $y = z$
26     THEN Remove the $h$-th row vector of $A$
OUTPUT $(A, V) = $ PAR-HNF $(A, V)$ with rank $r$

Correctness: In this algorithm (and in all following algorithms) the same steps are executed as in the corresponding original algorithm, but on the processes where the current elements reside. For fixed $t$ and for $s = 1, \ldots, m$, the $s$-th row is sent from its process to all other processes. There it is inserted as row $h$, below all rows which lie above row $s$ in the whole matrix (see Fig. 2(a) and Fig. 2(c) for an example). On all processes, row $h$ and the rows below it are transformed according to the original algorithm. Afterwards, on all processes except the sending one, the received row is removed again.

Theorem 4.1. Let $p = \max\{m, n\}$. Algorithm 4.2 needs $O(p^2)$ BROADCAST operations, and $O(p^3)$ ring elements are sent.

Proof. Each BROADCAST operation appears in two nested FOR loops, and each broadcast sends at most $p$ ring elements.

It is easy to see that a parallel Chou-Collins version of HNF Algorithm 4.2 would need $O(p^3)$ BROADCAST operations, which is too much for an efficient parallelization of this algorithm. For this reason, we have only parallelized the original HNF Algorithm 3.1.
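In our MPI-based setting, steps 4 to 9 of PAR-HNF correspond to a plain MPI_Bcast rooted at the owner of row $s$. A sketch of ours for $R = \mathbb{Z}$ with machine integers (the actual implementation works with arbitrary-precision ring elements):

```cpp
#include <mpi.h>
#include <vector>

// Broadcast pattern of PAR-HNF (steps 4 to 9): the owner of global row s
// broadcasts its local copy; every other process receives it into a
// temporary buffer acting as the inserted row h.
std::vector<long> broadcast_row(const std::vector<std::vector<long>>& local_rows,
                                int s, int n, int q, int z) {
    int y = (s - 1) % q;                 // ROW-TASK(s): owner process
    int h = (s - 1) / q;                 // 0-based local index of row s on the owner
    std::vector<long> row(n);
    if (y == z) row = local_rows[h];     // owner fills the buffer
    MPI_Bcast(row.data(), n, MPI_LONG, y, MPI_COMM_WORLD);
    return row;                          // every process now holds row s
}
```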

4.3. Algorithm PAR-DIAGTOSMITH

As we did not find an efficient parallelization of DIAGTOSMITH, and as the original procedure is very fast in practice, we parallelize it trivially. More precisely, each diagonal element is broadcast from the process it resides on to all other processes, so that the operations of the original algorithm can be performed successively. Although this parallelization does not save time, it is in general necessary, because for large matrices the memory of one process might not be large enough for the complete transformation matrices.

INPUT Number of processes $q$, number $z$ of the own process, $A \in R^{k_{\text{row}}(z),n}$ with $\sum_{z=0}^{q-1} k_{\text{row}}(z) = m$, whole matrix $\bar{A} \in R^{m,n}$ in diagonal form, $U = [u_1, \ldots, u_m]^T \in R^{m,k_{\text{row}}(z)}$, $V = [v_1, \ldots, v_n] \in R^{k_{\text{col}}(z),n}$ with $\sum_{z=0}^{q-1} k_{\text{col}}(z) = n$
1  FOR $k = 1, \ldots, \min\{m,n\} - 1$
2    FOR $l = \min\{m,n\} - 1, \ldots, k$
3      $y_1 = $ ROW-TASK $(l)$
4      $y_2 = $ ROW-TASK $(l + 1)$
5      IF $y_1 = z$
6      THEN $h_1 = $ ROW-NUM $(l)$
7        $g_1 = A_{h_1,l}$
8        BROADCAST number $g_1$
9      ELSE BROADCAST-RECEIVE number $g_1$
10     IF $y_2 = z$
11     THEN $h_2 = $ ROW-NUM $(l + 1)$
12       $g_2 = A_{h_2,l+1}$
13       BROADCAST number $g_2$
14     ELSE BROADCAST-RECEIVE number $g_2$
15     Compute $d := \gcd(g_1, g_2)$ and $u, v$ with $d = u g_1 + v g_2$
16     IF $y_1 = z$
17     THEN IF $g_1 \nmid g_2$
18       THEN $A_{h_1,l} = \gcd(g_1, g_2)$
19     IF $y_2 = z$
20     THEN IF $g_1 \nmid g_2$
21       THEN $A_{h_2,l+1} = \frac{g_1 g_2}{\gcd(g_1, g_2)}$
22     IF $g_1 \nmid g_2$
23     THEN $[u_l, u_{l+1}]^T = \begin{pmatrix} u & v \\ -g_2/d & g_1/d \end{pmatrix} [u_l, u_{l+1}]^T$
24       $[v_l, v_{l+1}] = [v_l, v_{l+1}] \begin{pmatrix} 1 & -v g_2/d \\ 1 & u g_1/d \end{pmatrix}$
25 FOR $l = 1, \ldots, \min\{m,n\}$
26   $y = $ ROW-TASK $(l)$
27   IF $y = z$
28   THEN $h = $ ROW-NUM $(l)$
29     IF $A_{h,l} \neq 0$
30     THEN $e = \beta(A_{h,l})$; $A_{h,l} = e \cdot A_{h,l}$
31       $v_l = e \cdot v_l$
OUTPUT $(A, U, V) = $ PAR-DIAGTOSMITH $(A, U, V)$

Correctness: The elements $\bar{A}_{l,l}$ and $\bar{A}_{l+1,l+1}$ of the whole matrix reside on the processes $y_1$ and $y_2$, where they are the local elements $A_{h_1,l}$ and $A_{h_2,l+1}$, respectively. $A_{h_1,l}$ is broadcast from $y_1$ and $A_{h_2,l+1}$ from $y_2$, so that the elements $\bar{A}_{l,l}$ and $\bar{A}_{l+1,l+1}$ are known on all processes. The new $A_{h_1,l}$ is computed on process $y_1$ and the new $A_{h_2,l+1}$ on process $y_2$. The computation of the transformation matrices follows from equation (1).

In this algorithm $O(p^2)$ BROADCAST operations are performed and $O(p^2)$ ring elements are sent with BROADCAST. We use two versions of this algorithm, one for row distribution and one for column distribution.

4.4. Auxiliary algorithms PAR-ROWTOCOL and PAR-COLTOROW

For the SNF algorithms we have to change between a row distributed matrix and a column distributed matrix and vice versa. We call these algorithms PAR-ROWTOCOL and PAR-COLTOROW, respectively. In fact, both procedures are parallel matrix transpositions. As matrix transpositions are only called a few times in the following SNF algorithms (for the Kannan-Bachem SNF Algorithm 4.5 on average not more than 4 to 5 times, for the Hartley-Hawkes SNF Algorithm 4.6 on average not much more than $\min\{m,n\}$ times), we use the following simple implementation (see Fig. 2(a) and Fig. 2(b)).

Let $A \in R^{k_{\text{row}}(z),n}$ with $\sum_{z=0}^{q-1} k_{\text{row}}(z) = m$ be row distributed. $A$ is transformed into a matrix $B \in R^{m,k_{\text{col}}(z)}$ with $\sum_{z=0}^{q-1} k_{\text{col}}(z) = n$. Every process communicates with every other process. For a process $x$ the SEND operation has the form SEND $\{A_{s,t} \mid 1 \le s \le k_{\text{row}}(x),\ 1 \le t \le n\}$ TO COL-TASK $(t)$, and the SEND-RECEIVE operation has the form SEND-RECEIVE $\{B_{s,t} \mid 1 \le s \le m,\ 1 \le t \le k_{\text{col}}(x)\}$ FROM ROW-TASK $(s)$. This algorithm does not need to be applied to the corresponding transformation matrices, as the left hand transformation matrix $U$ is always transformed by row operations, i.e., is column distributed, and the right hand transformation matrix $V$ is always transformed by column operations, i.e., is row distributed. We obtain the algorithm PAR-COLTOROW from the algorithm PAR-ROWTOCOL by exchanging the roles of rows and columns.
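The following is a sketch of ours of PAR-ROWTOCOL as a pairwise exchange; the paper's SEND/SEND-RECEIVE operations are expressed here with MPI_Sendrecv, and both the old row and the new column distribution are the cyclic ones from Section 4.1 (0-based indices, machine integers):

```cpp
#include <mpi.h>
#include <vector>

// local_rows[k][t]: entry of the k-th owned row in global column t.
// Before: process z owns global rows z, z+q, ...; afterwards it owns
// global columns z, z+q, ...  Entries are packed per partner process.
std::vector<std::vector<long>>
par_rowtocol(const std::vector<std::vector<long>>& local_rows,
             int m, int n, int q, int z) {
    int my_cols = (n - z + q - 1) / q;                 // columns owned afterwards
    std::vector<std::vector<long>> local_cols(my_cols, std::vector<long>(m));
    for (int step = 0; step < q; ++step) {
        int w = (z + step) % q;                        // partner to send to
        int r = (z - step + q) % q;                    // partner to receive from
        int w_cols = (n - w + q - 1) / q;              // columns w will own
        std::vector<long> sendbuf;                     // my rows, w's columns
        for (const auto& row : local_rows)
            for (int c = 0; c < w_cols; ++c) sendbuf.push_back(row[w + c * q]);
        int r_rows = (m - r + q - 1) / q;              // rows owned by r
        std::vector<long> recvbuf(r_rows * my_cols);
        MPI_Sendrecv(sendbuf.data(), (int)sendbuf.size(), MPI_LONG, w, 0,
                     recvbuf.data(), (int)recvbuf.size(), MPI_LONG, r, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int k = 0; k < r_rows; ++k)               // unpack, keep global order
            for (int c = 0; c < my_cols; ++c)
                local_cols[c][r + k * q] = recvbuf[k * my_cols + c];
    }
    return local_cols;
}
```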

4.5. Parallel Kannan-Bachem SNF algorithm

INPUT Number of processes $q$, number $z$ of the own process, $A \in R^{k_{\text{row}}(z),n}$ with $\sum_{z=0}^{q-1} k_{\text{row}}(z) = m$, whole matrix $\bar{A} \in R^{m,n}$, $U = [u_1, \ldots, u_m]^T \in R^{m,k_{\text{row}}(z)}$, $V = [v_1, \ldots, v_n] \in R^{k_{\text{col}}(z),n}$ with $\sum_{z=0}^{q-1} k_{\text{col}}(z) = n$
1 WHILE ($\bar{A}$ is not in diagonal form)
2   $(A, V) = $ PAR-HNF $(A, V)$
3   $B = $ PAR-ROWTOCOL $(A)$  ($B \in R^{m,k_{\text{col}}(z)}$)
4   $(B, U) = $ PAR-LHNF $(B, U)$
5   $A = $ PAR-COLTOROW $(B)$
6 $(A, U, V) = $ PAR-DIAGTOSMITH $(A, U, V)$
OUTPUT $(A, U, V) = $ PAR-KB-SNF $(A, U, V)$

Theorem 4.2. [20] Let $\bar{A} \in R^{m,n}$ with $p = \max\{m, n\}$, and $R = \mathbb{Z}$ or $R = F[x]$, respectively. In Algorithm 4.5, $O(p^4 \log_2(p \lVert \bar{A} \rVert))$ and $O(p^4 \lVert \bar{A} \rVert_{\deg})$ BROADCAST operations are performed, respectively, and $O(p^5 \log_2(p \lVert \bar{A} \rVert))$ and $O(p^5 \lVert \bar{A} \rVert_{\deg})$ ring elements are sent with BROADCAST, respectively. Further, $O(q^2 p^2 \log_2(p \lVert \bar{A} \rVert))$ and $O(q^2 p^2 \lVert \bar{A} \rVert_{\deg})$ SEND operations are performed, respectively, and $O(p^4 \log_2(p \lVert \bar{A} \rVert))$ and $O(p^4 \lVert \bar{A} \rVert_{\deg})$ ring elements are sent with SEND, respectively.

4.6. Parallel Hartley-Hawkes SNF algorithm

INPUT Number of processes $q$, number $z$ of the own process, $A = [a^1, \ldots, a^{k_{\text{row}}(z)}]^T \in R^{k_{\text{row}}(z),n}$ with $\sum_{z=0}^{q-1} k_{\text{row}}(z) = m$, $U = [u_1, \ldots, u_m]^T \in R^{m,k_{\text{row}}(z)}$, $V = [v_1, \ldots, v_n] \in R^{k_{\text{col}}(z),n}$ with $\sum_{z=0}^{q-1} k_{\text{col}}(z) = n$
1  $l = 1$
2  WHILE $l \le \min\{m, n\}$
3    $y = $ ROW-TASK $(l)$
4    $h = $ ROW-NUM $(l)$
5    IF $y = z$
6    THEN BROADCAST vector $a^h$
7    ELSE BROADCAST-RECEIVE vector $g$ FROM $y$
8      Insert $g$ as $h$-th row vector of $A$
9    IF NOT $(A_{h,l+1:n}) = 0$
10   THEN $A = $ ROWGCD $(A, V, h, l)$
11   IF NOT $y = z$
12   THEN Remove the $h$-th row vector of $A$
13   $B = $ PAR-ROWTOCOL $(A)$  ($B \in R^{m,k_{\text{col}}(z)}$)
14   $y = $ COL-TASK $(l)$
15   $h = $ COL-NUM $(l)$
16   IF $y = z$
17   THEN BROADCAST vector $b_h$ (the $h$-th column of $B$)
18   ELSE BROADCAST-RECEIVE vector $g$ FROM $y$
19     Insert $g$ as $h$-th column vector of $B$
20   IF $(B_{l+1:m,h}) = 0$
21   THEN $l = l + 1$
22   ELSE $B = $ COLGCD $(B, U, l, h)$
23   IF NOT $y = z$
24   THEN Remove the $h$-th column vector of $B$
25   $A = $ PAR-COLTOROW $(B)$
26 $(A, U, V) = $ PAR-DIAGTOSMITH $(A, U, V)$
OUTPUT $(A, U, V) = $ PAR-HH-SNF $(A, U, V)$

For the proof of correctness we refer to [20].

Theorem 4.3. [20] Let $\bar{A} \in R^{m,n}$ with $p = \max\{m, n\}$, and $R = \mathbb{Z}$ or $R = F[x]$, respectively. In Algorithm 4.6, $O(p^2 \log_2(p \lVert \bar{A} \rVert))$ and $O(p^2 \lVert \bar{A} \rVert_{\deg})$ BROADCAST operations are performed, respectively, and $O(p^3 \log_2(p \lVert \bar{A} \rVert))$ and $O(p^3 \lVert \bar{A} \rVert_{\deg})$ ring elements are sent with BROADCAST, respectively. Further, $O(q^2 p^2 \log_2(p \lVert \bar{A} \rVert))$ and $O(q^2 p^2 \lVert \bar{A} \rVert_{\deg})$ SEND operations are performed, respectively, and $O(p^4 \log_2(p \lVert \bar{A} \rVert))$ and $O(p^4 \lVert \bar{A} \rVert_{\deg})$ ring elements are sent with SEND, respectively.

In comparison to the PAR-KB-SNF Algorithm, the complexity of the BROADCAST operations is improved by a factor of $p^2$, whereas the complexity of the SEND operations remains unchanged.

5. Experiments with the parallel versions of the normal form algorithms

The original algorithms of this paper were implemented in the language C++ with the compiler g++, and the parallel programs with mpicc. The sequential and parallel experiments were made on 32 nodes with two Intel Xeon 2.4 GHz processors each, with 1 GB of main memory for the first 32 processors and 0.5 GB for the last 32 processors. For the parallel experiments we use up to 64 processors, where 2 processors belong to one node. Every process of our parallel program runs on one of these processors. Additionally, we compare our results with the program package MAGMA (abbreviated MM) under Linux on a GenuineIntel Pentium(R) 4 CPU 3.00 GHz processor with 1 GB main memory, and with MAPLE (abbreviated MP) under SunOS on 4 sparcv9 floating point processors with 1281 MHz and 16 GB main memory. We do experiments with matrices over the rings $\mathbb{Q}[x]$, $\mathbb{F}_3[x]$, and $\mathbb{F}_5[x]$ (for the results for the rings $\mathbb{Z}$ and $\mathbb{F}_2[x]$ we refer to [20]). Note that MAPLE only works for the ring $\mathbb{Q}[x]$. The execution times are given in the form hh:mm:ss (hours:minutes:seconds). The tests with MAGMA and MAPLE were stopped after 24 hours; if a time is not listed, the corresponding algorithm needed more than 24 hours.

As the first test class we used the matrices $B_n = (b_{s,t})_{1 \le s,t \le n} - x \cdot E_n$, where the $b_{s,t}$ are randomly chosen from $[-99, 100]$ for $\mathbb{Q}[x]$, from $\{0, 1, 2\}$ for $\mathbb{F}_3[x]$, and from $\{0, 1, 2, 3, 4\}$ for $\mathbb{F}_5[x]$. These are characteristic matrices with full rank. For the rings $\mathbb{F}_3[x]$ and $\mathbb{F}_5[x]$ we also used a second test class, namely the matrices $C^q_n = (c_{s,t})_{1 \le s,t \le n}$ with $c_{s,t} = p_{t-1}^{\,s-1} \bmod q$, where $(p_s(x))_{s \ge 0} = (0, 1, 2, x, x+1, x+2, x^2, x^2+1, \ldots)$ for the ring $\mathbb{F}_3[x]$ and $(p_s(x))_{s \ge 0} = (0, 1, 2, 3, 4, x, x+1, x+2, x+3, x+4, x^2, x^2+1, \ldots)$ for the ring $\mathbb{F}_5[x]$, and where $q$ is one of the irreducible polynomials $q_1(x) = x^5 + x^4 + x + 2$, $q_2(x) = 2x^4 + 3x + \ldots$

5.1. Efficiency

An important criterion for the quality of a parallel algorithm is the efficiency $E$, depending on the number of processes $q$, which we define as

$$E(q) = \frac{T_s}{q \cdot T_q},$$

where $T_q$ is the execution time of the parallel algorithm with $q$ processes and $T_s$ the execution time of the corresponding sequential (original) algorithm. The efficiency indicates which share of the total parallel execution time is devoted to computation rather than communication. In general, a large efficiency indicates a good parallel program, but it is also important that the efficiency remains (nearly) constant for a larger number of processes. The efficiency is at most 1.

We computed the normal forms with corresponding transformation matrices of a matrix of medium size on 1, 2, 4, 8, 16, 32 and 64 processes with the PAR-HNF Algorithm, the PAR-KB-SNF Algorithm and the PAR-HH-SNF Algorithm, and we also ran the corresponding original algorithms. For every parallel normal form computation and for each number of processes we give the efficiency in % and also show the behaviour graphically.
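For illustration with the numbers reported below (Table/Fig. 3): for the PAR-HNF Algorithm on the matrix $C^{q_1}_{240}$ over $\mathbb{F}_3[x]$ we have $T_s = 03{:}03{:}49 = 11029$ s and $T_{64} = 00{:}04{:}57 = 297$ s, hence

$$E(64) = \frac{11029}{64 \cdot 297} \approx 0.58,$$

i.e., the 58 % listed in the table.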

The results of the PAR-HNF Algorithm, the PAR-KB-SNF Algorithm, the PAR-HH-SNF Algorithm and of MAGMA and MAPLE for the full-rank matrix $B_{16}$ over the ring $\mathbb{Q}[x]$, the full-rank matrix $C^{q_1}_{240}$ over the ring $\mathbb{F}_3[x]$, and the full-rank matrix $C^{q_2}_{200}$ over the ring $\mathbb{F}_5[x]$ can be found in Table/Fig. 3. Note that MAGMA and MAPLE supply only one SNF version (we do not know whether KB or HH or another version is implemented).

Results for the ring $\mathbb{Q}[x]$: $\mathbb{Q}[x]$ is a very difficult ring, as the entries can explode in two ways: in the degrees of the polynomials and in the numerators/denominators of the coefficients. Therefore it is only possible to test a matrix as small as $B_{16}$. Obviously it makes no sense to use more processes than the number of rows or columns of the matrix (in this case 16). The PAR-KB-SNF Algorithm and the PAR-HNF Algorithm are slightly faster than the PAR-HH-SNF Algorithm, but the efficiency is nearly the same. The efficiency decreases for a large number of processes. MAGMA and MAPLE are faster for the HNF, but not competitive for the SNF.

Results for the ring $\mathbb{F}_3[x]$: The PAR-HH-SNF Algorithm is faster than the PAR-HNF Algorithm and the PAR-KB-SNF Algorithm. The efficiency of the PAR-HNF Algorithm is the best; the efficiency of the PAR-KB-SNF Algorithm is larger than that of the PAR-HH-SNF Algorithm for a small number of processes and smaller for a large number of processes. MAGMA is faster than 4 processes of the PAR-HNF Algorithm and not competitive for the SNF.

Results for the ring $\mathbb{F}_5[x]$: The results are similar to those for the ring $\mathbb{F}_3[x]$. The PAR-HH-SNF Algorithm is much faster than the PAR-HNF Algorithm and the PAR-KB-SNF Algorithm, but regarding the efficiency the PAR-HNF Algorithm is the best, and the PAR-KB-SNF Algorithm is better than the PAR-HH-SNF Algorithm. Again, MAGMA is faster than 4 processes of the PAR-HNF Algorithm and not competitive for the SNF.

5.2. Large example matrices

For all rings we want to determine the maximum number of rows and columns of an input matrix for which we are able to compute the HNF/SNF. The results of the PAR-HNF Algorithm, the algorithms PAR-KB-SNF and PAR-HH-SNF, and of MAGMA and MAPLE for the largest possible example matrices can be found in Table 1, where one process per row/column is used for $\mathbb{Q}[x]$ and 64 processes are used for $\mathbb{F}_3[x]$ and $\mathbb{F}_5[x]$. Note that MAGMA and MAPLE supply only one SNF version.

Results for the ring $\mathbb{Q}[x]$: We succeeded in computing the normal forms of $B_{26}$, $B_{28}$, $B_{30}$, and $B_{32}$, where the maximum memory (80 MB) was used by the PAR-HH-SNF Algorithm for $B_{32}$. As for $B_{16}$ in Section 5.1, MAGMA and MAPLE are faster for the HNF, but not competitive for the SNF.

Results for the ring $\mathbb{F}_3[x]$: We succeeded in computing the normal forms of the full-rank matrices $B_{700}$, $B_{800}$ and of the matrices $C^{q_1}_{340}$, $C^{q_1}_{360}$ of rank 243, where the maximum memory (397 MB) was used by the PAR-HH-SNF Algorithm for $C^{q_1}_{360}$, i.e., the available memory is nearly exhausted, and thus matrices with much larger row/column numbers cannot be computed.

Q[x]: B_16
Processes   | PAR-HNF            | PAR-KB-SNF         | PAR-HH-SNF
            | Time      Eff.     | Time      Eff.     | Time      Eff.
1 (orig.)   | 00:02:31           | 00:02:31           | 00:03:10
1 (par.)    | 00:02:33  99 %     | 00:02:34  98 %     | 00:03:14  98 %
2           | 00:01:24  90 %     | 00:01:24  90 %     | 00:01:46  90 %
4           | 00:00:49  77 %     | 00:00:49  77 %     | 00:01:02  77 %
8           | 00:00:32  59 %     | 00:00:33  57 %     | 00:00:41  58 %
16          | 00:00:24  39 %     | 00:00:24  39 %     | 00:00:30  40 %
MAGMA       | 00:00:06           |                    |
MAPLE       | 00:00:21           | 01:25:54           | 01:25:54

F_3[x]: C^{q_1}_{240}
Processes   | PAR-HNF            | PAR-KB-SNF         | PAR-HH-SNF
            | Time      Eff.     | Time      Eff.     | Time      Eff.
1 (orig.)   | 03:03:49           | 03:03:56           | 02:06:21
1 (par.)    | 03:07:49  98 %     | 03:08:54  97 %     | 02:11:26  96 %
2           | 01:36:28  95 %     | 01:37:44  94 %     | 01:07:55  93 %
4           | 00:48:44  94 %     | 00:50:00  92 %     | 00:34:46  91 %
8           | 00:25:33  90 %     | 00:26:50  86 %     | 00:18:16  86 %
16          | 00:13:44  84 %     | 00:15:03  76 %     | 00:09:54  80 %
32          | 00:07:56  72 %     | 00:09:20  62 %     | 00:05:52  67 %
64          | 00:04:57  58 %     | 00:06:32  44 %     | 00:04:01  49 %
MAGMA       | 00:32:45           |                    |

F_5[x]: C^{q_2}_{200}
Processes   | PAR-HNF            | PAR-KB-SNF         | PAR-HH-SNF
            | Time      Eff.     | Time      Eff.     | Time      Eff.
1 (orig.)   | 02:57:48           | 02:58:13           | 00:31:48
1 (par.)    | 03:06:26  95 %     | 03:07:20  95 %     | 00:35:09  90 %
2           | 01:36:32  92 %     | 01:37:39  91 %     | 00:18:30  86 %
4           | 00:50:55  87 %     | 00:51:55  86 %     | 00:09:32  83 %
8           | 00:26:32  84 %     | 00:27:31  81 %     | 00:05:03  79 %
16          | 00:14:20  78 %     | 00:15:19  73 %     | 00:02:48  71 %
32          | 00:08:26  66 %     | 00:09:27  59 %     | 00:01:46  56 %
64          | 00:05:26  51 %     | 00:06:39  42 %     | 00:01:15  40 %
MAGMA       | 00:32:56           | 08:26:18           | 08:26:18

[Plots: efficiency in % vs. number of processes for HNF, KB-SNF, HH-SNF; (a) Q[x]: B_16, (b) F_3[x]: C^{q_1}_{240}, (c) F_5[x]: C^{q_2}_{200}.]

Table/Figure 3: Execution time and efficiency for a special instance for the rings Q[x], F_3[x], F_5[x]

Q[x]           | PAR-HNF  | MM-HNF   | MP-HNF   | PAR-KB-SNF | PAR-HH-SNF
B_26           | 00:23:58 | 00:03:20 | 00:05:44 | 00:23:59   | 00:46:30
B_28           | 00:45:26 | 00:05:23 | 00:09:15 | 00:45:28   | 01:34:28
B_30           | 01:22:59 | 00:08:46 | 00:15:35 | 01:23:05   | 03:03:48
B_32           | 02:28:14 | 00:14:08 | 00:24:41 | 02:28:17   | 05:51:45

F_3[x]         | PAR-HNF  | MM-HNF   | PAR-KB-SNF | PAR-HH-SNF
B_700          | …:21:21  |          | 03:39:27   | 00:23:43
B_800          | …:30:10  |          | 06:47:33   | 00:43:25
C^{q_1}_{340}  | …:06:45  | 00:33:45 | 00:12:09   | 00:08:…
C^{q_1}_{360}  | …:06:56  | 00:37:53 | 00:13:37   | 00:10:18

F_5[x]         | PAR-HNF  | MM-HNF   | PAR-KB-SNF | PAR-HH-SNF
B_650          | …:15:37  |          | 03:24:04   | 00:16:17
B_700          | …:29:01  |          | 04:40:12   | 00:21:54
C^{q_2}_{320}  | …:41:13  | 05:03:19 | 00:45:52   | 00:08:…
C^{q_2}_{340}  | …:57:01  | 05:55:02 | 01:02:50   | 00:11:15

Table 1: Execution time for large example matrices of the rings Q[x], F_3[x], F_5[x]

Results for the ring $\mathbb{F}_5[x]$: We succeeded in computing the normal forms of the full-rank matrices $B_{650}$, $B_{700}$ and of the full-rank matrices $C^{q_2}_{320}$, $C^{q_2}_{340}$, i.e., in comparison to the ring $\mathbb{F}_3[x]$, matrices with slightly smaller dimensions can be computed. Here the maximum memory (240 MB) was used by the PAR-KB-SNF Algorithm for $C^{q_2}_{340}$. In most cases, matrices with larger row/column numbers could not be computed by at least one algorithm. For example, the SNF of the matrix $B_{800}$ could not be computed by the PAR-HH-SNF Algorithm.

5.3. Data distribution

Another important criterion for the quality of a parallel program is the data distribution, i.e., whether the data are distributed nearly equally over all processes. Especially in our case, where the data might be too large for one process, this criterion is essential for the algorithms. One hint that our data distribution does not lead to load imbalances is the quite good efficiency shown in the experiments of Section 5.1: with a bad data distribution, one process would probably receive more work than another, leading to a bad efficiency.

To show the data distribution, we consider the combination of algorithm and matrix from Section 5.1 with the largest memory requirement for one process, which is the PAR-HH-SNF Algorithm for the ring $\mathbb{F}_3[x]$ applied to the matrix $C^{q_1}_{240}$. In Fig. 4(a) we graphically show the maximum used memory in MB for 1, 2, 4, 8, 16, 32, and 64 processes. We observe that the used memory is significantly reduced by each step that doubles the number of processes. Overall, the used memory is reduced from 919 MB for 1 process to 118 MB for 64 processes, i.e., we obtain nearly a reduction factor of 8.

In Section 4.1 we suggested a rather natural data distribution. As shown in Section 5.1, this distribution leads to a good efficiency, and, as shown by the previous example, the memory reduction for an increasing number of processes is also good.

[Figure 4: Memory requirements. (a) Maximum used memory of the PAR-HH-SNF Algorithm for the ring F_3[x], applied to the matrix C^{q_1}_{240}. (b) Used memory of each process of the PAR-HNF Algorithm with 64 processes for the ring Z, applied to the matrix A_991 (distributions 1, 2, 3).]

But it is still not clear whether our data distribution can be improved essentially. As already mentioned in Section 4.1, the data distribution suggested in [29] is a possible alternative choice. To analyse this effect, we additionally apply three versions of the PAR-HNF Algorithm for the ring $\mathbb{Z}$ to the full-rank matrix $A_{991}$. The three versions only differ in the row distribution used. The first distribution is our original distribution, the second one is that of [29]. As the third one we test the following even more natural block distribution: if the number of processes $q$ is a divisor of $m$, the process with number $z$ receives the rows $z \cdot (m/q) + 1$, $z \cdot (m/q) + 2$, ..., $z \cdot (m/q) + m/q$ for $z = 0, \ldots, q - 1$; if $q$ is not a divisor of $m$, with $m = q \cdot s + t$ and $0 < t < q$, the first $t$ processes receive an additional row (which also holds for our original distribution).

Fig. 4(b) shows the used memory for each of the three distributions and for each of the 64 processes. For distribution 1 we observe a nearly constant memory consumption with only one marked decrease at process 30. This decrease comes from the equality $31 = 991 \bmod 64$, as the processes up to process 30 receive an additional row. Distribution 2 also shows a nearly constant memory consumption, but the memory values are permuted in comparison to distribution 1. Distribution 3 is the worst one, as it has small memory consumption for the first processes and very large memory consumption for the last processes. The reason for this bad behaviour is that the coefficient explosion during the PAR-HNF Algorithm is stronger for rows with larger indices. For example, the last process receives all rows with the largest indices and thus with the strongest coefficient explosion; this process then has to do the most work and needs the most memory.

The effects of the three distributions can also be seen in the execution times measured on the Intel Xeon 2.4 GHz machines. The PAR-HNF Algorithm based on distribution 1 has an execution time of 01:56:17, and distribution 2 one of 01:55:37, where (as shown by further experiments) the difference comes only from execution time inaccuracies. In comparison, distribution 3 leads to a much worse execution time of 05:14:48, i.e., this distribution would also lead to a much worse efficiency.
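The difference between distribution 1 (cyclic) and distribution 3 (block) can be made explicit by their owner functions. A sketch of ours (1-based rows; the rule for $q \nmid m$ from the paragraph above is built in):

```cpp
// Owner of row l (1-based) for m rows on q processes.

// Distribution 1 (cyclic): rows z+1, z+1+q, z+1+2q, ... go to process z,
// so each process receives rows from all parts of the matrix.
int owner_cyclic(int l, int m, int q) { return (l - 1) % q; }

// Distribution 3 (block): contiguous chunks of s = m/q rows; if q does not
// divide m (m = q*s + t), the first t processes get one additional row.
// Example (m = 991, q = 64): processes 0..30 hold 16 rows, 31..63 hold 15.
int owner_block(int l, int m, int q) {
    int s = m / q, t = m % q;
    return (l <= t * (s + 1)) ? (l - 1) / (s + 1)
                              : t + (l - t * (s + 1) - 1) / s;
}
```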

Acknowledgement

We would like to thank Volker Gebhardt for helpful discussions, the University of Halle-Wittenberg for the platform for the parallel experiments, and the anonymous referees for their comments, which helped us to improve the paper. The research of both authors was supported by the DFG (Germany).

References

[1] W.A. Blankinship, Algorithm 287: Matrix Triangulation with Integer Arithmetic [F1], Comm. ACM 9(7) (1966) 513.
[2] G.H. Bradley, Algorithms for Hermite and Smith Normal Matrices and Linear Diophantine Equations, Math. Comp. 25(116) (1971).
[3] R.P. Brent, Parallel Algorithms in Linear Algebra, in: Algorithms and Architectures: Proc. Second NEC Research Symposium, 1993.
[4] S. Chatterjee, S. Sen, Cache-Efficient Matrix Transposition, in: Proc. 6th International Symposium on High-Performance Computer Architecture, IEEE Computer Society, 2000.
[5] J. Choi, J. Dongarra, D.W. Walker, Parallel Matrix Transpose Algorithms on Distributed Memory Concurrent Computers, Parallel Comput. 21(9) (1995).
[6] T.W.J. Chou, G.E. Collins, Algorithms for the Solution of Systems of Linear Diophantine Equations, SIAM J. Comput. 11(4) (1982).
[7] J. Dongarra et al., Sourcebook of Parallel Computing, Morgan Kaufmann Publishers, Inc., San Francisco, 2003.
[8] X.G. Fang, G. Havas, On the Worst-Case Complexity of Integer Gaussian Elimination, in: Proc. International Symposium on Symbolic and Algebraic Computation, ACM Press, 1997.
[9] A. Fujii, R. Suda, A. Nishida, Parallel Matrix Distribution Library for Sparse Matrix Solvers, in: Proc. 8th International Conference on High-Performance Computing in Asia-Pacific Region, IEEE Computer Society, 2005.
[10] M. Giesbrecht, Fast Computation of the Smith Normal Form of an Integer Matrix, in: Proc. International Symposium on Symbolic and Algebraic Computation, ACM Press, 1995.
[11] J.L. Hafner, K.S. McCurley, Asymptotically Fast Triangularization of Matrices over Rings, SIAM J. Comput. 20(6) (1991).
[12] B. Hartley, T.O. Hawkes, Rings, Modules and Linear Algebra, Chapman and Hall, London, 1970.
[13] G. Havas, D.F. Holt, S. Rees, Recognizing Badly Presented Z-Modules, Linear Algebra Appl. 192 (1993).
[14] G. Havas, B.S. Majewski, Hermite Normal Form Computation for Integer Matrices, Congr. Numer. 105 (1994).
[15] G. Havas, B.S. Majewski, Integer Matrix Diagonalization, J. Symbolic Comput. 24(3/4) (1997).
[16] G. Havas, L.S. Sterling, Integer Matrices and Abelian Groups, in: Proc. International Symposium on Symbolic and Algebraic Manipulation, Lecture Notes in Comput. Sci. 72, Springer, New York, 1979.

[17] G. Havas, C. Wagner, Matrix Reduction Algorithms for Euclidean Rings, in: Proc. Asian Symposium on Computer Mathematics, Lanzhou University Press, 1998.
[18] D. Heller, A Survey of Parallel Algorithms in Numerical Linear Algebra, SIAM Review 20(4) (1978).
[19] C. Hermite, Sur l'introduction des variables continues dans la théorie des nombres, J. Reine Angew. Math. 41 (1851).
[20] G. Jäger, Parallel Algorithms for Computing the Smith Normal Form of Large Matrices, in: Proc. 10th European PVM/MPI, Lecture Notes in Comput. Sci. 2840, Springer, Berlin-Heidelberg, 2003.
[21] E. Kaltofen, M.S. Krishnamoorthy, B.D. Saunders, Fast Parallel Computation of Hermite and Smith Forms of Polynomial Matrices, SIAM J. Algebraic and Discrete Methods 8(4) (1987).
[22] E. Kaltofen, M.S. Krishnamoorthy, B.D. Saunders, Parallel Algorithms for Matrix Normal Forms, Linear Algebra Appl. 136 (1990).
[23] M. Kaminski, A. Paz, Computing the Hermite Normal Form of an Integral Matrix, Tech. Rep., Department of Computer Science, Technion-Israel Institute of Technology, Haifa, Israel.
[24] R. Kannan, Polynomial-Time Algorithms for Solving Systems of Linear Equations over Polynomials, Theoret. Comput. Sci. 39 (1985).
[25] R. Kannan, A. Bachem, Polynomial Algorithms for Computing the Smith and Hermite Normal Forms of an Integer Matrix, SIAM J. Comput. 8(4) (1979).
[26] H.-J. Lee, J.A.B. Fortes, Toward Data Distribution Independent Parallel Matrix Multiplication, in: Proc. 9th International Parallel Processing Symposium, IEEE Computer Society, 1995.
[27] F.T. Leighton, Introduction to Parallel Algorithms and Architectures, Morgan Kaufmann Publishers, Inc., San Francisco, 1992.
[28] G.O. Michler, R. Staszewski, Diagonalizing Characteristic Matrices on Parallel Machines, Preprint 27, Institut für Experimentelle Mathematik, Universität/GH Essen.
[29] M. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw Hill, Columbus, 2003.
[30] C.C. Sims, Computation with Finitely Presented Groups, Cambridge University Press, 1994.
[31] H.J.S. Smith, On Systems of Linear Indeterminate Equations and Congruences, Philos. Trans. R. Soc. Lond. 151 (1861).
[32] A. Storjohann, Near Optimal Algorithms for Computing Smith Normal Forms of Integer Matrices, in: Proc. International Symposium on Symbolic and Algebraic Computation, ACM Press, 1996.
[33] A. Storjohann, Computing Hermite and Smith Normal Forms of Triangular Integer Matrices, Linear Algebra Appl. 282(1-3) (1998).
[34] A. Storjohann, G. Labahn, A Fast Las Vegas Algorithm for Computing the Smith Normal Form of a Polynomial Matrix, Linear Algebra Appl. 253(1) (1997).
[35] J. Suh, V.K. Prasanna, An Efficient Algorithm for Out-of-Core Matrix Transposition, IEEE Trans. Computers 51(4) (2002).
[36] H.A. van der Vorst, P. van Dooren (Eds.), Parallel Algorithms for Numerical Linear Algebra, Advances in Parallel Computing 1, North-Holland, 1990.
[37] G. Villard, Fast Parallel Computation of the Smith Normal Form of Polynomial Matrices, in: Proc. International Symposium on Symbolic and Algebraic Computation, ACM Press, 1994.


More information

Applied Linear Algebra in Geoscience Using MATLAB

Applied Linear Algebra in Geoscience Using MATLAB Applied Linear Algebra in Geoscience Using MATLAB Contents Getting Started Creating Arrays Mathematical Operations with Arrays Using Script Files and Managing Data Two-Dimensional Plots Programming in

More information

(x 1 +x 2 )(x 1 x 2 )+(x 2 +x 3 )(x 2 x 3 )+(x 3 +x 1 )(x 3 x 1 ).

(x 1 +x 2 )(x 1 x 2 )+(x 2 +x 3 )(x 2 x 3 )+(x 3 +x 1 )(x 3 x 1 ). CMPSCI611: Verifying Polynomial Identities Lecture 13 Here is a problem that has a polynomial-time randomized solution, but so far no poly-time deterministic solution. Let F be any field and let Q(x 1,...,

More information

Computing abelian subalgebras for linear algebras of upper-triangular matrices from an algorithmic perspective

Computing abelian subalgebras for linear algebras of upper-triangular matrices from an algorithmic perspective DOI: 10.1515/auom-2016-0032 An. Şt. Univ. Ovidius Constanţa Vol. 24(2),2016, 137 147 Computing abelian subalgebras for linear algebras of upper-triangular matrices from an algorithmic perspective Manuel

More information

Introduction to finite fields

Introduction to finite fields Chapter 7 Introduction to finite fields This chapter provides an introduction to several kinds of abstract algebraic structures, particularly groups, fields, and polynomials. Our primary interest is in

More information

Towards parallel bipartite matching algorithms

Towards parallel bipartite matching algorithms Outline Towards parallel bipartite matching algorithms Bora Uçar CNRS and GRAAL, ENS Lyon, France Scheduling for large-scale systems, 13 15 May 2009, Knoxville Joint work with Patrick R. Amestoy (ENSEEIHT-IRIT,

More information

Hermite Normal Forms and its Cryptographic Applications

Hermite Normal Forms and its Cryptographic Applications Hermite Normal Forms and its Cryptographic Applications A thesis submitted in fulfillment of the requirements for the award of the degree Master of Computer Science from UNIVERSITY OF WOLLONGONG by Vasilios

More information

COMPUTER ARITHMETIC. 13/05/2010 cryptography - math background pp. 1 / 162

COMPUTER ARITHMETIC. 13/05/2010 cryptography - math background pp. 1 / 162 COMPUTER ARITHMETIC 13/05/2010 cryptography - math background pp. 1 / 162 RECALL OF COMPUTER ARITHMETIC computers implement some types of arithmetic for instance, addition, subtratction, multiplication

More information

A New Algorithm and Refined Bounds for Extended Gcd Computation

A New Algorithm and Refined Bounds for Extended Gcd Computation A New Algorithm and Refined Bounds for Extended Gcd Computation David Ford* and George Havas** Department of Computer Science, Concordia University, Montrfial, Qufibec, Canada H3G 1M8 and Department of

More information

. =. a i1 x 1 + a i2 x 2 + a in x n = b i. a 11 a 12 a 1n a 21 a 22 a 1n. i1 a i2 a in

. =. a i1 x 1 + a i2 x 2 + a in x n = b i. a 11 a 12 a 1n a 21 a 22 a 1n. i1 a i2 a in Vectors and Matrices Continued Remember that our goal is to write a system of algebraic equations as a matrix equation. Suppose we have the n linear algebraic equations a x + a 2 x 2 + a n x n = b a 2

More information

IN THE international academic circles MATLAB is accepted

IN THE international academic circles MATLAB is accepted Proceedings of the 214 Federated Conference on Computer Science and Information Systems pp 561 568 DOI: 115439/214F315 ACSIS, Vol 2 The WZ factorization in MATLAB Beata Bylina, Jarosław Bylina Marie Curie-Skłodowska

More information

Fast computation of normal forms of polynomial matrices

Fast computation of normal forms of polynomial matrices 1/25 Fast computation of normal forms of polynomial matrices Vincent Neiger Inria AriC, École Normale Supérieure de Lyon, France University of Waterloo, Ontario, Canada Partially supported by the mobility

More information

SOLUTION of linear systems of equations of the form:

SOLUTION of linear systems of equations of the form: Proceedings of the Federated Conference on Computer Science and Information Systems pp. Mixed precision iterative refinement techniques for the WZ factorization Beata Bylina Jarosław Bylina Institute of

More information

On the Berlekamp/Massey Algorithm and Counting Singular Hankel Matrices over a Finite Field

On the Berlekamp/Massey Algorithm and Counting Singular Hankel Matrices over a Finite Field On the Berlekamp/Massey Algorithm and Counting Singular Hankel Matrices over a Finite Field Matthew T Comer Dept of Mathematics, North Carolina State University Raleigh, North Carolina, 27695-8205 USA

More information

Mathematical Foundations of Cryptography

Mathematical Foundations of Cryptography Mathematical Foundations of Cryptography Cryptography is based on mathematics In this chapter we study finite fields, the basis of the Advanced Encryption Standard (AES) and elliptical curve cryptography

More information

Lecture 2: January 18

Lecture 2: January 18 CS271 Randomness & Computation Spring 2018 Instructor: Alistair Sinclair Lecture 2: January 18 Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They

More information

Math 471 (Numerical methods) Chapter 3 (second half). System of equations

Math 471 (Numerical methods) Chapter 3 (second half). System of equations Math 47 (Numerical methods) Chapter 3 (second half). System of equations Overlap 3.5 3.8 of Bradie 3.5 LU factorization w/o pivoting. Motivation: ( ) A I Gaussian Elimination (U L ) where U is upper triangular

More information

WORKING WITH MULTIVARIATE POLYNOMIALS IN MAPLE

WORKING WITH MULTIVARIATE POLYNOMIALS IN MAPLE WORKING WITH MULTIVARIATE POLYNOMIALS IN MAPLE JEFFREY B. FARR AND ROMAN PEARCE Abstract. We comment on the implementation of various algorithms in multivariate polynomial theory. Specifically, we describe

More information

Experience in Factoring Large Integers Using Quadratic Sieve

Experience in Factoring Large Integers Using Quadratic Sieve Experience in Factoring Large Integers Using Quadratic Sieve D. J. Guan Department of Computer Science, National Sun Yat-Sen University, Kaohsiung, Taiwan 80424 guan@cse.nsysu.edu.tw April 19, 2005 Abstract

More information

Theoretical Computer Science

Theoretical Computer Science Theoretical Computer Science 412 (2011) 1484 1491 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: wwwelseviercom/locate/tcs Parallel QR processing of Generalized

More information

CS 542G: Conditioning, BLAS, LU Factorization

CS 542G: Conditioning, BLAS, LU Factorization CS 542G: Conditioning, BLAS, LU Factorization Robert Bridson September 22, 2008 1 Why some RBF Kernel Functions Fail We derived some sensible RBF kernel functions, like φ(r) = r 2 log r, from basic principles

More information

Number Theory Notes Spring 2011

Number Theory Notes Spring 2011 PRELIMINARIES The counting numbers or natural numbers are 1, 2, 3, 4, 5, 6.... The whole numbers are the counting numbers with zero 0, 1, 2, 3, 4, 5, 6.... The integers are the counting numbers and zero

More information

Preliminaries and Complexity Theory

Preliminaries and Complexity Theory Preliminaries and Complexity Theory Oleksandr Romanko CAS 746 - Advanced Topics in Combinatorial Optimization McMaster University, January 16, 2006 Introduction Book structure: 2 Part I Linear Algebra

More information

A Fast Euclidean Algorithm for Gaussian Integers

A Fast Euclidean Algorithm for Gaussian Integers J. Symbolic Computation (2002) 33, 385 392 doi:10.1006/jsco.2001.0518 Available online at http://www.idealibrary.com on A Fast Euclidean Algorithm for Gaussian Integers GEORGE E. COLLINS Department of

More information

ECEN 5022 Cryptography

ECEN 5022 Cryptography Elementary Algebra and Number Theory University of Colorado Spring 2008 Divisibility, Primes Definition. N denotes the set {1, 2, 3,...} of natural numbers and Z denotes the set of integers {..., 2, 1,

More information

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b AM 205: lecture 7 Last time: LU factorization Today s lecture: Cholesky factorization, timing, QR factorization Reminder: assignment 1 due at 5 PM on Friday September 22 LU Factorization LU factorization

More information

Three Ways to Test Irreducibility

Three Ways to Test Irreducibility Three Ways to Test Irreducibility Richard P. Brent Australian National University joint work with Paul Zimmermann INRIA, Nancy France 12 Feb 2009 Outline Polynomials over finite fields Irreducibility criteria

More information

An exploration of matrix equilibration

An exploration of matrix equilibration An exploration of matrix equilibration Paul Liu Abstract We review three algorithms that scale the innity-norm of each row and column in a matrix to. The rst algorithm applies to unsymmetric matrices,

More information

[06.1] Given a 3-by-3 matrix M with integer entries, find A, B integer 3-by-3 matrices with determinant ±1 such that AMB is diagonal.

[06.1] Given a 3-by-3 matrix M with integer entries, find A, B integer 3-by-3 matrices with determinant ±1 such that AMB is diagonal. (January 14, 2009) [06.1] Given a 3-by-3 matrix M with integer entries, find A, B integer 3-by-3 matrices with determinant ±1 such that AMB is diagonal. Let s give an algorithmic, rather than existential,

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 1 x 2. x n 8 (4) 3 4 2

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 1 x 2. x n 8 (4) 3 4 2 MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS SYSTEMS OF EQUATIONS AND MATRICES Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

More information

Fundamental theorem of modules over a PID and applications

Fundamental theorem of modules over a PID and applications Fundamental theorem of modules over a PID and applications Travis Schedler, WOMP 2007 September 11, 2007 01 The fundamental theorem of modules over PIDs A PID (Principal Ideal Domain) is an integral domain

More information

Linear Systems of n equations for n unknowns

Linear Systems of n equations for n unknowns Linear Systems of n equations for n unknowns In many application problems we want to find n unknowns, and we have n linear equations Example: Find x,x,x such that the following three equations hold: x

More information

Implementations of 3 Types of the Schreier-Sims Algorithm

Implementations of 3 Types of the Schreier-Sims Algorithm Implementations of 3 Types of the Schreier-Sims Algorithm Martin Jaggi m.jaggi@gmx.net MAS334 - Mathematics Computing Project Under supervison of Dr L.H.Soicher Queen Mary University of London March 2005

More information

Chapter 14: Divisibility and factorization

Chapter 14: Divisibility and factorization Chapter 14: Divisibility and factorization Matthew Macauley Department of Mathematical Sciences Clemson University http://www.math.clemson.edu/~macaule/ Math 4120, Summer I 2014 M. Macauley (Clemson) Chapter

More information

Matrix & Linear Algebra

Matrix & Linear Algebra Matrix & Linear Algebra Jamie Monogan University of Georgia For more information: http://monogan.myweb.uga.edu/teaching/mm/ Jamie Monogan (UGA) Matrix & Linear Algebra 1 / 84 Vectors Vectors Vector: A

More information

In-place Arithmetic for Univariate Polynomials over an Algebraic Number Field

In-place Arithmetic for Univariate Polynomials over an Algebraic Number Field In-place Arithmetic for Univariate Polynomials over an Algebraic Number Field Seyed Mohammad Mahdi Javadi 1, Michael Monagan 2 1 School of Computing Science, Simon Fraser University, Burnaby, B.C., V5A

More information

Journal of Symbolic Computation. On the Berlekamp/Massey algorithm and counting singular Hankel matrices over a finite field

Journal of Symbolic Computation. On the Berlekamp/Massey algorithm and counting singular Hankel matrices over a finite field Journal of Symbolic Computation 47 (2012) 480 491 Contents lists available at SciVerse ScienceDirect Journal of Symbolic Computation journal homepage: wwwelseviercom/locate/jsc On the Berlekamp/Massey

More information

Linear Algebra Linear Algebra : Matrix decompositions Monday, February 11th Math 365 Week #4

Linear Algebra Linear Algebra : Matrix decompositions Monday, February 11th Math 365 Week #4 Linear Algebra Linear Algebra : Matrix decompositions Monday, February 11th Math Week # 1 Saturday, February 1, 1 Linear algebra Typical linear system of equations : x 1 x +x = x 1 +x +9x = 0 x 1 +x x

More information

Symbolic Linear Algebra

Symbolic Linear Algebra Symbolic Linear Algebra Special Lecture J Middeke Research Institute for Symbolic Computation Summer Term 2017 Contents I Monoids, Groups, Rings, Fields, and Modules 2 1 Monoids and Groups 2 2 Rings and

More information

1300 Linear Algebra and Vector Geometry

1300 Linear Algebra and Vector Geometry 1300 Linear Algebra and Vector Geometry R. Craigen Office: MH 523 Email: craigenr@umanitoba.ca May-June 2017 Introduction: linear equations Read 1.1 (in the text that is!) Go to course, class webpages.

More information

Oleg Eterevsky St. Petersburg State University, Bibliotechnaya Sq. 2, St. Petersburg, , Russia

Oleg Eterevsky St. Petersburg State University, Bibliotechnaya Sq. 2, St. Petersburg, , Russia ON THE NUMBER OF PRIME DIVISORS OF HIGHER-ORDER CARMICHAEL NUMBERS Oleg Eterevsky St. Petersburg State University, Bibliotechnaya Sq. 2, St. Petersburg, 198904, Russia Maxim Vsemirnov Sidney Sussex College,

More information

Finding Succinct. Ordered Minimal Perfect. Hash Functions. Steven S. Seiden 3 Daniel S. Hirschberg 3. September 22, Abstract

Finding Succinct. Ordered Minimal Perfect. Hash Functions. Steven S. Seiden 3 Daniel S. Hirschberg 3. September 22, Abstract Finding Succinct Ordered Minimal Perfect Hash Functions Steven S. Seiden 3 Daniel S. Hirschberg 3 September 22, 1994 Abstract An ordered minimal perfect hash table is one in which no collisions occur among

More information

RINGS: SUMMARY OF MATERIAL

RINGS: SUMMARY OF MATERIAL RINGS: SUMMARY OF MATERIAL BRIAN OSSERMAN This is a summary of terms used and main results proved in the subject of rings, from Chapters 11-13 of Artin. Definitions not included here may be considered

More information

Rational points on diagonal quartic surfaces

Rational points on diagonal quartic surfaces Rational points on diagonal quartic surfaces Andreas-Stephan Elsenhans Abstract We searched up to height 10 7 for rational points on diagonal quartic surfaces. The computations fill several gaps in earlier

More information

Math 547, Exam 1 Information.

Math 547, Exam 1 Information. Math 547, Exam 1 Information. 2/10/10, LC 303B, 10:10-11:00. Exam 1 will be based on: Sections 5.1, 5.2, 5.3, 9.1; The corresponding assigned homework problems (see http://www.math.sc.edu/ boylan/sccourses/547sp10/547.html)

More information

CPSC 467b: Cryptography and Computer Security

CPSC 467b: Cryptography and Computer Security CPSC 467b: Cryptography and Computer Security Michael J. Fischer Lecture 8 February 1, 2012 CPSC 467b, Lecture 8 1/42 Number Theory Needed for RSA Z n : The integers mod n Modular arithmetic GCD Relatively

More information

Three Ways to Test Irreducibility

Three Ways to Test Irreducibility Outline Three Ways to Test Irreducibility Richard P. Brent Australian National University joint work with Paul Zimmermann INRIA, Nancy France 8 Dec 2008 Polynomials over finite fields Irreducibility criteria

More information

A Randomized Algorithm for the Approximation of Matrices

A Randomized Algorithm for the Approximation of Matrices A Randomized Algorithm for the Approximation of Matrices Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert Technical Report YALEU/DCS/TR-36 June 29, 2006 Abstract Given an m n matrix A and a positive

More information

COMMUTATIVE SEMIFIELDS OF ORDER 243 AND 3125

COMMUTATIVE SEMIFIELDS OF ORDER 243 AND 3125 COMMUTATIVE SEMIFIELDS OF ORDER 243 AND 3125 ROBERT S. COULTER AND PAMELA KOSICK Abstract. This note summarises a recent search for commutative semifields of order 243 and 3125. For each of these two orders,

More information

Efficient Algorithms for Order Bases Computation

Efficient Algorithms for Order Bases Computation Efficient Algorithms for Order Bases Computation Wei Zhou and George Labahn Cheriton School of Computer Science University of Waterloo, Waterloo, Ontario, Canada Abstract In this paper we present two algorithms

More information

Queens College, CUNY, Department of Computer Science Numerical Methods CSCI 361 / 761 Spring 2018 Instructor: Dr. Sateesh Mane.

Queens College, CUNY, Department of Computer Science Numerical Methods CSCI 361 / 761 Spring 2018 Instructor: Dr. Sateesh Mane. Queens College, CUNY, Department of Computer Science Numerical Methods CSCI 361 / 761 Spring 2018 Instructor: Dr. Sateesh Mane c Sateesh R. Mane 2018 8 Lecture 8 8.1 Matrices July 22, 2018 We shall study

More information

(Inv) Computing Invariant Factors Math 683L (Summer 2003)

(Inv) Computing Invariant Factors Math 683L (Summer 2003) (Inv) Computing Invariant Factors Math 683L (Summer 23) We have two big results (stated in (Can2) and (Can3)) concerning the behaviour of a single linear transformation T of a vector space V In particular,

More information

Factorization of singular integer matrices

Factorization of singular integer matrices Factorization of singular integer matrices Patrick Lenders School of Mathematics, Statistics and Computer Science, University of New England, Armidale, NSW 2351, Australia Jingling Xue School of Computer

More information

Algorithms for Normal Forms for Matrices of Polynomials and Ore Polynomials

Algorithms for Normal Forms for Matrices of Polynomials and Ore Polynomials Algorithms for Normal Forms for Matrices of Polynomials and Ore Polynomials by Howard Cheng A thesis presented to the University of Waterloo in fulfilment of the thesis requirement for the degree of Doctor

More information

LECTURE NOTES IN CRYPTOGRAPHY

LECTURE NOTES IN CRYPTOGRAPHY 1 LECTURE NOTES IN CRYPTOGRAPHY Thomas Johansson 2005/2006 c Thomas Johansson 2006 2 Chapter 1 Abstract algebra and Number theory Before we start the treatment of cryptography we need to review some basic

More information

Parallelization of the QC-lib Quantum Computer Simulator Library

Parallelization of the QC-lib Quantum Computer Simulator Library Parallelization of the QC-lib Quantum Computer Simulator Library Ian Glendinning and Bernhard Ömer VCPC European Centre for Parallel Computing at Vienna Liechtensteinstraße 22, A-19 Vienna, Austria http://www.vcpc.univie.ac.at/qc/

More information

A fast randomized algorithm for orthogonal projection

A fast randomized algorithm for orthogonal projection A fast randomized algorithm for orthogonal projection Vladimir Rokhlin and Mark Tygert arxiv:0912.1135v2 [cs.na] 10 Dec 2009 December 10, 2009 Abstract We describe an algorithm that, given any full-rank

More information

Remainders. We learned how to multiply and divide in elementary

Remainders. We learned how to multiply and divide in elementary Remainders We learned how to multiply and divide in elementary school. As adults we perform division mostly by pressing the key on a calculator. This key supplies the quotient. In numerical analysis and

More information

Polynomial multiplication and division using heap.

Polynomial multiplication and division using heap. Polynomial multiplication and division using heap. Michael Monagan and Roman Pearce Department of Mathematics, Simon Fraser University. Abstract We report on new code for sparse multivariate polynomial

More information

Example: This theorem is the easiest way to test an ideal (or an element) is prime. Z[x] (x)

Example: This theorem is the easiest way to test an ideal (or an element) is prime. Z[x] (x) Math 4010/5530 Factorization Theory January 2016 Let R be an integral domain. Recall that s, t R are called associates if they differ by a unit (i.e. there is some c R such that s = ct). Let R be a commutative

More information

Finite Math - J-term Section Systems of Linear Equations in Two Variables Example 1. Solve the system

Finite Math - J-term Section Systems of Linear Equations in Two Variables Example 1. Solve the system Finite Math - J-term 07 Lecture Notes - //07 Homework Section 4. - 9, 0, 5, 6, 9, 0,, 4, 6, 0, 50, 5, 54, 55, 56, 6, 65 Section 4. - Systems of Linear Equations in Two Variables Example. Solve the system

More information

Direct solution methods for sparse matrices. p. 1/49

Direct solution methods for sparse matrices. p. 1/49 Direct solution methods for sparse matrices p. 1/49 p. 2/49 Direct solution methods for sparse matrices Solve Ax = b, where A(n n). (1) Factorize A = LU, L lower-triangular, U upper-triangular. (2) Solve

More information

(1) for all (2) for all and all

(1) for all (2) for all and all 8. Linear mappings and matrices A mapping f from IR n to IR m is called linear if it fulfills the following two properties: (1) for all (2) for all and all Mappings of this sort appear frequently in the

More information

Coding Theory ( Mathematical Background I)

Coding Theory ( Mathematical Background I) N.L.Manev, Lectures on Coding Theory (Maths I) p. 1/18 Coding Theory ( Mathematical Background I) Lector: Nikolai L. Manev Institute of Mathematics and Informatics, Sofia, Bulgaria N.L.Manev, Lectures

More information

LU Factorization. Marco Chiarandini. DM559 Linear and Integer Programming. Department of Mathematics & Computer Science University of Southern Denmark

LU Factorization. Marco Chiarandini. DM559 Linear and Integer Programming. Department of Mathematics & Computer Science University of Southern Denmark DM559 Linear and Integer Programming LU Factorization Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark [Based on slides by Lieven Vandenberghe, UCLA] Outline

More information

Algebra. Modular arithmetic can be handled mathematically by introducing a congruence relation on the integers described in the above example.

Algebra. Modular arithmetic can be handled mathematically by introducing a congruence relation on the integers described in the above example. Coding Theory Massoud Malek Algebra Congruence Relation The definition of a congruence depends on the type of algebraic structure under consideration Particular definitions of congruence can be made for

More information