Systematic Design of Original and Modified Mastrovito Multipliers for General Irreducible Polynomials

Size: px
Start display at page:

Download "Systematic Design of Original and Modified Mastrovito Multipliers for General Irreducible Polynomials"

Transcription

1 4 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 Systematic Design of Original and Modified Mastrovito Multipliers for General Irreducible Polynomials Tong Zhang, Student Member, IEEE, and Keshab K Parhi, Fellow, IEEE AbstractÐThis paper considers the design of bit-parallel dedicated finite field multipliers using standard basis An explicit algorithm is proposed for efficient construction of Mastrovito product matrix, based on which we present a systematic design of Mastrovito multiplier applicable to GF m generated by an arbitrary irreducible polynomial This design effectively exploits the spatial correlation of elements in Mastrovito product matrix to reduce the complexity Using a similar methodology, we propose a systematic design of modified Mastrovito multiplier, which is suitable for GF m generated by high-hamming weight irreducible polynomials For both original and modified Mastrovito multipliers, the developed multiplier architectures are highly modular, which is desirable for VLSI hardware implementation Applying the proposed algorithm and design approach, we study the Mastrovito multipliers for several special irreducible polynomials, such as trinomial and equally-spaced-polynomial, and the obtained complexity results match the best known results Moreover, we have discovered several new special irreducible polynomials which also lead to low-complexity Mastrovito multipliers Index TermsÐFinite (or Galois) field, standard basis, multiplication, irreducible polynomials, complexity, VLSI architecture, Toeplitz matrix æ 1 INTRODUCTION EFFICIENT hardware implementations of finite (or Galois) field GF m arithmetic units are highly desirable for many applications in error-correcting coding and cryptography [1], [], [], [4] Among the GF m arithmetic operations, multiplication is the most basic and important building block in many applications A number of efficient GF m multiplication approaches and architectures have been proposed in which different basis representations of field elements are used, such as standard basis, dual basis, and normal basis Standard basis is more promising in the sense that it gives designers more freedom on irreducible polynomial selection and hardware optimization For detailed discussions of these three basis representations, readers are referred to [], [5] In this paper, we are interested in the design of bit-parallel GF m multipliers using standard basis The standard basis multiplication involves two steps: polynomial multiplication and modulo reduction Let f x be the irreducible polynomial generating GF m, c x be the product of a x and b x, where a x, b x, c x GF m The finite field multiplication is performed as c x ˆ a x b x mod f x An efficient dedicated bitparallel multiplier was proposed by Mastrovito [] in which a product matrix M is introduced to combine the above two steps together Thus, the multiplication is carried out by c ˆ M b, where c and b represent the coefficient The authors are with the Department of Electrical and Computer Engineering, University of Minnesota, 00 Union Street SE, Minneapolis, MN {tzhang, parhi}@eceumnedu Manuscript received Oct 000; revised Feb 001; accepted Apr 001 For information on obtaining reprints of this article, please send to: tc@computerorg, and reference IEEECS Log Number 110 vectors of c x and b x, respectively Given irreducible polynomial f x, each entry in matrix M is obtained by oring certain coefficients of a x Many entries in matrix M can be computed efficiently by sharing some common items, eg, two entries M i 1 ;j 1 ˆa 0 a a and M i ;j ˆa 0 a a 4 can be computed using three OR gates by sharing the common item a 0 a This method is called subexpression sharing [] Mastrovito multipliers using two special irreducible polynomials, trinomial and equallyspaced-polynomial (ESP), have been studied by many researchers for their low-complexity implementations [8], [9], [10], [11], [1], [1], [14] The essence of all these works is to find an architecture to exploit subexpression sharing efficiently based on the specific irreducible polynomials It has been shown in [8] that Mastrovito multiplier using irreducible trinomial x m x n 1 only requires m 1 OR gates and m AND gates By generalizing the approach of [8], Halbutogullari and KocË [14] discovered that the space complexity of Mastrovito multiplier using irreducible ESP x m x tr x r 1, where t 1 r ˆ m, can be reduced to m r OR gates and m AND gates Furthermore, [14] presents a new formulation of the Mastrovito product matrix for an arbitrary irreducible polynomial x m x nk x n1 1, where the space complexity is given as m AND gates and m 1 m k 1 P jn m 1 j OR gates, Nf0; 1; ;m g However, [14] fails to find a method to explicitly compute the set N, which makes its result less practicable in general In general cases, the complexity of computing product matrix is proportional to the Hamming weight of the irreducible polynomial f x (denoted as pwt) So, the Mastrovito multiplier is good only when low-hamming weight irreducible polynomials are used A modified /01/$1000 ß 001 IEEE

2 ZHANG AND PARHI: SYSTEMATIC DESIGN OF ORIGINAL AND MODIFIED MASTROVITO MULTIPLIERS FOR GENERAL IRREDUCIBLE 5 Mastrovito multiplier was proposed by Song and Parhi [15], which has a complexity proportional to m 1 pwt Its basic idea is to use the complementary irreducible polynomial for the computation of matrix M with appropriate compensation Such a multiplication scheme is efficient when GF m is generated by high-hamming weight irreducible polynomial Generally, for large m, efficient design for both original and modified Mastrovito multipliers becomes rather difficult and a systematic design approach is highly desirable In this paper, we generalize the approach of [8] in a different way compared with [14] We propose a theorem and an algorithm (which can explicitly compute set N in [14]) for the construction of product matrix in the original Mastrovito multiplier, based on which we develop an efficient systematic design of the original Mastrovito multiplier This design effectively exploits subexpression sharing in the computation of product matrix to reduce the complexity Using a similar methodology, we also develop a systematic design of modified Mastrovito multiplier For both original and modified Mastrovito multipliers, explicit algorithms and architectures are presented and the complexities are given in detail Applying our proposed design approach, we study the irreducible trinomial and ESP and the complexity results match the results in [8] and [14] Meanwhile, another computation approach for trinomial case is proposed to make a trade-off between space complexity and delay Moreover, with the aid of the proposed algorithm, we discover several new irreducible polynomials leading to low-complexity original Mastrovito multipliers, which is especially desirable when neither an irreducible trinomial nor an irreducible ESP exists This paper is organized as follows: We introduce the notation of this paper and the fundamentals of finite field and Mastrovito multiplier in Section In Section, we propose a theorem and an algorithm for the construction of product matrix, based on which a systematic design approach for original Mastrovito multiplier is developed In Section 4, using a similar design methodology, we present a systematic design approach for modified Mastrovito multiplier Efficient computation approaches for several special irreducible polynomials are discussed in Section 5 This paper is an extended version of [1] NOTATION AND PRELIMINARIES Since several arithmetic operations pertaining to matrices and vectors will be extensively used throughout this paper, we first introduce the following notation: Column vectors and matrices are represented by small and capital boldfaced characters, respectively Matlab matrix notations are used, eg, Z i; :, Z :;j, and Z i; j represent the ith row vector, jth column vector, and the entry with position i; j in matrix Z, respectively The operations of shift by feeding zero are represented by corresponding arrows, eg, v # Š, U! 1Š, and U # 1Š represent down shift of vector v by two positions, right shift of matrix U by one column, and down shift of matrix U by one row, respectively, which are explicitly given as: v # Š ˆ 0; 0;v 0 ; ;v n Š T U! 1Š ˆ o; U :; 1 ; ; U :;m 1 Š U # 1Š ˆ o; U 1; : T ; ; U m 1; : T Š T ; where o represents the zero column vector Furthermore, we note that the AND and OR gates considered in this paper are all -input gates, whose delays are denoted as T A and T, respectively Finite field GF m contains m elements and can be viewed as an m-dimensional vector space over GF, which only has two elements, 0 and 1 With the standard basis f1;x;x ; ;x m 1 g, the elements of the finite field GF m can be represented as polynomials of degree m 1 as follows: GF m ˆfa x ja x ˆ a m 1 x m 1 a m x m a 1 x a 0 ;a i GF g: Such polynomial representation is generally used for finite field arithmetic operations, where addition is carried out by polynomial addition over GF m using bit-independent OR operations Finite field multiplication using standard basis is carried out by polynomial multiplication and modulo operation Let f x ˆx m f m 1 x m 1 f 1 x 1 be the irreducible polynomial generating GF m, and c x be the product of a x and b x, where a x ;b x ;c x GF m The polynomial multiplication, d x ˆa x b x, can performed as d ˆ A b; where b ˆ b 0 ;b 1 ; ;b m 1 Š T and d ˆ d 0 ;d 1 ; ;d m Š T are the coefficient vectors of b x and d x, respectively, and matrix A is given as a a 1 a a m a m a 0 0 A ˆ a m 1 a m a 1 a 0 : 1 0 a m 1 a a a m 1 a m a m 1 Next, we perform the modulo reduction c x ˆd x mod f x ; which can be expressed as m c x ˆ d k x k mod f x kˆ0 ˆ m 1 kˆ0 m d k x k d k x k mod f x : kˆm For m k m, if we denote x k mod f x as u l x, where l ˆ k m 1, we have u l x ˆ xm mod f x ; l ˆ 1; u l 1 x xmod f x ; l ˆ ; ; ;m 1:

3 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 Since the irreducible polynomial is f x ˆx m f m 1 x m 1 f 1 x f 0 ; where f 0 ˆ 1, we get u 1 j ˆ f j ; 0 j m 1; ( j ˆ u l 1 j 1 u l 1 m 1 f j; 1 j m 1 u l 1 m 1 ; j ˆ 0; u l where u l 1 j and u l j are the coefficients of u l 1 x and u l x, respectively Let u l denote the coefficient vector of u l x, then, using vector notation, () can be rewritten as u l ˆ f; l ˆ 1 u l 1 # 1Š u l 1 4 m 1 f; l m 1; where f ˆ 1;f 1 ; ;f m 1 Š T Let s denote vector d 0 ;d 1 ; ;d m 1 Š T ; which consists of the first m entries of d, we can rewrite () in matrix notation as follows: c ˆ I mm s m 1 d l m 1 u l lˆ1 ˆ I mm ; U d ˆ I mm ; U A b; where I mm represent m m identity matrix and matrix U ˆ u 1 ; u ; ; u m 1 Š Note that all successive columns in matrix U have the recursive relation as in (4) Define M ˆ I mm ; U A: Thus, the GF m multiplication can be carried out by c ˆ M b, which is the well-known Mastrovito multiplication scheme, and the matrix M is called product matrix MASTROVITO MULTIPLIER In this section, we first introduce a theorem and an algorithm pertaining to the construction of product matrix M Then, we develop an approach to compute M by adding a series of Toeplitz matrices, through which subexpression sharing could be extensively exploited to reduce the OR complexity Accordingly, a highly modular multiplier architecture is presented 1 Proposed Theorem and Algorithm In Section, we have shown that product matrix M can be expressed as the product of two independent matrices, I mm ; UŠ and A, and there exists a recursive relation between successive columns in matrix U, as described in (4) Through the following theorem and algorithm, we will see that matrix U also can be constructed by adding a series of Toeplitz matrices Theorem 1 For the computation of matrix U in (5), we can always construct a set Nf0; 1; ;m g and U ˆ F! nš; nn 5 where the Toeplitz matrix F ˆ f; f # 1Š; ; f # m ŠŠ Let the irreducible polynomial be f x ˆx m x k s x k 1 1; where m>k s > >k 1 > 1, the set N in Theorem 1 can be explicitly constructed with the aid of a weighted tree generated based on f x The complete construction approach is described by the following algorithm: Algorithm 1 Input: The parameters of irreducible polynomial: m, k 1 ; ;k s ; Output: set Nf0; 1; ;m g Procedure: 1 Generate a weighted tree D according to the following properties: Each node d j in D has at most s child nodes and each edge has the weight w f m k i ; 1 i sg; Let d 1 denote the root and h d 1 ;d j denote the weight of path from d 1 to d j, where h d 1 ;d 1 ˆ0, we have 8 d j, if 9 r f m k i ; 1 i sg and h d 1 ;d j r <m 1, then d j always has one child node d l and the weight of edge between d j and d l is r; 8d j D, h d 1 ;d j <m 1 Construct multiset Hˆfh d 1 ;d j ; 8d j Dg and set Nˆ;; For 0 j m, do a create multiset S j ˆ;; b 8 h H,ifhˆj, then insert h into S j ; c if js j j mod ˆ1, then insert j into the set N Here, we note that a multiset is like a set, except that repeated elements are allowed, and js j j represents the order of S j A proof of Theorem 1 is given in Appendix A, in which Algorithm 1 is developed From the above algorithm, we know that the least two elements in N are always 0 and (m k s ) and we have jn j k s Example 1 Consider the multiplication of a x ˆx 4 x and b x ˆx x 1 over GF 5 with the underlying irreducible polynomial f x ˆx 5 x 4 x x 1 We have f m k i ; 1 i sg ˆf1; ; g Applying Algorithm 1, we generate the tree D as shown in Fig 1, and get the multiset Hˆf0; 1; ; ; ; ; ; g Thus, we have S 0 ˆf0g )js 0 jˆ1; S ˆf; g )js jˆ; S 1 ˆf1g )js 1 jˆ1; S ˆf; ; ; g )js jˆ4: Since js j mod ˆjS j mod ˆ 0, we get Nˆf0; 1g Therefore, we perform the multiplication as

4 ZHANG AND PARHI: SYSTEMATIC DESIGN OF ORIGINAL AND MODIFIED MASTROVITO MULTIPLIERS FOR GENERAL IRREDUCIBLE Define E j ˆ h i e j ; e j # 1Š; ; e j # m Š : We can write the Toeplitz matrix F in Theorem 1 as P s iˆ0 E k i 1, where k 0 ˆ 0, and have Fig 1 Tree structure h c ˆ M b ˆ I 55 ; i F! nš nn A b ˆ ˆ ˆ 0 ; where F ˆ h i f; f # 1Š; ; f # m Š ˆ : Thus, we have c x ˆ a x b x mod f x ˆx 4 Multiplier Architecture Applying Theorem 1 and Algorithm 1, in this section, we develop an efficient computation approach for the product matrix M and present the corresponding lowcomplexity Mastrovito multiplier architecture In the following, we frequently use two special matrices: the Toeplitz matrix and the upper-triangular Toeplitz matrix It is well-known that the sum of two upper-triangular Toeplitz matrices (or one Toeplitz matrix and one upper-triangular Toeplitz matrix) can be obtained by only computing the sum of two row vectors This property will be used to exploit the subexpression sharing in the construction of product matrix M to reduce the entire OR complexity Given a general irreducible polynomial f x ˆx m x ks x k1 1, we express its coefficient vector as f ˆ e 1 e k1 1 e ks 1, where e i is the m-dimensional ith canonical unit vector: e i ˆ 0; ; 0; 1; 0; ; 0 Š T : {z } {z } i 1 m i U ˆ F! nš ˆs nn E ki 1! nš: iˆ0 nn Moreover, according to (1), we write A in block matrix form as A ˆ A T s ; AT t ŠT ; where A s is an m m lower-triangular Toeplitz matrix and A t is an m 1 m upper-triangular Toeplitz matrix: a a 1 a 0 0 A s ˆ 4 5 ; a m 1 a m a 0 0 a m 1 a a a a A t ˆ 4 5 : a m 1 Substituting (), (8), and (9) into (5), we get M ˆ A s s iˆ0 nn 8 9 E ki 1! nša t : 11 Based on the special structure of E j as defined in (), it can be easily proven that E j! nša t ˆ ~At! nš # j 1 Š; 1 where ~ A t ˆ A T t ; ošt and o is an m-dimensional zero column vector Thus, if we denote P nn ~ A t! nš as S, (11) can be rewritten as M ˆ A s s iˆ0 ˆ A s S s nn ~A t! nš # k i Š S # k i Š : 1 Because each A ~ t! nš is an upper-triangular Toeplitz matrix, we easily know that matrix S is also an uppertriangular Toeplitz matrix with the following form: 0 s m 1 s 1 S ˆ s m and computing S 1; : ˆ ~At 1; :! nš ˆ nn nn A t 1; :! nš 15

5 8 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 Fig General Mastrovito Multiplier architecture is sufficient to completely construct matrix S Since the first entry of A t 1; : is zero, the first n 1 entries of A t 1; :! nš are also zeroes Recalling that n ˆ 0 is the least element in N, we get the OR complexity for computing S 1; : based on (15) is m n 1 ˆ m n 1 m 1 1 nn & nˆ0 nn and if binary tree structure is used, the delay will be dlog jn jet After having obtained S, we use (1) to compute the product matrix M From (10) and (14), we know that the addition of A s and S does not need any OR gates and the matrix A s S, denoted by T, is a Toeplitz matrix Therefore, (1) can be rewritten as M ˆ T S # k 1 Š S # k s Š: 1 In order to compute (1) efficiently, we introduce the following observation: Observation 1 Given an m m upper-triangular Toeplitz matrix S and m m matrix P whose last m l rows form a Toeplitz submatrix If m>jl, then computing the sum of vector P j 1; : and S 1; : is sufficient to construct the sum of P and S # jš and the last m j rows of the sum matrix still form a Toeplitz submatrix Applying Observation 1, we compute M based on (1) with linear tree structure as follows: Algorithm 1 Initially, set Q 0 ˆ T; For 1 i s, construct Q i ˆ Q i 1 S # k i Š by computing Q i k i 1; : ˆQ i 1 k i 1; : S 1;: ; Finally, set M ˆ Q s In the above algorithm, for each i, Q i 1 k i 1; : S 1;: requires m 1 OR gates, so we need s m 1 OR gates to compute M with the delay of st In order to obtain the result c x via c ˆ M b, we also need m AND gates and m m 1 OR gates to complete this matrix-vector multiplication Its delay will be T A dlog met if binary tree structure is used Based on the above computation approach, we develop the dedicated Mastrovito multiplier architecture as shown in Fig Given irreducible polynomial f x ˆx m x ks x k1 1, set N is constructed using Algorithm 1 Product Matrix Module computes matrix M and consists of two blocks: M1 and M Block M1 generates the vector S 1; : by computing P nn A t 1; :! nš Supplied with S 1; :, block M computes the product matrix M using Algorithm M contains (m 1) P j blocks, each one generates one row vector of M Ifj fk i ; 1 i sg, then P j is identical to block B; otherwise, it is identical to block B1 The Matrix-Vector Multiplication Module computes M b and consists of m identical V blocks The block V computes the inner-product of two vectors of length m The total complexity of the proposed original Mastrovito multiplier for general irreducible polynomial is given by OR Complexity: m s 1 m 1 nn m n 1 ; AND Complexity: m, Delay: T A s dlog jn je dlog me T It needs to be pointed out that, for a given irreducible polynomial, there likely exist some common items in the computation of (15) and (1) By sharing these common items, we may further optimize the hardware architecture of product matrix module to some extent Thus, the multiplier architecture shown in Fig may need some corresponding modifications and the above OR complexity value is actually an upper bound for general cases 4 MODIFIED MASTROVITO MULTIPLIER Using a similar design methodology as in Section, in this section, we develop a systematic design approach for the modified Mastrovito multiplier [15], which is preferable if high-hamming weight irreducible polynomial is used 41 Proposed Theorem and Algorithm For a polynomial f x ˆf m x m f m 1 x m 1 f 1 x f 0, we define its complementary polynomial as p x ˆ 1 f m x m 1 f m 1 x m 1 1 f 1 x 1 f 0 :

6 ZHANG AND PARHI: SYSTEMATIC DESIGN OF ORIGINAL AND MODIFIED MASTROVITO MULTIPLIERS FOR GENERAL IRREDUCIBLE 9 According to [15], we have the following theorem to construct matrix U in (5) using the complementary irreducible polynomial: Theorem 41 Given an irreducible polynomial f x ˆx m x ks x k1 1; its complementary polynomial can be written as p x ˆx t m s 1 x t 1 ; where m>t m s 1 > >t 1 > 0 Let p represent the coefficient vector of p x, then we can obtain the matrix U in (5) by adding two matrices V and Q, where V is constructed recursively as V :;i ˆ 8 p; i ˆ 1 >< p # 1Š V m; 1 1 p e 1 ; i ˆ V :;i 1 # 1Š V m; i e 1 >: V m; i V m; i 1 p; i m 1; where e 1 is the first canonical unit vector and Q ˆ w e T 1 V m; :! 1Š ; where w ˆ 1; ; 1 Š T : {z } m In the following, we propose a theorem and an algorithm to construct matrix V in Theorem 41 by adding a series of Toeplitz matrices together Theorem 4 For the computation of matrix V in Theorem 41, we can always find two sets Lf0; 1; ;m g and Jf1; ;m g, and V ˆ P! lš E 1! jš; ll jj where E 1 is defined in () and h i P ˆ p; p # 1Š; ; p # m Š : The sets L and J in Theorem 4 are constructed by the following algorithm: Algorithm 41 Input: The parameters of complementary polynomial: m, t 1 ; ;t m s 1 ; Output: set Lf0; 1; ;m g and Jf1; ;m g Procedure: 1 Generate a weighted tree D according to the following properties: The root d 1 always has one child node d connected by an edge with weight w ˆ 1 Besides d, root d 1 has at most m s 1 other child nodes, where the weight of edge w f m t i ; m t i 1 ; 1 i m s 1 g; 8 d j ˆ d 1, it has at most m s 1 child nodes, where the weight of edge w f m t i ; m t i 1 ; 1 i m s 1 g; Let h d 1 ;d j denote the weight of path from d 1 to d j, where h d 1 ;d 1 ˆ0, we have 8 d j, if 9 r f m t i ; m t i 1 ; 1 i m s 1 g and h d 1 ;d j r <m 1, then d j always has a child node d l and the weight of edge between d j and d l is r; 8d j D, h d 1 ;d j <m 1 Construct a subset, denoted as ~D, of all nodes in D such that it contains each node which is connected with its parent node by an edge with the weight z f m t i 1 ; 1 i m s 1 g; Construct multisets Hˆfh d 1 ;d j ; 8d j Dg and Gˆf1;h d 1 ;d i ; 8 d i ~Dg and sets LˆJ ˆ;; 4 For 0 j m, do a create S j ˆT j ˆ;; b 8 h H,ifh ˆ j, then insert h into S j ; 8 g G,if g ˆ j, then insert g into T j ; c if js j j mod ˆ1, then insert j into the set L; if jt j j mod ˆ1, then insert j into the set J A proof of Theorem 41 is given in Appendix B in which Algorithm 41 is developed Example 41 Consider the construction of matrix U when the high Hamming weight irreducible polynomial f x ˆx x x 5 x x x 1 is being used Its complementary polynomial is p x ˆx 4 We have f m t i ; m t i 1 ; 1 i m s 1 g ˆ f; 4g Applying Algorithm 41, we generate the tree D as shown in Fig From this tree, we get Hˆf0; 1; ; 4; 4; 5g and Gˆf1; 4; 5g Thus, we obtain Lˆf0; 1; ; 5g and Jˆf1; 4; 5g So, matrix V is computed as V ˆ P! lš E 1! jš ll jj ˆ ˆ : Since V ; : ˆ Š, we have V ; :! 1Š ˆ Š and

7 40 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 V A t ˆ P! lš E 1! jš A t ll jj m s 1 ˆ ~A t! lš # t i Š ~A t! jš; jj ll Fig Tree structure Q ˆ w e T 1 V ; :! 1Š ˆ w Š: Finally, we have U ˆ V Q ˆ ˆ : Multiplier Architecture In this section, based on the above theorem and algorithm, we develop an efficient computation approach and corresponding multiplier architecture for the modified Mastrovito multiplication scheme According to (5), (9), and Theorem 41, we have M ˆ I mm ; UŠA ˆ A s U A t ˆ A s V A {z } t Q A t ; 18 {z } M 1 M and the modified Mastrovito multiplication is carried out as c ˆ M b ˆ M 1 b M b: In the following, we develop efficient approaches for computing M 1 and M, respectively Let's begin with M 1 Recall that the complementary polynomial is p x ˆx tm s 1 x t1, thus the matrix P in Theorem 4 can be written as m s 1 P ˆ E ti 1; 19 where E i is defined in () Applying (1) and (19), we have where A ~ t ˆ A T t ; ošt and o is an m-dimensional zero column vector Denote P ~ jj A t! jš and P ~ ll A t! lš as ~S 1 and S ~, respectively, we have m s 1 M 1 ˆ A s ~ S 1 ~S # t i Š: 0 Since each A ~ t! nš is an upper-triangular Toeplitz matrix, we know that both S ~ 1 and S ~ are also upper-triangular Toeplitz matrices, and computing ~S 1 :; 1 ˆ A t 1; :! jš jj ~S :; 1 ˆ ll A t 1; :! lš 1 is sufficient to construct S ~ 1 and S ~ Since the least element in L is always 0, similar to (1), we obtain the OR complexities of computing S ~ 1 and S ~ as m j 1 m min J 1 and jj m l 1 m 1 ll with the delay of dlog jj jet and dlog jljet, respectively Similarly to Algorithm, for the computation of M 1 using (0), we have the following algorithm: Algorithm 4 1 Initially, set Q 0 ˆ A s ~ S 1 ; For 1 i m s 1, construct Q i ˆ Q i 1 ~S # t i Š by computing Q i t i 1; : ˆQ i 1 t i 1; : ~ S 1; : ; Finally, set M 1 ˆ Q m s 1 In the above algorithm, the construction of Toeplitz matrix Q 0 doesn't need any gates For 1 i m s 1, each step needs m 1 OR gates, so the OR complexity and delay of the above algorithm is m s 1 m 1 and m s 1 T, respectively Based on the above discussion, we conclude that the matrix M 1 can be computed with the following procedure: Procedure 41 1 Given a high-hamming weight irreducible polynomial f x ˆx m x ks x k1 1, get its complementary polynomial p x ˆx t m s 1 x t 1 and construct two sets L and J using Algorithm 41; Construct S ~ 1 and S ~ using (1), if we combine L and J together to form a new multiset L, then the complexity of this step is

8 ZHANG AND PARHI: SYSTEMATIC DESIGN OF ORIGINAL AND MODIFIED MASTROVITO MULTIPLIERS FOR GENERAL IRREDUCIBLE 41 Fig 4 General Modified Mastrovito Multiplier architecture OR complexity: ll m l 1 m 1 min J ; Delay: dlog DeT, where D ˆ max jlj; jj j ; Then, use Algorithm 4 to compute matrix M 1 For this step, we have OR complexity: m s 1 m 1 ; Delay: m s 1 T Next, let's consider the computation of M Denote the vector e T 1 V m; :! 1Š as qt From Theorem 41, we know that all row vectors in Q are equal to q T Thus, each row vector in M is equal to q T A t Without loss of generality, suppose q T has d nonzero entries whose positions are r d ; ;r 1, respectively, where r d > >r 1, then we have q T A t ˆ d A t r i ; : : Since A t is an upper-triangular Toeplitz matrix, we have A t r i ; : ˆA t 1; :! r i 1 Š, thus () can be rewritten as q T A t ˆ d A t 1; :! r i 1 Š: Therefore, the computation of M only needs P d iˆ m r i OR gates with the delay of dlog det In the above, we have developed the approaches for computing M 1 and M In order to complete the GF m multiplication, we only need to compute c ˆ M b ˆ M 1 b M b: The matrix-vector multiplication M 1 b requires m m 1 OR and m AND gates with the delay of T A dlog met Since all row vectors in M are identical and the first r 1 elements in each row are zeros, the matrix-vector multiplication M b only requires m r 1 1 OR and m r 1 AND gates with the delay of T A dlog m r 1 et The addition of M 1 b and M b needs m OR gates with the delay of T Based on the above computation approach, we develop the modified Mastrovito multiplier architecture as shown in Fig 4 The set L and J are constructed using Algorithm 41 Product Matrix Module computes the two matrices M 1 and M and consists of four blocks: PM1, PM, PM, and PM4 Blocks PM1 and PM generate the vector ~ S 1; : and ~S 1 1; :, respectively Block PM computes the matrix M 1 using Algorithm 4 PM contains (m 1) P j blocks, each one generates one row vector of M If j ft i ; 1 i m s 1g, then P j is identical to block B; otherwise, it is identical to block B1 Block PM4 computes the vector q T A t, the row vector of matrix M, where r 1 ; ;r d represent the position of the nonzero elements in q T The Matrix-Vector Multiplication & Addition Module computes M 1 b M b In this architecture, the operation of M 1 b and M b can be performed in parallel If their delays are denoted by T A m 1 T and T A m T, the total delay of the modified Mastrovito multiplier will be T A 1 max m 1 ;m T Therefore, the total complexity of modified Mastrovito multiplier is OR complexity: m s m 1 ll m l 1 d m r i min J ; AND complexity: m m r 1, Delay: T A 1 max m 1 ;m T, where multiset L is the combination of L and J, d is the Hamming weight of q T, and r i represents the index of nonzero element of q T, and

9 4 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 m 1 ˆ m s 1 dlog max jlj; jj j e dlog me; m ˆdlog de dlog m r 1 e: We note that, for given high Hamming weight irreducible polynomial, further hardware optimization is possible by sharing common items during computation of M 1 and M So, the above OR complexity result is also an upper bound for general cases 5 SPECIAL IRREDUCIBLE POLYNOMIALS In Section, we presented an efficient computation approach of original Mastrovito multiplication for general cases and pointed out that further simplification can be achieved for specific irreducible polynomials by further exploiting subexpression sharing In this section, we show that by applying our proposed explicit algorithm for the construction of set N, we can easily obtain efficient multiplication schemes with further reduced complexity for several special irreducible polynomials 51 m k s In the following, we show that, for irreducible polynomial f x ˆx m x k s x k 1 1, where m k s, we can reduce the OR complexity of the Mastrovito multiplier by computing S 1; : in (15) using linear tree structure instead of binary tree Since m k s, we easily have Nˆf0;m k s ;m k s 1 ; ;m k 1 g and jn j ˆ s 1 Thus, (15) can be simplified as S 1; : ˆA t 1; : s A t 1; :! m k i Š : 4 Using linear tree structure, we compute S 1; :, according to (4), as follows: Algorithm 51 1 Initially, set v T 0 ˆ A t 1; : ; For 1 i s, compute v T i ˆ v T i 1 A t 1; :! m k i Š; Finally, set S 1; : ˆv T s The OR complexity of the above algorithm is m n 1 m 1 ˆs k i 1 ; nn 5 with the delay of st Compared with using binary tree, the OR complexity doesn't change, but delay increases from dlog s 1 et to st Next, we will prove that, as compensation for the increased delay, the OR complexity of computing M using Algorithm can be reduced from s m 1 to P s m k i by sharing the intermediate results obtained in Algorithm 51 Proof In Algorithm, based on Observation 1, we construct each matrix Q i by only computing Q i k i 1; : ˆQ i 1 k i 1; : S 1; : : Moreover, since the last m k i rows of each matrix Q i form a Toeplitz submatrix, we have Q i 1 k i 1; : ˆQ i 1 k i 1 1; :! k i k i 1 Š: Therefore, based on (), (), and the fact that Q 0 ˆ T ˆ S A s, by induction we have Q i k i 1; : ˆi From (10), we have Thus, jˆ1 S 1; :! k i k j Š S 1; :! k i Š A s k i 1; : : A s k i 1; : ˆ a ki ;a ki 1 ; ;a 1 {z } k i ;a 0 ; 0; ; 0Š; A t 1; : ˆ 0;a m 1 ; ;a ki 1 {z } m k i ;a ki ; ;a 1 Š: 8 A s k i 1; : ˆA t 1; : m k i Š e T k i 1 a 0; 9 where e ki 1 is the k i 1 th canonical unit vector Therefore, (8) becomes Q i k i 1; : ˆi jˆ1 S 1; :! k i k j Š S 1; :! k i Š A t 1; : m k i Š e T k i 1 a 0: 0 Since each Q i k i 1; : is an m-dimensional row vector, the first k i entries of Q i k i 1; : are identical to the last k i entries of Q i k i 1; :! m k i Š From (14), we also know that 8 q m 1, S 1; :! qš is a zero row vector Therefore, according to (0), the last k i entries of Q i k i 1; :! m k i Š are identical to the last k i entries of ~q T i, where ~q T i is defined as ~q T i ˆ i jˆ1 S 1; :! m k j Š A t 1; : : Substituting (4) into (1), we have ~q T i ˆ i jˆ1 s jˆ1 A t 1; :! m k j Š i A t 1; :! m k j m k i Š A t 1; : : 1 Since m k s, we have 8 i; j s, m k i m k j m and A t 1; :! m k j m k i Š is actually a zero row vector So, we can rewrite () as ~q T i ˆ i jˆ1 A t 1; :! m k j Š A t 1; : : We note that ~q T i is identical to v T i, which we have obtained in Algorithm 51 So, the first k i entries of Q i k i 1; : are equal to the last k i entries of the

10 ZHANG AND PARHI: SYSTEMATIC DESIGN OF ORIGINAL AND MODIFIED MASTROVITO MULTIPLIERS FOR GENERAL IRREDUCIBLE 4 intermediate result v T i in Algorithm 51 Thus, in Algorithm, for each 1 i s, we only need to compute the last (m k i ) entries of Q i k i 1; :, which requires m k i OR gates, and the total OR complexity of Algorithm is P s m k i with the delay of st tu Therefore, the entire OR complexity of computing product matrix M is s k i 1 s m k i ˆs m 1 with the delay of st Moreover, the matrix-vector multiplication M b requires m m 1 OR and m AND gates with the delay of T A dlog met Thus, for an irreducible polynomial in which m k s, if the linear tree structure is used to compute S 1; :, the entire complexity of Mastrovito multiplier is OR complexity: m s m 1, AND complexity: m, Delay: T A s dlog me T 5 Trinomial If GF m is generated by irreducible trinomial f x ˆx m x n 1, we have Nˆ l m n ; 0 l m : m n Therefore, we get k ˆb m mn c, (15) and (1) can be simplified as S 1; : ˆk iˆ0 Obviously, computing A t 1; :! i m n Š ; 4 M ˆ A s S S # nš: M n 1; : ˆA s n 1; : S n 1; : S 1; : 5 is sufficient to construct M in (5) According to the complexity results presented in Section, we know that the OR complexities of computing (4) and () are P k m i m n 1 and m 1, respectively In the following, we will see that the above complexity values can be further reduced First, we show that m n, instead of m 1, OR gates are sufficient to complete the computation in () From (9), we have A s n 1; : ˆA t 1; : m n Š e T n 1 a 0 and, since S is an upper-triangular Toeplitz matrix, its n 1 th row can be written as S n 1; : ˆS 1; :! nš k ˆ At 1; :! i m n Š! nš iˆ0 ˆ A t 1; :! nš: So, () can be rewritten as M n 1; : ˆA t 1; : m n Š e T n 1 a 0 A t 1; :! nš S 1; : : Let's consider the addition of A t 1; : m n Š and S 1; : in () Because the last m n entries of A t 1; : m n Š are zeros, we only need to compute the first n entries of A t 1; : m n Š S 1; : Obviously, the first n entries of A t 1; : m n Š S 1; : are equal to the last n entries of A t 1; : S 1;:! m n Š and we have A t 1; : S 1;:! m n Š k ˆ A t 1; : A t 1; :! i m n Š! m n Š ˆ k iˆ0 iˆ0 A t 1; :! i m n Š ˆ S 1; : A t 1; :! k 1 m n Š: A t 1; :! k 1 m n Š Since k ˆb m m nc, we have k 1 m n m 1 Thus, the item A t 1; :! k 1 m n Š is actually a zero vector and the first n entries of A t 1; : m n Š S 1; : are identical to the last n entries of S 1; : Therefore, the addition of A t 1; : m n Š and S 1; : does not need any OR gates Furthermore, in (), the sum of the other two items, e T n 1 a 0 A t 1; :! nš ˆ 0; ; 0;a {z } 0 ;a m 1 ; ;a n 1 Š; {z } n m n has m n nonzero entries Thus, we conclude that the computation of () only needs m n OR gates The OR complexity of computing (4) can be reduced by using either the linear tree or hybrid tree method, which will lead to different trade-off between OR complexity and delay 51 Linear Tree If (4) is computed using linear tree structure as P 0 iˆk A t 1; :! i m n Š, we achieve the lowest OR complexity with the delay increasing linearly with k This approach has been thoroughly studied in [8], in which it was proven that n 1 OR gates are sufficient to compute (4) with the delay of kt We have known that the computation of (5) needs m n OR gates with delay of 1T Thus, the total complexity of the Mastrovito multiplier in [8] was obtained as: OR complexity: m 1, AND complexity: m, Delay: T A k 1 dlog me T, k ˆb m m n c Especially, it's pointed out in [8] that if n ˆ m, then the OR complexity can be further reduced from (m 1) to (m m ), with the delay reduced by 1T 5 Hybrid Tree We have known that using linear tree structure to compute (4) will lead to a very low OR complexity with the delay increasing linearly with k However, when k is very large, the delay may be intolerable for applications requiring high speed In the following, we present another approach for computing (4), where the OR complexity also can be

11 44 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 Fig 5 Hybrid tree computation scheme reduced by exploiting subexpression sharing and the delay increases linearly with blog k 1 c Let the Hamming weight of k 1 be h and blog k 1 c ˆ t h, then k 1 can always be written as k 1 ˆ t h t h 1 t 1, where t h >t h 1 > >t 1 If we denote A t 1; :! i m n Š as A t 1; : i, (4) can be rewritten as k iˆ0 where A t 1; : i w h 1 ˆ A t 1; : i w h 1 1 A t 1; : i iˆ0 w 1 1 iˆw iˆw h A t 1; : i ˆ g T h gt h 1 gt 1 ; w j ˆ h iˆj 8 t i ; 1 j h: 9 We compute the vector g T h using a binary tree B with the height of t h as shown in Fig 5, where each node represents a m-dimensional vector At each depth j, there are th j nodes with the following values: j 1 j 1 1 t h 1 A t 1; : i ; A t 1; : i ; ; A t 1; : i : 40 iˆ0 iˆ j iˆ t h j In (40), all other items can be directly obtained by right shifting the first item P j 1 iˆ0 A t 1; : i by l j columns, where 1 l th j 1 So, only computing the first node at each depth is sufficient to construct the whole binary tree The first node at depth j is computed by adding the first two nodes at depth j 1, where m j 1 m n 1 OR gates are required Thus, the OR complexity of computing g T h is th jˆ1 m j 1 m n 1 ˆ m 1 t h t h 1 m n : 41 Moreover, each vector g T i in (8), where 1 i h 1, can be obtained by right shifting the first node at depth t i in the binary tree B Thus, we compute P 1 iˆh gt i using linear tree L and obtain a hybrid tree to compute (8) as shown in Fig 5 From (8), we know that there are only m 1 w i 1 m n nonzero entries in vector g T i Therefore, it requires h 1 m 1 w i 1 m n ˆ h 1 m 1 h iˆ w i m n 4 OR gates to compute the linear tree L Combining (41) and (4) gives the total OR complexity for the computation of (8) as! t h h 1 m 1 h w i t h 1 m n iˆ and the delay is t h 1 T We have known that computing (5) requires m n OR gates with the delay of 1T Therefore, in trinomial cases, if the above hybrid tree structure is employed to compute (4), the total computation complexity of Mastrovito multiplier will be OR complexity: m t h h 1 m 1 h iˆ w i t h m n ; AND complexity: m, Delay: T A t h dlog me T, where t h ˆblog k 1 c, h is the Hamming weight of k 1, and w i is defined in (9) 5 Pentanomial A polynomial f x ˆx m x ks x k1 1 is called pentanomial if s ˆ For general irreducible pentanomials, it's impossible to write the set N in a simple form, eg, as in the trinomial case, and we have to use the general approach presented in Section to design the product matrix module and perform the possible hardware optimization for the dedicated irreducible pentanomial In the following, we present two special irreducible pentanomials for which the set N has a simple form and the complexity of corresponding Mastrovito multipliers can be easily obtained 51 Special Case 1 For irreducible pentanomials where m k, it follows from the analysis in Section 51 that if linear tree structure is used to compute S 1; :, the total complexity of the Mastrovito multiplier is OR complexity: m m 1, AND complexity: m, Delay: T A dlog me T 5 Special Case For such irreducible pentanomials that m k ˆ k k ˆ k k 1 ˆ r;

12 ZHANG AND PARHI: SYSTEMATIC DESIGN OF ORIGINAL AND MODIFIED MASTROVITO MULTIPLIERS FOR GENERAL IRREDUCIBLE 45 applying Algorithm 1, we have Nˆ n ˆ 4l r; n ˆ 4l 1 r; 0 l d ; 4 4 where d ˆb m c So, (15) can be simplified as where r S 1; : ˆbd 4 c lˆ0 g T! 4l rš ; 44 g T ˆ A t 1; : A t 1; :! rš: 45 The vector g T is computed using m r 1 OR gates with the delay of 1T If we use linear tree to compute (44) as P 0 lˆb d 4 c gt! 4l rš, then, applying the results in [8], it only requires m 4r 1 OR gates with the delay of b d 4 ct Therefore, the total OR complexity of computing S 1; : is m 5r with the delay of b d 4 c 1 T Furthermore, using Algorithm, we need m 1 OR gates to compute M with the delay of T Thus, in this case, the total complexity of the Mastrovito multiplier is given by OR complexity: m m 1 m 5r, AND complexity: m, Delay: T A b d 4 c 4 dlog me T 54 ESP A polynomial f x ˆ1 x r x r x tr x m, where t 1 r ˆ m, is called an ESP (equally-spaced-polynomial) An ESP with r ˆ 1 is usually referred as an AOP (all-onepolynomial) For irreducible ESP, applying Algorithm 1, we have Nˆf0;rg, based on which it can be shown that matrix U in (5) always has the following form: U ˆ L E 1! rš; where E 1 is defined in () and I rr j L ˆ 4 j j I rr O mm 1 r 5; where O ij and I ii represent i j zero matrix and i i identity matrix, respectively Therefore, the product matrix M can be computed as M ˆ A s U A t ˆ A s L A t E 1! rša t : Let Q 1 denote L A t, we have Q 1 ˆ L A t ˆ P T ; P T ; ; P T T ; 4 where r m matrix P ˆ A t 1:r; : Let Q denote A s E 1! rša t, we have Q ˆ A s E 1! rša t ˆ A s ~ A t! rš; 4 where ~ A t ˆ A T t ;ošt Obviously, Q 1 and Q are obtained without any computation We perform the GF m multiplication as c ˆ M b ˆ Q 1 b Q b: According to (4), we know that only computing P b is sufficient to obtain the result of Q 1 b Since P ˆ A t 1:r; :, we have P i; : ˆA t i; : ˆA t 1; :! i 1 Š; which shows that the first i entries in P i; : are zeros Therefore we get the complexity of computing Q 1 b as #ofand ˆ r #ofor ˆ r m i ˆmr r r ; m i 1 ˆmr r r ; and the delay is T A dlog m 1 et From the definition of A s and A t in (10), we know that each row vector in Q 1:m r; : contains r zero entries and each row vector Q m r i; : contains r i zeros, where 1 i r Therefore, in the computation of Q b, the numbers of AND and OR gates are #ofand ˆ m r m r r ˆ m mr r r #ofor ˆ m r 1 m r r m r i m r i 1 ˆ m mr r r m; and its delay is T A dlog met Obviously, Q 1 b and Q b can be computed in parallel At last, we also need m OR gates to add Q 1 b and Q b together to get the final result c Therefore, we get the total complexity as follows: OR complexity: m r, AND complexity: m, Delay: T A 1 dlog me T Here, we note that the above OR complexity result is identical to that obtained in [14] CONCLUSIONS In this paper, we have presented a systematic design approach for Mastrovito multipliers The complexity results are m AND gates and at most P nn m n 1 m 1 OR gates We note that, although the complexity results appear the same as those presented in [14], we propose an explicit algorithm to compute the set N, which makes our design really practical We have extended this design approach to the modified Mastrovito multiplication scheme, which is suitable for high-hamming weight irreducible polynomials For both original and modified Mastrovito multipliers with general irreducible polynomials, the developed computation approach effectively exploits subexpression sharing and the complexity analyses are given in detail The corresponding hardware architectures for both cases are highly modular Meanwhile, in this paper, we have studied several special irreducible polynomials For trinomials and ESPs, the complexity results

13 4 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 match the best known results achieved in [8] and [14] We also present another computation approach for trinomials to provide a trade-off between OR complexity and delay Moreover, several other special irreducible polynomials, which also lead to low-complexity implementation, have been discovered and corresponding complexities are given Finally, we note that, with the explicit algorithms and design procedures, all the proposed efficient design schemes can be easily employed by VLSI automation design tools for dedicated bit-parallel GF m multiplier design APPENDI A PROOF of THEOREM 1 In the following, we prove Theorem 1 in two steps, through which Algorithm 1 is developed: 1) First, we will show that the matrix U is equal to the sum of a series of matrices constructed by the following procedure: Procedure A-1 1 Initially, set i ˆ n ˆ 1 and create a matrix multiset WˆfU 1 g We define U 1 ˆ f; o; ; o Š; {z } m 1 where o is the m-dimensional zero column vector; 8 U l W, shift down its ith column by one position to serve as its i 1 th column: U l :;i 1 ˆU l :;i # 1Š: 8 U l W, if the last entry of U l 's ith column vector, U l m; i, is 1, then {Increase n by 1, create U l 's child matrix U n as U n ˆ o; ; o; f; o; ; oš {z } {z } i m i and insert U n into W (U l is called the parent matrix of U n )}; 4 Increase i by 1, if i ˆ m 1, procedure terminates, else return to Step Using the above procedure, we obtain a multiset W containing N matrices U 1 ; ; U N, where N ˆjWj is the order of W Each element matrix is m by m 1 and has the form: h i U i ˆ o; ; o; f; f # 1Š; ; f # m r {z } i Š : {z } r i m 1 r i In the following, we prove by induction that the sum of all U i sinw is identical to the matrix U: U :;l ˆN U i :;l ; 1 l m 1: When i ˆ 1, from Procedure A-1, we have U 1 :; 1 ˆf and U i :; 1 ˆo, 8 i>1 From (4), we also have U :; 1 ˆf So, we get U :; 1 ˆN Assume, for l 1, we have U :;l ˆN U i :; 1 : U i :;l : A:1 Applying the recursive relation between the successive columns of U as shown in (4), we compute U :;l 1 as follows: U :;l 1 ˆU :;l # 1Š U m; l f N N ˆ U i :;l # 1Š ˆ N U i :;l # 1Š N U i m; l f U i m; l f : A: According to Procedure A-1, we know that there always exist two integers 1 r t N and if i r, then matrix U i is created before the lth iteration in the procedure and U i :;l # 1Š ˆU i :;l 1 ; if r<i<t, then U i is created at the lth iteration, U i :;l # 1Š is zero vector, and U i :;l 1 ˆf; if t i, then U i is created after the lth iteration, both U i :;l # 1Š and U i :;l 1 are zero vectors We also know that each U i m; l ˆ1 corresponds to a matrix created at the lth iteration Thus, we have N N U i :;l # 1Š ˆ r U i :;l 1 N U i :;l 1 ; iˆt ˆ t 1 U i m; l f iˆr 1 U i :;l 1 : Substituting the above two equations into (A), we get r U :;l 1 ˆ U i :;l 1 N U i :;l 1 ˆ N t 1 iˆs 1 U i :;l 1 U i :;l 1 : iˆt A: Thus, we have derived (A) from the assumption (A1) Therefore, we can conclude that the sum of all matrices in W is equal to matrix U ) Next, we introduce another method to construct multiset W with the aid of a weighted tree Without loss of generality, let the irreducible polynomial be f x ˆx m x ks x k1 1; where m>k s > >k 1 > 1 Thus, for 0 <j<m, only when j f m k i ; 1 i sg, the last entry of f # j 1 Š is 1 From Procedure A-1, we have

14 ZHANG AND PARHI: SYSTEMATIC DESIGN OF ORIGINAL AND MODIFIED MASTROVITO MULTIPLIERS FOR GENERAL IRREDUCIBLE 4 U 1 ˆ h i f; f # 1Š; ; f # m Š and, except U 1, each other matrix U i in W can be constructed by shifting right its parent matrix by w columns, where w f m k i ; 1 i sg Thus, we can construct a weighted tree D in which each node d i represents the matrix U i in W and the weight of each edge represents the number of columns by which the parent matrix right shifts to generate its child matrix According to Procedure A-1, it can be shown that this tree is uniquely determined by the following property: Property A-1 1 Each node d j in D has at most s child nodes and each edge has the weight w f m k i ; 1 i sg Let d 1 denote the root and h d 1 ;d j denote the weight of path from d 1 to d j, where h d 1 ;d 1 ˆ0, we have 8 d j, if 9 r f m k i ; 1 i sg and h d 1 ;d j r <m 1, then d j always has a child node d l and the weight of edge between d j and d l is r 8d j D, h d 1 ;d j <m 1 Since U 1 is identical to the matrix F in Theorem 1, we can construct W as WˆfF! hš; 8h Hg; where Hˆfh d 1 ;d i ; 8 d i Dg Obviously, the multiset H may contain some repeated elements, which means W may contain some identical matrices Because here the addition is logic OR, the sum of two identical matrices is actually a zero matrix So, we can remove those repeated elements in pairs from multiset H using the following algorithm: Algorithm A-1 1 Initially, set Nˆ; For 0 j m, do create S j ˆ;, 8 h H,ifhˆj, then insert h into S j If js j j mod ˆ1, then insert j into the set N Using the above algorithm, we construct a new set N f0; 1; ;m g and F! nš ˆ F! hš ˆU: nn hh We note that combining Property A-1 and Algorithm A-1 just produces Algorithm 1 in Section 1 tu APPENDI B PROOF OF THEOREM 4 Similarly to the proof of Theorem 1, we prove Theorem 4 in two steps, through which Algorithm 41 is developed: 1) First, we use the following procedure to construct two matrix multiset W and Z, where the sum of all the matrices in these two multisets is equal to matrix V Procedure B-1 1 Initially, set i ˆ, n 1 ˆ, and n ˆ 1 Create two multisets WˆfW 1 ; W g and ZˆfZ 1 g: W 1 ˆ p; p # 1Š; o; ; o Š; {z } m 1 W ˆ o; p; o; ; o Š; {z } m 1 Z 1 ˆ o; e 1 ; o; ; o Š; {z } m 1 where p represents the coefficient vector of complementary irreducible polynomial p x and e 1 is the first canonical unit vector If W 1 m; 1 ˆ1, then {Increase n 1 by 1, create W 1 's child matrix W n1 ˆ W and insert W n1 into W}; 8 W l Wand 8 Z l Z,do W l :;i 1 ˆW l :;i # 1Š; Z l :;i 1 ˆZ l :;i # 1Š: 4 8 W l W,ifW l m; i 1 ˆ1, then {Increase both n 1 and n by 1, create W l 's child matrix W n1 and as Z n W n1 Z n ˆ o; ; o; p; o; ; oš; {z } {z } i m i ˆ o; ; o; e {z } 1 ; o; ; oš {z } i m i and insert W n1 and Z n into W and Z, respectively}; 5 8 W l W,ifW l m; i ˆ1, then {Increase n 1 by 1, create W l 's child matrix W n1 as W n1 ˆ o; ; o; p; o; ; oš {z } {z } i m i and insert W n1 into W}; Increase i by 1, if i ˆ m 1, procedure terminates, else return to Step Using the above procedure, we get two multisets W and Z Similarly to the proof of Theorem 1, based on the recursive relation of successive column vectors in matrix V as shown in Theorem 41, it can be proven by induction that matrix V is identical to the sum of all matrices in W and Z: V ˆ jwj W i jzj Z j : jˆ1 B:1 ) Next, we introduce another method to construct multiset W and Z with the aid of a weighted tree Since the complementary polynomial is p x ˆx tm s 1 x t1, the last entry of p # j 1 Š or p # j Š is 1 only when j f m t i ; m t i 1 ; 1 i m s 1 g According to Procedure B-1, each child matrix in multiset W can be obtained by right shifting its parent matrix by w f m t i ; m t i 1 ; 1 i m s 1 g columns and the whole construction process begins from two matrices: W 1 and W ˆ W 1! 1Š Therefore, we can construct a weighted tree D in which every node d i represents the matrix W i in W and the weight of each edge represents the number of columns by which the parent matrix right shifts to generate its child matrix According to Procedure B-1, it can be shown that the tree D is uniquely determined by the following property:

15 48 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 Property B-1 1 The root d 1, representing matrix W 1, always has one child node d (representing W ) connected by an edge with weight w ˆ 1 Besides d, root d 1 has at most m s 1 other child nodes, where the weight of edge w f m t i ; m t i 1 ; 1 i m s 1 g: 8 d j ˆ d 1, it has at most m s 1 child nodes, where the weight of edge w f m t i ; m t i 1 ; 1 i m s 1 g: Let h d 1 ;d j denote the weight of path from d 1 to d j, where h d 1 ;d 1 ˆ0, we have 8 d j, if 9 r f m t i ; m t i 1 ; 1 i m s 1 g and h d 1 ;d j r <m 1, then d j always has a child node d l and the weight of edge between d j and d l is r 4 8d j D, h d 1 ;d j <m 1 Since W 1 is identical to the matrix P in Theorem 4, we can construct the multiset W as follows: WˆfP! hš; 8h Hg; where Hˆfh d 1 ;d i ; 8 d i Dg Moreover, the above weighted tree D also can be used to represent all the element matrices in multiset Z Let root d 1 represent matrix Z 0 ˆ E 1, each other node d i represents a matrix generated through shifting Z 0 right by h d 1 ;d i columns According to Procedure B-1, if we introduce such a subset of all nodes in D, denoted as ~D, that it contains each node which is connected with its parent node by an edge with the weight z f m t i 1 ; 1 i m s 1 g, we have REFERENCES [1] SA Vanstone and PC Oorschot, An Introduction to Error Correcting Codes with Applications Kluwer, 1989 [] AJ Menezes, Handbook of Applied Cryptography CRC, 199 [] AJ Menezes, Applications of Finite Fields Kluwer Academic, 1994 [4] R Lidl and H Niederreiter, Introduction to Finite Fields and Their Applications Cambridge Univ Press, 1994 [5] D Jungnickel, Finite Fields: Structure and Arithmetics Wissenschaftsverlag, 199 [] ED Mastrovito, ªVLSI Designs for Multiplication over Finite Fields GF m,º Proc Sixth Int'l Conf Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes (AAECC-), pp 9-09, July 1988 [] KK Parhi, VLSI Digital Signal Processing Systems: Design and Implementation John Wiley & Sons, 1999 [8] B Sunar and CË K KocË, ªMastrovito Multiplier for All Trinomials,º IEEE Trans Computers, vol 48, no 5, pp 5-5, May 1999 [9] T Itoh and S Tsujii, ªStructure of Parallel Multipliers for a Class of Finite Fields GF m,º Information and Computations, vol 8, pp 1-40, 1989 [10] MA Hasan, MZ Wang, and VK Bhargava, ªModular Construction of Low Complexity Parallel Multipliers for a Class of Finite Fields GF m,º IEEE Trans Computers, vol 41, no 8, pp 9-91 Aug 199 [11] CË K KocË and B Sunar, ªLow-Complexity Bit-Parallel Canonical and Normal Basis Multipliers for a Class of Finite Fields,º IEEE Trans Computers, vol 4, no, pp 5-5, Mar 1998 [1] H Wu and MA Hasan, ªLow-Complexity Bit-Parallel Multipliers for a Class of Finite Fields,º IEEE Trans Computers, vol 4, no 8, pp 88-88, Aug 1998 [1] A Halbutogullari and CË K KocË, ªMastrovito Multiplier for General Irreducible Polynomials,º Applied Algebra, Algebraic Algorithms and Error-Correcting Codes, pp , 1999 [14] A Halbutogullari and CË K KocË, ªMastrovito Multiplier for General Irreducible Polynomials,º IEEE Trans Computers, vol 49, no 5, pp , May 000 [15] L Song and KK Parhi, ªLow Complexity Modified Mastrovito Multipliers over Finite Fields GF m,º Proc IEEE Int'l Symp Circuits and Systems, vol 1, pp , May 1999 [1] T Zhang and KK Parhi, ªSystematic Design Approach of Mastrovito Multipliers over GF m,º Proc IEEE Workshop Signal Processing Systems (SiPS), pp 50-51, Oct 000 [1] GH Golub and CF Van Loan, Matrix Computations, third ed Johns Hopkins Univ Press, 199 ZˆfE 1! gš; 8g Gg; where Gˆf1;h d 1 ;d i ; 8 d i ~Dg Therefore, (B1) can be rewritten as V ˆ W i Z j : ih jg Similarly to the discussion in Appendix A, using Algorithm A-1, we can remove those repeated elements in pairs from multiset H and G and obtain Lf0; 1; ;m g and Jf1; ;m g, respectively, and V ˆ P! lš E 1! jš: ll jj Summarizing the above approach of constructing L and J, we get Algorithm 41 in Section 41 tu Tong Zhang (S'98) received the BS and MS degrees in electrical engineering from the ian Jiaotong University, ian, China, in 1995 and 1998, respectively He is pursuing the PhD degree in electrical engineering at the University of Minnesota His current research interests include design of VLSI architectures and circuits for digital signal processing and communication systems, with the emphasis on error-correcting coding and finite field arithmetic He is a student member of the IEEE

16 ZHANG AND PARHI: SYSTEMATIC DESIGN OF ORIGINAL AND MODIFIED MASTROVITO MULTIPLIERS FOR GENERAL IRREDUCIBLE 49 Keshab K Parhi (S'85-M'88-SM'91-F'9) received the BTech, MSEE, and PhD degrees from the Indian Institute of Technology, Kharagpur (India) (198), the University of Pennsylvania, Philadelphia (1984), and the University of California at Berkeley (1988), respectively He is a distinguished McKnight University Professor of Electrical and Computer Engineering at the University of Minnesota, Minneapolis, where he also holds the Edgar F Johnson Professorship His research interests include all aspects of VLSI implementations of broadband access networks He is currently working on VLSI adaptive digital filters, equalizers and beamformers, error control coders and cryptography architectures, low-power digital systems, and computer arithmetic He has published more than 0 papers in these areas He has authored the text book VLSI Digital Signal Processing Systems (Wiley, 1999) and coedited the reference book Digital Signal Processing for Multimedia Systems (Dekker, 1999) He received the 001 WRG Baker prize paper award from the IEEE, a Golden Jubilee medal from the IEEE Circuits and Systems Society in 1999, a 199 Design Automation Conference best paper award, the 1994 Darlington and the 199 Guillemin-Cauer best paper awards from the IEEE Circuits and Systems Society, the 1991 paper award from the IEEE Signal Processing Society, the 1991 Browder Thompson prize paper award from the IEEE, and the 199 Young Investigator Award of the US National Science Foundation Dr Parhi has served on editorial boards of the IEEE Transactions on Circuits and Systems, the IEEE Transactions on Signal Processing, the IEEE Transactions on Circuits and Sytems-Part II: Analog and Digital Signal Processing, the IEEE Transactions on VLSI Systems, and the IEEE Signal Processing Letters He is an editor of the Journal of VLSI Signal Processing He is the guest editor of a special issue of the IEEE Transactions on Signal Processing, two special issues of the Journal of VLSI Signal Processing and a special issue of the Journal of Analog Integrated Circuits and Signal Processing He served as technical program cochair of the 1995 IEEE Workshop on Signal Processing and the 199 ASAP conference, and is the general chair of the 00 IEEE Workshop on Signal Processing Systems He serves on numerous technical program committees and was a distinguished lecturer of the IEEE Circuits and Systems Society ( ) For further information on this or any computing topic, please visit our Digital Library at

Mastrovito Form of Non-recursive Karatsuba Multiplier for All Trinomials

Mastrovito Form of Non-recursive Karatsuba Multiplier for All Trinomials 1 Mastrovito Form of Non-recursive Karatsuba Multiplier for All Trinomials Yin Li, Xingpo Ma, Yu Zhang and Chuanda Qi Abstract We present a new type of bit-parallel non-recursive Karatsuba multiplier over

More information

A New Bit-Serial Architecture for Field Multiplication Using Polynomial Bases

A New Bit-Serial Architecture for Field Multiplication Using Polynomial Bases A New Bit-Serial Architecture for Field Multiplication Using Polynomial Bases Arash Reyhani-Masoleh Department of Electrical and Computer Engineering The University of Western Ontario London, Ontario,

More information

Subquadratic Computational Complexity Schemes for Extended Binary Field Multiplication Using Optimal Normal Bases

Subquadratic Computational Complexity Schemes for Extended Binary Field Multiplication Using Optimal Normal Bases 1 Subquadratic Computational Complexity Schemes for Extended Binary Field Multiplication Using Optimal Normal Bases H. Fan and M. A. Hasan March 31, 2007 Abstract Based on a recently proposed Toeplitz

More information

BSIDES multiplication, squaring is also an important

BSIDES multiplication, squaring is also an important 1 Bit-Parallel GF ( n ) Squarer Using Shifted Polynomial Basis Xi Xiong and Haining Fan Abstract We present explicit formulae and complexities of bit-parallel shifted polynomial basis (SPB) squarers in

More information

Subquadratic space complexity multiplier for a class of binary fields using Toeplitz matrix approach

Subquadratic space complexity multiplier for a class of binary fields using Toeplitz matrix approach Subquadratic space complexity multiplier for a class of binary fields using Toeplitz matrix approach M A Hasan 1 and C Negre 2 1 ECE Department and CACR, University of Waterloo, Ontario, Canada 2 Team

More information

A new class of irreducible pentanomials for polynomial based multipliers in binary fields

A new class of irreducible pentanomials for polynomial based multipliers in binary fields Noname manuscript No. (will be inserted by the editor) A new class of irreducible pentanomials for polynomial based multipliers in binary fields Gustavo Banegas Ricardo Custódio Daniel Panario the date

More information

FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials

FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials C. Shu, S. Kwon and K. Gaj Abstract: The efficient design of digit-serial multipliers

More information

Subquadratic Space Complexity Multiplication over Binary Fields with Dickson Polynomial Representation

Subquadratic Space Complexity Multiplication over Binary Fields with Dickson Polynomial Representation Subquadratic Space Complexity Multiplication over Binary Fields with Dickson Polynomial Representation M A Hasan and C Negre Abstract We study Dickson bases for binary field representation Such representation

More information

PARALLEL MULTIPLICATION IN F 2

PARALLEL MULTIPLICATION IN F 2 PARALLEL MULTIPLICATION IN F 2 n USING CONDENSED MATRIX REPRESENTATION Christophe Negre Équipe DALI, LP2A, Université de Perpignan avenue P Alduy, 66 000 Perpignan, France christophenegre@univ-perpfr Keywords:

More information

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER Jesus Garcia and Michael J. Schulte Lehigh University Department of Computer Science and Engineering Bethlehem, PA 15 ABSTRACT Galois field arithmetic

More information

GF(2 m ) arithmetic: summary

GF(2 m ) arithmetic: summary GF(2 m ) arithmetic: summary EE 387, Notes 18, Handout #32 Addition/subtraction: bitwise XOR (m gates/ops) Multiplication: bit serial (shift and add) bit parallel (combinational) subfield representation

More information

High-Speed Polynomial Basis Multipliers Over for Special Pentanomials. José L. Imaña IEEE

High-Speed Polynomial Basis Multipliers Over for Special Pentanomials. José L. Imaña IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL 63, NO 1, JANUARY 2016 1 High-Speed Polynomial Basis Multipliers Over for Special Pentanomials José L Imaña Abstract Efficient hardware implementations

More information

Revisiting Finite Field Multiplication Using Dickson Bases

Revisiting Finite Field Multiplication Using Dickson Bases Revisiting Finite Field Multiplication Using Dickson Bases Bijan Ansari and M. Anwar Hasan Department of Electrical and Computer Engineering University of Waterloo, Waterloo, Ontario, Canada {bansari,

More information

A New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields

A New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields 1 A New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields Haining Fan and M. Anwar Hasan Senior Member, IEEE July 19, 2006 Abstract Based on Toeplitz matrix-vector

More information

IEEE P1363 / D13 (Draft Version 13). Standard Specifications for Public Key Cryptography

IEEE P1363 / D13 (Draft Version 13). Standard Specifications for Public Key Cryptography IEEE P1363 / D13 (Draft Version 13). Standard Specifications for Public Key Cryptography Annex A (Informative). Number-Theoretic Background. Copyright 1999 by the Institute of Electrical and Electronics

More information

IEEE P1363 / D9 (Draft Version 9). Standard Specifications for Public Key Cryptography

IEEE P1363 / D9 (Draft Version 9). Standard Specifications for Public Key Cryptography IEEE P1363 / D9 (Draft Version 9) Standard Specifications for Public Key Cryptography Annex A (informative) Number-Theoretic Background Copyright 1997,1998,1999 by the Institute of Electrical and Electronics

More information

A New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields

A New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields 1 A New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields Haining Fan and M. Anwar Hasan Senior Member, IEEE January 5, 2006 Abstract Based on Toeplitz matrix-vector

More information

Relation of Pure Minimum Cost Flow Model to Linear Programming

Relation of Pure Minimum Cost Flow Model to Linear Programming Appendix A Page 1 Relation of Pure Minimum Cost Flow Model to Linear Programming The Network Model The network pure minimum cost flow model has m nodes. The external flows given by the vector b with m

More information

Reducing the Complexity of Normal Basis Multiplication

Reducing the Complexity of Normal Basis Multiplication Reducing the Complexity of Normal Basis Multiplication Ömer Eǧecioǧlu and Çetin Kaya Koç Department of Computer Science University of California Santa Barbara {omer,koc}@cs.ucsb.edu Abstract In this paper

More information

Montgomery Multiplier and Squarer in GF(2 m )

Montgomery Multiplier and Squarer in GF(2 m ) Montgomery Multiplier and Squarer in GF( m ) Huapeng Wu The Centre for Applied Cryptographic Research Department of Combinatorics and Optimization University of Waterloo, Waterloo, Canada h3wu@cacrmathuwaterlooca

More information

Low complexity bit-parallel GF (2 m ) multiplier for all-one polynomials

Low complexity bit-parallel GF (2 m ) multiplier for all-one polynomials Low complexity bit-parallel GF (2 m ) multiplier for all-one polynomials Yin Li 1, Gong-liang Chen 2, and Xiao-ning Xie 1 Xinyang local taxation bureau, Henan, China. Email:yunfeiyangli@gmail.com, 2 School

More information

Letting be a field, e.g., of the real numbers, the complex numbers, the rational numbers, the rational functions W(s) of a complex variable s, etc.

Letting be a field, e.g., of the real numbers, the complex numbers, the rational numbers, the rational functions W(s) of a complex variable s, etc. 1 Polynomial Matrices 1.1 Polynomials Letting be a field, e.g., of the real numbers, the complex numbers, the rational numbers, the rational functions W(s) of a complex variable s, etc., n ws ( ) as a

More information

Chapter 7 Network Flow Problems, I

Chapter 7 Network Flow Problems, I Chapter 7 Network Flow Problems, I Network flow problems are the most frequently solved linear programming problems. They include as special cases, the assignment, transportation, maximum flow, and shortest

More information

ANALYTICAL MATHEMATICS FOR APPLICATIONS 2018 LECTURE NOTES 3

ANALYTICAL MATHEMATICS FOR APPLICATIONS 2018 LECTURE NOTES 3 ANALYTICAL MATHEMATICS FOR APPLICATIONS 2018 LECTURE NOTES 3 ISSUED 24 FEBRUARY 2018 1 Gaussian elimination Let A be an (m n)-matrix Consider the following row operations on A (1) Swap the positions any

More information

TRANSPORTATION PROBLEMS

TRANSPORTATION PROBLEMS Chapter 6 TRANSPORTATION PROBLEMS 61 Transportation Model Transportation models deal with the determination of a minimum-cost plan for transporting a commodity from a number of sources to a number of destinations

More information

A field F is a set of numbers that includes the two numbers 0 and 1 and satisfies the properties:

A field F is a set of numbers that includes the two numbers 0 and 1 and satisfies the properties: Byte multiplication 1 Field arithmetic A field F is a set of numbers that includes the two numbers 0 and 1 and satisfies the properties: F is an abelian group under addition, meaning - F is closed under

More information

On Systems of Diagonal Forms II

On Systems of Diagonal Forms II On Systems of Diagonal Forms II Michael P Knapp 1 Introduction In a recent paper [8], we considered the system F of homogeneous additive forms F 1 (x) = a 11 x k 1 1 + + a 1s x k 1 s F R (x) = a R1 x k

More information

1 Determinants. 1.1 Determinant

1 Determinants. 1.1 Determinant 1 Determinants [SB], Chapter 9, p.188-196. [SB], Chapter 26, p.719-739. Bellow w ll study the central question: which additional conditions must satisfy a quadratic matrix A to be invertible, that is to

More information

DESPITE considerable progress in verification of random

DESPITE considerable progress in verification of random IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 1 Formal Analysis of Galois Field Arithmetic Circuits - Parallel Verification and Reverse Engineering Cunxi Yu Student Member,

More information

A new class of irreducible pentanomials for polynomial based multipliers in binary fields

A new class of irreducible pentanomials for polynomial based multipliers in binary fields Noname manuscript No. (will be inserted by the editor) A new class of irreducible pentanomials for polynomial based multipliers in binary fields Gustavo Banegas Ricardo Custódio Daniel Panario the date

More information

AN IMPROVED LOW LATENCY SYSTOLIC STRUCTURED GALOIS FIELD MULTIPLIER

AN IMPROVED LOW LATENCY SYSTOLIC STRUCTURED GALOIS FIELD MULTIPLIER Indian Journal of Electronics and Electrical Engineering (IJEEE) Vol.2.No.1 2014pp1-6 available at: www.goniv.com Paper Received :05-03-2014 Paper Published:28-03-2014 Paper Reviewed by: 1. John Arhter

More information

A Piggybacking Design Framework for Read-and Download-efficient Distributed Storage Codes

A Piggybacking Design Framework for Read-and Download-efficient Distributed Storage Codes A Piggybacing Design Framewor for Read-and Download-efficient Distributed Storage Codes K V Rashmi, Nihar B Shah, Kannan Ramchandran, Fellow, IEEE Department of Electrical Engineering and Computer Sciences

More information

Cellular Automata-Based Recursive Pseudoexhaustive Test Pattern Generator

Cellular Automata-Based Recursive Pseudoexhaustive Test Pattern Generator IEEE TRANSACTIONS ON COMPUTERS, VOL, NO, FEBRUARY Cellular Automata-Based Recursive Pseudoexhaustive Test Pattern Generator Prabir Dasgupta, Santanu Chattopadhyay, P Pal Chaudhuri, and Indranil Sengupta

More information

ISSN (PRINT): , (ONLINE): , VOLUME-5, ISSUE-7,

ISSN (PRINT): , (ONLINE): , VOLUME-5, ISSUE-7, HIGH PERFORMANCE MONTGOMERY MULTIPLICATION USING DADDA TREE ADDITION Thandri Adi Varalakshmi Devi 1, P Subhashini 2 1 PG Scholar, Dept of ECE, Kakinada Institute of Technology, Korangi, AP, India. 2 Assistant

More information

A Digit-Serial Systolic Multiplier for Finite Fields GF(2 m )

A Digit-Serial Systolic Multiplier for Finite Fields GF(2 m ) A Digit-Serial Systolic Multiplier for Finite Fields GF( m ) Chang Hoon Kim, Sang Duk Han, and Chun Pyo Hong Department of Computer and Information Engineering Taegu University 5 Naeri, Jinryang, Kyungsan,

More information

TOTALLY RAMIFIED PRIMES AND EISENSTEIN POLYNOMIALS. 1. Introduction

TOTALLY RAMIFIED PRIMES AND EISENSTEIN POLYNOMIALS. 1. Introduction TOTALLY RAMIFIED PRIMES AND EISENSTEIN POLYNOMIALS KEITH CONRAD A (monic) polynomial in Z[T ], 1. Introduction f(t ) = T n + c n 1 T n 1 + + c 1 T + c 0, is Eisenstein at a prime p when each coefficient

More information

Chapter 1: Systems of linear equations and matrices. Section 1.1: Introduction to systems of linear equations

Chapter 1: Systems of linear equations and matrices. Section 1.1: Introduction to systems of linear equations Chapter 1: Systems of linear equations and matrices Section 1.1: Introduction to systems of linear equations Definition: A linear equation in n variables can be expressed in the form a 1 x 1 + a 2 x 2

More information

Novel Implementation of Finite Field Multipliers over GF(2m) for Emerging Cryptographic Applications

Novel Implementation of Finite Field Multipliers over GF(2m) for Emerging Cryptographic Applications Wright State University CORE Scholar Browse all Theses and Dissertations Theses and Dissertations 2017 Novel Implementation of Finite Field Multipliers over GF(2m) for Emerging Cryptographic Applications

More information

The 4-periodic spiral determinant

The 4-periodic spiral determinant The 4-periodic spiral determinant Darij Grinberg rough draft, October 3, 2018 Contents 001 Acknowledgments 1 1 The determinant 1 2 The proof 4 *** The purpose of this note is to generalize the determinant

More information

Efficient random number generation on FPGA-s

Efficient random number generation on FPGA-s Proceedings of the 9 th International Conference on Applied Informatics Eger, Hungary, January 29 February 1, 2014. Vol. 1. pp. 313 320 doi: 10.14794/ICAI.9.2014.1.313 Efficient random number generation

More information

TOTALLY RAMIFIED PRIMES AND EISENSTEIN POLYNOMIALS. 1. Introduction

TOTALLY RAMIFIED PRIMES AND EISENSTEIN POLYNOMIALS. 1. Introduction TOTALLY RAMIFIED PRIMES AND EISENSTEIN POLYNOMIALS KEITH CONRAD A (monic) polynomial in Z[T ], 1. Introduction f(t ) = T n + c n 1 T n 1 + + c 1 T + c 0, is Eisenstein at a prime p when each coefficient

More information

II. Determinant Functions

II. Determinant Functions Supplemental Materials for EE203001 Students II Determinant Functions Chung-Chin Lu Department of Electrical Engineering National Tsing Hua University May 22, 2003 1 Three Axioms for a Determinant Function

More information

Direct Methods for Solving Linear Systems. Matrix Factorization

Direct Methods for Solving Linear Systems. Matrix Factorization Direct Methods for Solving Linear Systems Matrix Factorization Numerical Analysis (9th Edition) R L Burden & J D Faires Beamer Presentation Slides prepared by John Carroll Dublin City University c 2011

More information

Transition Faults Detection in Bit Parallel Multipliers over GF(2 m )

Transition Faults Detection in Bit Parallel Multipliers over GF(2 m ) Transition Faults Detection in Bit Parallel Multipliers over GF( m ) Hafizur Rahaman Bengal Engineering & Science University, Shibpur Howrah-73, India rahaman_h@it.becs.ac.in Jimson Mathew Computer Science

More information

Matrix Arithmetic. a 11 a. A + B = + a m1 a mn. + b. a 11 + b 11 a 1n + b 1n = a m1. b m1 b mn. and scalar multiplication for matrices via.

Matrix Arithmetic. a 11 a. A + B = + a m1 a mn. + b. a 11 + b 11 a 1n + b 1n = a m1. b m1 b mn. and scalar multiplication for matrices via. Matrix Arithmetic There is an arithmetic for matrices that can be viewed as extending the arithmetic we have developed for vectors to the more general setting of rectangular arrays: if A and B are m n

More information

XOR-counts and lightweight multiplication with fixed elements in binary finite fields

XOR-counts and lightweight multiplication with fixed elements in binary finite fields XOR-counts and lightweight multiplication with fixed elements in binary finite fields Lukas Kölsch University of Rostock, Germany lukaskoelsch@uni-rostockde Abstract XOR-metrics measure the efficiency

More information

RECAP How to find a maximum matching?

RECAP How to find a maximum matching? RECAP How to find a maximum matching? First characterize maximum matchings A maximal matching cannot be enlarged by adding another edge. A maximum matching of G is one of maximum size. Example. Maximum

More information

ALGEBRAIC GEOMETRY I - FINAL PROJECT

ALGEBRAIC GEOMETRY I - FINAL PROJECT ALGEBRAIC GEOMETRY I - FINAL PROJECT ADAM KAYE Abstract This paper begins with a description of the Schubert varieties of a Grassmannian variety Gr(k, n) over C Following the technique of Ryan [3] for

More information

Math 324 Summer 2012 Elementary Number Theory Notes on Mathematical Induction

Math 324 Summer 2012 Elementary Number Theory Notes on Mathematical Induction Math 4 Summer 01 Elementary Number Theory Notes on Mathematical Induction Principle of Mathematical Induction Recall the following axiom for the set of integers. Well-Ordering Axiom for the Integers If

More information

Binary Convolutional Codes of High Rate Øyvind Ytrehus

Binary Convolutional Codes of High Rate Øyvind Ytrehus Binary Convolutional Codes of High Rate Øyvind Ytrehus Abstract The function N(r; ; d free ), defined as the maximum n such that there exists a binary convolutional code of block length n, dimension n

More information

Polynomial Methods for Component Matching and Verification

Polynomial Methods for Component Matching and Verification Polynomial Methods for Component Matching and Verification James Smith Stanford University Computer Systems Laboratory Stanford, CA 94305 1. Abstract Component reuse requires designers to determine whether

More information

Linear Codes, Target Function Classes, and Network Computing Capacity

Linear Codes, Target Function Classes, and Network Computing Capacity Linear Codes, Target Function Classes, and Network Computing Capacity Rathinakumar Appuswamy, Massimo Franceschetti, Nikhil Karamchandani, and Kenneth Zeger IEEE Transactions on Information Theory Submitted:

More information

REPRESENTATION THEORY OF S n

REPRESENTATION THEORY OF S n REPRESENTATION THEORY OF S n EVAN JENKINS Abstract. These are notes from three lectures given in MATH 26700, Introduction to Representation Theory of Finite Groups, at the University of Chicago in November

More information

High Performance GHASH Function for Long Messages

High Performance GHASH Function for Long Messages High Performance GHASH Function for Long Messages Nicolas Méloni 1, Christophe Négre 2 and M. Anwar Hasan 1 1 Department of Electrical and Computer Engineering University of Waterloo, Canada 2 Team DALI/ELIAUS

More information

William Stallings Copyright 2010

William Stallings Copyright 2010 A PPENDIX E B ASIC C ONCEPTS FROM L INEAR A LGEBRA William Stallings Copyright 2010 E.1 OPERATIONS ON VECTORS AND MATRICES...2 Arithmetic...2 Determinants...4 Inverse of a Matrix...5 E.2 LINEAR ALGEBRA

More information

PREFACE SEYMOUR LIPSCHUTZ MARC LARS LIPSON. Lipschutz Lipson:Schaum s Outline of Theory and Problems of Linear Algebra, 3/e

PREFACE SEYMOUR LIPSCHUTZ MARC LARS LIPSON. Lipschutz Lipson:Schaum s Outline of Theory and Problems of Linear Algebra, 3/e Algebra, /e Front Matter Preface Companies, 004 PREFACE Linear algebra has in recent years become an essential part of the mathemtical background required by mathematicians and mathematics teachers, engineers,

More information

Detailed Proof of The PerronFrobenius Theorem

Detailed Proof of The PerronFrobenius Theorem Detailed Proof of The PerronFrobenius Theorem Arseny M Shur Ural Federal University October 30, 2016 1 Introduction This famous theorem has numerous applications, but to apply it you should understand

More information

New Bit-Level Serial GF (2 m ) Multiplication Using Polynomial Basis

New Bit-Level Serial GF (2 m ) Multiplication Using Polynomial Basis 2015 IEEE 22nd Symposium on Computer Arithmetic New Bit-Level Serial GF 2 m ) Multiplication Using Polynomial Basis Hayssam El-Razouk and Arash Reyhani-Masoleh Department of Electrical and Computer Engineering

More information

An Algorithm for Inversion in GF(2 m ) Suitable for Implementation Using a Polynomial Multiply Instruction on GF(2)

An Algorithm for Inversion in GF(2 m ) Suitable for Implementation Using a Polynomial Multiply Instruction on GF(2) An Algorithm for Inversion in GF2 m Suitable for Implementation Using a Polynomial Multiply Instruction on GF2 Katsuki Kobayashi, Naofumi Takagi, and Kazuyoshi Takagi Department of Information Engineering,

More information

Gaussian Elimination and Back Substitution

Gaussian Elimination and Back Substitution Jim Lambers MAT 610 Summer Session 2009-10 Lecture 4 Notes These notes correspond to Sections 31 and 32 in the text Gaussian Elimination and Back Substitution The basic idea behind methods for solving

More information

Fundamentals of Engineering Analysis (650163)

Fundamentals of Engineering Analysis (650163) Philadelphia University Faculty of Engineering Communications and Electronics Engineering Fundamentals of Engineering Analysis (6563) Part Dr. Omar R Daoud Matrices: Introduction DEFINITION A matrix is

More information

A VLSI Algorithm for Modular Multiplication/Division

A VLSI Algorithm for Modular Multiplication/Division A VLSI Algorithm for Modular Multiplication/Division Marcelo E. Kaihara and Naofumi Takagi Department of Information Engineering Nagoya University Nagoya, 464-8603, Japan mkaihara@takagi.nuie.nagoya-u.ac.jp

More information

CHAPTER 3 Further properties of splines and B-splines

CHAPTER 3 Further properties of splines and B-splines CHAPTER 3 Further properties of splines and B-splines In Chapter 2 we established some of the most elementary properties of B-splines. In this chapter our focus is on the question What kind of functions

More information

CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding

CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding Tim Roughgarden October 29, 2014 1 Preamble This lecture covers our final subtopic within the exact and approximate recovery part of the course.

More information

Improving Common Subexpression Elimination Algorithm with A New Gate-Level Delay Computing Method

Improving Common Subexpression Elimination Algorithm with A New Gate-Level Delay Computing Method , 23-25 October, 2013, San Francisco, USA Improving Common Subexpression Elimination Algorithm with A New Gate-Level Dela Computing Method Ning Wu, Xiaoqiang Zhang, Yunfei Ye, and Lidong Lan Abstract In

More information

chapter 12 MORE MATRIX ALGEBRA 12.1 Systems of Linear Equations GOALS

chapter 12 MORE MATRIX ALGEBRA 12.1 Systems of Linear Equations GOALS chapter MORE MATRIX ALGEBRA GOALS In Chapter we studied matrix operations and the algebra of sets and logic. We also made note of the strong resemblance of matrix algebra to elementary algebra. The reader

More information

Construction of Equidistributed Generators Based on Linear Recurrences Modulo 2

Construction of Equidistributed Generators Based on Linear Recurrences Modulo 2 Construction of Equidistributed Generators Based on Linear Recurrences Modulo 2 Pierre L Ecuyer and François Panneton Département d informatique et de recherche opérationnelle Université de Montréal C.P.

More information

A Suggestion for a Fast Residue Multiplier for a Family of Moduli of the Form (2 n (2 p ± 1))

A Suggestion for a Fast Residue Multiplier for a Family of Moduli of the Form (2 n (2 p ± 1)) The Computer Journal, 47(1), The British Computer Society; all rights reserved A Suggestion for a Fast Residue Multiplier for a Family of Moduli of the Form ( n ( p ± 1)) Ahmad A. Hiasat Electronics Engineering

More information

Divisibility of Trinomials by Irreducible Polynomials over F 2

Divisibility of Trinomials by Irreducible Polynomials over F 2 Divisibility of Trinomials by Irreducible Polynomials over F 2 Ryul Kim Faculty of Mathematics and Mechanics Kim Il Sung University, Pyongyang, D.P.R.Korea Wolfram Koepf Department of Mathematics University

More information

Codes over Subfields. Chapter Basics

Codes over Subfields. Chapter Basics Chapter 7 Codes over Subfields In Chapter 6 we looked at various general methods for constructing new codes from old codes. Here we concentrate on two more specialized techniques that result from writing

More information

Efficient Subquadratic Space Complexity Binary Polynomial Multipliers Based On Block Recombination

Efficient Subquadratic Space Complexity Binary Polynomial Multipliers Based On Block Recombination Efficient Subquadratic Space Complexity Binary Polynomial Multipliers Based On Block Recombination Murat Cenk, Anwar Hasan, Christophe Negre To cite this version: Murat Cenk, Anwar Hasan, Christophe Negre.

More information

MULTIPLIERS OF THE TERMS IN THE LOWER CENTRAL SERIES OF THE LIE ALGEBRA OF STRICTLY UPPER TRIANGULAR MATRICES. Louis A. Levy

MULTIPLIERS OF THE TERMS IN THE LOWER CENTRAL SERIES OF THE LIE ALGEBRA OF STRICTLY UPPER TRIANGULAR MATRICES. Louis A. Levy International Electronic Journal of Algebra Volume 1 (01 75-88 MULTIPLIERS OF THE TERMS IN THE LOWER CENTRAL SERIES OF THE LIE ALGEBRA OF STRICTLY UPPER TRIANGULAR MATRICES Louis A. Levy Received: 1 November

More information

Differential Equations and Linear Algebra C. Henry Edwards David E. Penney Third Edition

Differential Equations and Linear Algebra C. Henry Edwards David E. Penney Third Edition Differential Equations and Linear Algebra C. Henry Edwards David E. Penney Third Edition Pearson Education Limited Edinburgh Gate Harlow Essex CM20 2JE England and Associated Companies throughout the world

More information

An Õ m 2 n Randomized Algorithm to compute a Minimum Cycle Basis of a Directed Graph

An Õ m 2 n Randomized Algorithm to compute a Minimum Cycle Basis of a Directed Graph An Õ m 2 n Randomized Algorithm to compute a Minimum Cycle Basis of a Directed Graph T Kavitha Indian Institute of Science Bangalore, India kavitha@csaiiscernetin Abstract We consider the problem of computing

More information

CPSC 467b: Cryptography and Computer Security

CPSC 467b: Cryptography and Computer Security CPSC 467b: Cryptography and Computer Security Michael J. Fischer Lecture 8 February 1, 2012 CPSC 467b, Lecture 8 1/42 Number Theory Needed for RSA Z n : The integers mod n Modular arithmetic GCD Relatively

More information

The Adjoint of a Linear Operator

The Adjoint of a Linear Operator The Adjoint of a Linear Operator Michael Freeze MAT 531: Linear Algebra UNC Wilmington Spring 2015 1 / 18 Adjoint Operator Let L : V V be a linear operator on an inner product space V. Definition The adjoint

More information

Time-Varying Systems and Computations Lecture 3

Time-Varying Systems and Computations Lecture 3 Time-Varying Systems and Computations Lecture 3 Klaus Diepold November 202 Linear Time-Varying Systems State-Space System Model We aim to derive the matrix containing the time-varying impulse responses

More information

Modular Multiplication in GF (p k ) using Lagrange Representation

Modular Multiplication in GF (p k ) using Lagrange Representation Modular Multiplication in GF (p k ) using Lagrange Representation Jean-Claude Bajard, Laurent Imbert, and Christophe Nègre Laboratoire d Informatique, de Robotique et de Microélectronique de Montpellier

More information

A Numerical Algorithm for Block-Diagonal Decomposition of Matrix -Algebras, Part II: General Algorithm

A Numerical Algorithm for Block-Diagonal Decomposition of Matrix -Algebras, Part II: General Algorithm A Numerical Algorithm for Block-Diagonal Decomposition of Matrix -Algebras, Part II: General Algorithm Takanori Maehara and Kazuo Murota May 2008 / May 2009 Abstract An algorithm is proposed for finding

More information

SMT 2013 Power Round Solutions February 2, 2013

SMT 2013 Power Round Solutions February 2, 2013 Introduction This Power Round is an exploration of numerical semigroups, mathematical structures which appear very naturally out of answers to simple questions. For example, suppose McDonald s sells Chicken

More information

Volume 3, No. 1, January 2012 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at

Volume 3, No. 1, January 2012 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at Volume 3, No 1, January 2012 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at wwwjgrcsinfo A NOVEL HIGH DYNAMIC RANGE 5-MODULUS SET WHIT EFFICIENT REVERSE CONVERTER AND

More information

Optimal XOR based (2,n)-Visual Cryptography Schemes

Optimal XOR based (2,n)-Visual Cryptography Schemes Optimal XOR based (2,n)-Visual Cryptography Schemes Feng Liu and ChuanKun Wu State Key Laboratory Of Information Security, Institute of Software Chinese Academy of Sciences, Beijing 0090, China Email:

More information

Implementation Options for Finite Field Arithmetic for Elliptic Curve Cryptosystems Christof Paar Electrical & Computer Engineering Dept. and Computer Science Dept. Worcester Polytechnic Institute Worcester,

More information

Course MA2C02, Hilary Term 2013 Section 9: Introduction to Number Theory and Cryptography

Course MA2C02, Hilary Term 2013 Section 9: Introduction to Number Theory and Cryptography Course MA2C02, Hilary Term 2013 Section 9: Introduction to Number Theory and Cryptography David R. Wilkins Copyright c David R. Wilkins 2000 2013 Contents 9 Introduction to Number Theory 63 9.1 Subgroups

More information

Efficient Digit-Serial Normal Basis Multipliers over Binary Extension Fields

Efficient Digit-Serial Normal Basis Multipliers over Binary Extension Fields Efficient Digit-Serial Normal Basis Multipliers over Binary Extension Fields ARASH REYHANI-MASOLEH and M. ANWAR HASAN University of Waterloo In this article, two digit-serial architectures for normal basis

More information

10. Linear Systems of ODEs, Matrix multiplication, superposition principle (parts of sections )

10. Linear Systems of ODEs, Matrix multiplication, superposition principle (parts of sections ) c Dr. Igor Zelenko, Fall 2017 1 10. Linear Systems of ODEs, Matrix multiplication, superposition principle (parts of sections 7.2-7.4) 1. When each of the functions F 1, F 2,..., F n in right-hand side

More information

Constructing a Ternary FCSR with a Given Connection Integer

Constructing a Ternary FCSR with a Given Connection Integer Constructing a Ternary FCSR with a Given Connection Integer Lin Zhiqiang 1,2 and Pei Dingyi 1,2 1 School of Mathematics and Information Sciences, Guangzhou University, China 2 State Key Laboratory of Information

More information

CANONICAL FORMS FOR LINEAR TRANSFORMATIONS AND MATRICES. D. Katz

CANONICAL FORMS FOR LINEAR TRANSFORMATIONS AND MATRICES. D. Katz CANONICAL FORMS FOR LINEAR TRANSFORMATIONS AND MATRICES D. Katz The purpose of this note is to present the rational canonical form and Jordan canonical form theorems for my M790 class. Throughout, we fix

More information

Construction of latin squares of prime order

Construction of latin squares of prime order Construction of latin squares of prime order Theorem. If p is prime, then there exist p 1 MOLS of order p. Construction: The elements in the latin square will be the elements of Z p, the integers modulo

More information

Svoboda-Tung Division With No Compensation

Svoboda-Tung Division With No Compensation Svoboda-Tung Division With No Compensation Luis MONTALVO (IEEE Student Member), Alain GUYOT Integrated Systems Design Group, TIMA/INPG 46, Av. Félix Viallet, 38031 Grenoble Cedex, France. E-mail: montalvo@archi.imag.fr

More information

Using q-calculus to study LDL T factorization of a certain Vandermonde matrix

Using q-calculus to study LDL T factorization of a certain Vandermonde matrix Using q-calculus to study LDL T factorization of a certain Vandermonde matrix Alexey Kuznetsov May 2, 207 Abstract We use tools from q-calculus to study LDL T decomposition of the Vandermonde matrix V

More information

arxiv: v1 [math.co] 3 Nov 2014

arxiv: v1 [math.co] 3 Nov 2014 SPARSE MATRICES DESCRIBING ITERATIONS OF INTEGER-VALUED FUNCTIONS BERND C. KELLNER arxiv:1411.0590v1 [math.co] 3 Nov 014 Abstract. We consider iterations of integer-valued functions φ, which have no fixed

More information

Robust Network Codes for Unicast Connections: A Case Study

Robust Network Codes for Unicast Connections: A Case Study Robust Network Codes for Unicast Connections: A Case Study Salim Y. El Rouayheb, Alex Sprintson, and Costas Georghiades Department of Electrical and Computer Engineering Texas A&M University College Station,

More information

Some constructions of integral graphs

Some constructions of integral graphs Some constructions of integral graphs A. Mohammadian B. Tayfeh-Rezaie School of Mathematics, Institute for Research in Fundamental Sciences (IPM), P.O. Box 19395-5746, Tehran, Iran ali m@ipm.ir tayfeh-r@ipm.ir

More information

Introduction to Binary Convolutional Codes [1]

Introduction to Binary Convolutional Codes [1] Introduction to Binary Convolutional Codes [1] Yunghsiang S. Han Graduate Institute of Communication Engineering, National Taipei University Taiwan E-mail: yshan@mail.ntpu.edu.tw Y. S. Han Introduction

More information

compare to comparison and pointer based sorting, binary trees

compare to comparison and pointer based sorting, binary trees Admin Hashing Dictionaries Model Operations. makeset, insert, delete, find keys are integers in M = {1,..., m} (so assume machine word size, or unit time, is log m) can store in array of size M using power:

More information

Mathematical Foundations of Cryptography

Mathematical Foundations of Cryptography Mathematical Foundations of Cryptography Cryptography is based on mathematics In this chapter we study finite fields, the basis of the Advanced Encryption Standard (AES) and elliptical curve cryptography

More information

x by so in other words ( )

x by so in other words ( ) Math 154. Norm and trace An interesting application of Galois theory is to help us understand properties of two special constructions associated to field extensions, the norm and trace. If L/k is a finite

More information

APPENDIX: MATHEMATICAL INDUCTION AND OTHER FORMS OF PROOF

APPENDIX: MATHEMATICAL INDUCTION AND OTHER FORMS OF PROOF ELEMENTARY LINEAR ALGEBRA WORKBOOK/FOR USE WITH RON LARSON S TEXTBOOK ELEMENTARY LINEAR ALGEBRA CREATED BY SHANNON MARTIN MYERS APPENDIX: MATHEMATICAL INDUCTION AND OTHER FORMS OF PROOF When you are done

More information

EIGENVALUES AND EIGENVECTORS 3

EIGENVALUES AND EIGENVECTORS 3 EIGENVALUES AND EIGENVECTORS 3 1. Motivation 1.1. Diagonal matrices. Perhaps the simplest type of linear transformations are those whose matrix is diagonal (in some basis). Consider for example the matrices

More information