Systematic Design of Original and Modified Mastrovito Multipliers for General Irreducible Polynomials
|
|
- Gertrude Hudson
- 5 years ago
- Views:
Transcription
1 4 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 Systematic Design of Original and Modified Mastrovito Multipliers for General Irreducible Polynomials Tong Zhang, Student Member, IEEE, and Keshab K Parhi, Fellow, IEEE AbstractÐThis paper considers the design of bit-parallel dedicated finite field multipliers using standard basis An explicit algorithm is proposed for efficient construction of Mastrovito product matrix, based on which we present a systematic design of Mastrovito multiplier applicable to GF m generated by an arbitrary irreducible polynomial This design effectively exploits the spatial correlation of elements in Mastrovito product matrix to reduce the complexity Using a similar methodology, we propose a systematic design of modified Mastrovito multiplier, which is suitable for GF m generated by high-hamming weight irreducible polynomials For both original and modified Mastrovito multipliers, the developed multiplier architectures are highly modular, which is desirable for VLSI hardware implementation Applying the proposed algorithm and design approach, we study the Mastrovito multipliers for several special irreducible polynomials, such as trinomial and equally-spaced-polynomial, and the obtained complexity results match the best known results Moreover, we have discovered several new special irreducible polynomials which also lead to low-complexity Mastrovito multipliers Index TermsÐFinite (or Galois) field, standard basis, multiplication, irreducible polynomials, complexity, VLSI architecture, Toeplitz matrix æ 1 INTRODUCTION EFFICIENT hardware implementations of finite (or Galois) field GF m arithmetic units are highly desirable for many applications in error-correcting coding and cryptography [1], [], [], [4] Among the GF m arithmetic operations, multiplication is the most basic and important building block in many applications A number of efficient GF m multiplication approaches and architectures have been proposed in which different basis representations of field elements are used, such as standard basis, dual basis, and normal basis Standard basis is more promising in the sense that it gives designers more freedom on irreducible polynomial selection and hardware optimization For detailed discussions of these three basis representations, readers are referred to [], [5] In this paper, we are interested in the design of bit-parallel GF m multipliers using standard basis The standard basis multiplication involves two steps: polynomial multiplication and modulo reduction Let f x be the irreducible polynomial generating GF m, c x be the product of a x and b x, where a x, b x, c x GF m The finite field multiplication is performed as c x ˆ a x b x mod f x An efficient dedicated bitparallel multiplier was proposed by Mastrovito [] in which a product matrix M is introduced to combine the above two steps together Thus, the multiplication is carried out by c ˆ M b, where c and b represent the coefficient The authors are with the Department of Electrical and Computer Engineering, University of Minnesota, 00 Union Street SE, Minneapolis, MN {tzhang, parhi}@eceumnedu Manuscript received Oct 000; revised Feb 001; accepted Apr 001 For information on obtaining reprints of this article, please send to: tc@computerorg, and reference IEEECS Log Number 110 vectors of c x and b x, respectively Given irreducible polynomial f x, each entry in matrix M is obtained by oring certain coefficients of a x Many entries in matrix M can be computed efficiently by sharing some common items, eg, two entries M i 1 ;j 1 ˆa 0 a a and M i ;j ˆa 0 a a 4 can be computed using three OR gates by sharing the common item a 0 a This method is called subexpression sharing [] Mastrovito multipliers using two special irreducible polynomials, trinomial and equallyspaced-polynomial (ESP), have been studied by many researchers for their low-complexity implementations [8], [9], [10], [11], [1], [1], [14] The essence of all these works is to find an architecture to exploit subexpression sharing efficiently based on the specific irreducible polynomials It has been shown in [8] that Mastrovito multiplier using irreducible trinomial x m x n 1 only requires m 1 OR gates and m AND gates By generalizing the approach of [8], Halbutogullari and KocË [14] discovered that the space complexity of Mastrovito multiplier using irreducible ESP x m x tr x r 1, where t 1 r ˆ m, can be reduced to m r OR gates and m AND gates Furthermore, [14] presents a new formulation of the Mastrovito product matrix for an arbitrary irreducible polynomial x m x nk x n1 1, where the space complexity is given as m AND gates and m 1 m k 1 P jn m 1 j OR gates, Nf0; 1; ;m g However, [14] fails to find a method to explicitly compute the set N, which makes its result less practicable in general In general cases, the complexity of computing product matrix is proportional to the Hamming weight of the irreducible polynomial f x (denoted as pwt) So, the Mastrovito multiplier is good only when low-hamming weight irreducible polynomials are used A modified /01/$1000 ß 001 IEEE
2 ZHANG AND PARHI: SYSTEMATIC DESIGN OF ORIGINAL AND MODIFIED MASTROVITO MULTIPLIERS FOR GENERAL IRREDUCIBLE 5 Mastrovito multiplier was proposed by Song and Parhi [15], which has a complexity proportional to m 1 pwt Its basic idea is to use the complementary irreducible polynomial for the computation of matrix M with appropriate compensation Such a multiplication scheme is efficient when GF m is generated by high-hamming weight irreducible polynomial Generally, for large m, efficient design for both original and modified Mastrovito multipliers becomes rather difficult and a systematic design approach is highly desirable In this paper, we generalize the approach of [8] in a different way compared with [14] We propose a theorem and an algorithm (which can explicitly compute set N in [14]) for the construction of product matrix in the original Mastrovito multiplier, based on which we develop an efficient systematic design of the original Mastrovito multiplier This design effectively exploits subexpression sharing in the computation of product matrix to reduce the complexity Using a similar methodology, we also develop a systematic design of modified Mastrovito multiplier For both original and modified Mastrovito multipliers, explicit algorithms and architectures are presented and the complexities are given in detail Applying our proposed design approach, we study the irreducible trinomial and ESP and the complexity results match the results in [8] and [14] Meanwhile, another computation approach for trinomial case is proposed to make a trade-off between space complexity and delay Moreover, with the aid of the proposed algorithm, we discover several new irreducible polynomials leading to low-complexity original Mastrovito multipliers, which is especially desirable when neither an irreducible trinomial nor an irreducible ESP exists This paper is organized as follows: We introduce the notation of this paper and the fundamentals of finite field and Mastrovito multiplier in Section In Section, we propose a theorem and an algorithm for the construction of product matrix, based on which a systematic design approach for original Mastrovito multiplier is developed In Section 4, using a similar design methodology, we present a systematic design approach for modified Mastrovito multiplier Efficient computation approaches for several special irreducible polynomials are discussed in Section 5 This paper is an extended version of [1] NOTATION AND PRELIMINARIES Since several arithmetic operations pertaining to matrices and vectors will be extensively used throughout this paper, we first introduce the following notation: Column vectors and matrices are represented by small and capital boldfaced characters, respectively Matlab matrix notations are used, eg, Z i; :, Z :;j, and Z i; j represent the ith row vector, jth column vector, and the entry with position i; j in matrix Z, respectively The operations of shift by feeding zero are represented by corresponding arrows, eg, v # Š, U! 1Š, and U # 1Š represent down shift of vector v by two positions, right shift of matrix U by one column, and down shift of matrix U by one row, respectively, which are explicitly given as: v # Š ˆ 0; 0;v 0 ; ;v n Š T U! 1Š ˆ o; U :; 1 ; ; U :;m 1 Š U # 1Š ˆ o; U 1; : T ; ; U m 1; : T Š T ; where o represents the zero column vector Furthermore, we note that the AND and OR gates considered in this paper are all -input gates, whose delays are denoted as T A and T, respectively Finite field GF m contains m elements and can be viewed as an m-dimensional vector space over GF, which only has two elements, 0 and 1 With the standard basis f1;x;x ; ;x m 1 g, the elements of the finite field GF m can be represented as polynomials of degree m 1 as follows: GF m ˆfa x ja x ˆ a m 1 x m 1 a m x m a 1 x a 0 ;a i GF g: Such polynomial representation is generally used for finite field arithmetic operations, where addition is carried out by polynomial addition over GF m using bit-independent OR operations Finite field multiplication using standard basis is carried out by polynomial multiplication and modulo operation Let f x ˆx m f m 1 x m 1 f 1 x 1 be the irreducible polynomial generating GF m, and c x be the product of a x and b x, where a x ;b x ;c x GF m The polynomial multiplication, d x ˆa x b x, can performed as d ˆ A b; where b ˆ b 0 ;b 1 ; ;b m 1 Š T and d ˆ d 0 ;d 1 ; ;d m Š T are the coefficient vectors of b x and d x, respectively, and matrix A is given as a a 1 a a m a m a 0 0 A ˆ a m 1 a m a 1 a 0 : 1 0 a m 1 a a a m 1 a m a m 1 Next, we perform the modulo reduction c x ˆd x mod f x ; which can be expressed as m c x ˆ d k x k mod f x kˆ0 ˆ m 1 kˆ0 m d k x k d k x k mod f x : kˆm For m k m, if we denote x k mod f x as u l x, where l ˆ k m 1, we have u l x ˆ xm mod f x ; l ˆ 1; u l 1 x xmod f x ; l ˆ ; ; ;m 1:
3 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 Since the irreducible polynomial is f x ˆx m f m 1 x m 1 f 1 x f 0 ; where f 0 ˆ 1, we get u 1 j ˆ f j ; 0 j m 1; ( j ˆ u l 1 j 1 u l 1 m 1 f j; 1 j m 1 u l 1 m 1 ; j ˆ 0; u l where u l 1 j and u l j are the coefficients of u l 1 x and u l x, respectively Let u l denote the coefficient vector of u l x, then, using vector notation, () can be rewritten as u l ˆ f; l ˆ 1 u l 1 # 1Š u l 1 4 m 1 f; l m 1; where f ˆ 1;f 1 ; ;f m 1 Š T Let s denote vector d 0 ;d 1 ; ;d m 1 Š T ; which consists of the first m entries of d, we can rewrite () in matrix notation as follows: c ˆ I mm s m 1 d l m 1 u l lˆ1 ˆ I mm ; U d ˆ I mm ; U A b; where I mm represent m m identity matrix and matrix U ˆ u 1 ; u ; ; u m 1 Š Note that all successive columns in matrix U have the recursive relation as in (4) Define M ˆ I mm ; U A: Thus, the GF m multiplication can be carried out by c ˆ M b, which is the well-known Mastrovito multiplication scheme, and the matrix M is called product matrix MASTROVITO MULTIPLIER In this section, we first introduce a theorem and an algorithm pertaining to the construction of product matrix M Then, we develop an approach to compute M by adding a series of Toeplitz matrices, through which subexpression sharing could be extensively exploited to reduce the OR complexity Accordingly, a highly modular multiplier architecture is presented 1 Proposed Theorem and Algorithm In Section, we have shown that product matrix M can be expressed as the product of two independent matrices, I mm ; UŠ and A, and there exists a recursive relation between successive columns in matrix U, as described in (4) Through the following theorem and algorithm, we will see that matrix U also can be constructed by adding a series of Toeplitz matrices Theorem 1 For the computation of matrix U in (5), we can always construct a set Nf0; 1; ;m g and U ˆ F! nš; nn 5 where the Toeplitz matrix F ˆ f; f # 1Š; ; f # m ŠŠ Let the irreducible polynomial be f x ˆx m x k s x k 1 1; where m>k s > >k 1 > 1, the set N in Theorem 1 can be explicitly constructed with the aid of a weighted tree generated based on f x The complete construction approach is described by the following algorithm: Algorithm 1 Input: The parameters of irreducible polynomial: m, k 1 ; ;k s ; Output: set Nf0; 1; ;m g Procedure: 1 Generate a weighted tree D according to the following properties: Each node d j in D has at most s child nodes and each edge has the weight w f m k i ; 1 i sg; Let d 1 denote the root and h d 1 ;d j denote the weight of path from d 1 to d j, where h d 1 ;d 1 ˆ0, we have 8 d j, if 9 r f m k i ; 1 i sg and h d 1 ;d j r <m 1, then d j always has one child node d l and the weight of edge between d j and d l is r; 8d j D, h d 1 ;d j <m 1 Construct multiset Hˆfh d 1 ;d j ; 8d j Dg and set Nˆ;; For 0 j m, do a create multiset S j ˆ;; b 8 h H,ifhˆj, then insert h into S j ; c if js j j mod ˆ1, then insert j into the set N Here, we note that a multiset is like a set, except that repeated elements are allowed, and js j j represents the order of S j A proof of Theorem 1 is given in Appendix A, in which Algorithm 1 is developed From the above algorithm, we know that the least two elements in N are always 0 and (m k s ) and we have jn j k s Example 1 Consider the multiplication of a x ˆx 4 x and b x ˆx x 1 over GF 5 with the underlying irreducible polynomial f x ˆx 5 x 4 x x 1 We have f m k i ; 1 i sg ˆf1; ; g Applying Algorithm 1, we generate the tree D as shown in Fig 1, and get the multiset Hˆf0; 1; ; ; ; ; ; g Thus, we have S 0 ˆf0g )js 0 jˆ1; S ˆf; g )js jˆ; S 1 ˆf1g )js 1 jˆ1; S ˆf; ; ; g )js jˆ4: Since js j mod ˆjS j mod ˆ 0, we get Nˆf0; 1g Therefore, we perform the multiplication as
4 ZHANG AND PARHI: SYSTEMATIC DESIGN OF ORIGINAL AND MODIFIED MASTROVITO MULTIPLIERS FOR GENERAL IRREDUCIBLE Define E j ˆ h i e j ; e j # 1Š; ; e j # m Š : We can write the Toeplitz matrix F in Theorem 1 as P s iˆ0 E k i 1, where k 0 ˆ 0, and have Fig 1 Tree structure h c ˆ M b ˆ I 55 ; i F! nš nn A b ˆ ˆ ˆ 0 ; where F ˆ h i f; f # 1Š; ; f # m Š ˆ : Thus, we have c x ˆ a x b x mod f x ˆx 4 Multiplier Architecture Applying Theorem 1 and Algorithm 1, in this section, we develop an efficient computation approach for the product matrix M and present the corresponding lowcomplexity Mastrovito multiplier architecture In the following, we frequently use two special matrices: the Toeplitz matrix and the upper-triangular Toeplitz matrix It is well-known that the sum of two upper-triangular Toeplitz matrices (or one Toeplitz matrix and one upper-triangular Toeplitz matrix) can be obtained by only computing the sum of two row vectors This property will be used to exploit the subexpression sharing in the construction of product matrix M to reduce the entire OR complexity Given a general irreducible polynomial f x ˆx m x ks x k1 1, we express its coefficient vector as f ˆ e 1 e k1 1 e ks 1, where e i is the m-dimensional ith canonical unit vector: e i ˆ 0; ; 0; 1; 0; ; 0 Š T : {z } {z } i 1 m i U ˆ F! nš ˆs nn E ki 1! nš: iˆ0 nn Moreover, according to (1), we write A in block matrix form as A ˆ A T s ; AT t ŠT ; where A s is an m m lower-triangular Toeplitz matrix and A t is an m 1 m upper-triangular Toeplitz matrix: a a 1 a 0 0 A s ˆ 4 5 ; a m 1 a m a 0 0 a m 1 a a a a A t ˆ 4 5 : a m 1 Substituting (), (8), and (9) into (5), we get M ˆ A s s iˆ0 nn 8 9 E ki 1! nša t : 11 Based on the special structure of E j as defined in (), it can be easily proven that E j! nša t ˆ ~At! nš # j 1 Š; 1 where ~ A t ˆ A T t ; ošt and o is an m-dimensional zero column vector Thus, if we denote P nn ~ A t! nš as S, (11) can be rewritten as M ˆ A s s iˆ0 ˆ A s S s nn ~A t! nš # k i Š S # k i Š : 1 Because each A ~ t! nš is an upper-triangular Toeplitz matrix, we easily know that matrix S is also an uppertriangular Toeplitz matrix with the following form: 0 s m 1 s 1 S ˆ s m and computing S 1; : ˆ ~At 1; :! nš ˆ nn nn A t 1; :! nš 15
5 8 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 Fig General Mastrovito Multiplier architecture is sufficient to completely construct matrix S Since the first entry of A t 1; : is zero, the first n 1 entries of A t 1; :! nš are also zeroes Recalling that n ˆ 0 is the least element in N, we get the OR complexity for computing S 1; : based on (15) is m n 1 ˆ m n 1 m 1 1 nn & nˆ0 nn and if binary tree structure is used, the delay will be dlog jn jet After having obtained S, we use (1) to compute the product matrix M From (10) and (14), we know that the addition of A s and S does not need any OR gates and the matrix A s S, denoted by T, is a Toeplitz matrix Therefore, (1) can be rewritten as M ˆ T S # k 1 Š S # k s Š: 1 In order to compute (1) efficiently, we introduce the following observation: Observation 1 Given an m m upper-triangular Toeplitz matrix S and m m matrix P whose last m l rows form a Toeplitz submatrix If m>jl, then computing the sum of vector P j 1; : and S 1; : is sufficient to construct the sum of P and S # jš and the last m j rows of the sum matrix still form a Toeplitz submatrix Applying Observation 1, we compute M based on (1) with linear tree structure as follows: Algorithm 1 Initially, set Q 0 ˆ T; For 1 i s, construct Q i ˆ Q i 1 S # k i Š by computing Q i k i 1; : ˆQ i 1 k i 1; : S 1;: ; Finally, set M ˆ Q s In the above algorithm, for each i, Q i 1 k i 1; : S 1;: requires m 1 OR gates, so we need s m 1 OR gates to compute M with the delay of st In order to obtain the result c x via c ˆ M b, we also need m AND gates and m m 1 OR gates to complete this matrix-vector multiplication Its delay will be T A dlog met if binary tree structure is used Based on the above computation approach, we develop the dedicated Mastrovito multiplier architecture as shown in Fig Given irreducible polynomial f x ˆx m x ks x k1 1, set N is constructed using Algorithm 1 Product Matrix Module computes matrix M and consists of two blocks: M1 and M Block M1 generates the vector S 1; : by computing P nn A t 1; :! nš Supplied with S 1; :, block M computes the product matrix M using Algorithm M contains (m 1) P j blocks, each one generates one row vector of M Ifj fk i ; 1 i sg, then P j is identical to block B; otherwise, it is identical to block B1 The Matrix-Vector Multiplication Module computes M b and consists of m identical V blocks The block V computes the inner-product of two vectors of length m The total complexity of the proposed original Mastrovito multiplier for general irreducible polynomial is given by OR Complexity: m s 1 m 1 nn m n 1 ; AND Complexity: m, Delay: T A s dlog jn je dlog me T It needs to be pointed out that, for a given irreducible polynomial, there likely exist some common items in the computation of (15) and (1) By sharing these common items, we may further optimize the hardware architecture of product matrix module to some extent Thus, the multiplier architecture shown in Fig may need some corresponding modifications and the above OR complexity value is actually an upper bound for general cases 4 MODIFIED MASTROVITO MULTIPLIER Using a similar design methodology as in Section, in this section, we develop a systematic design approach for the modified Mastrovito multiplier [15], which is preferable if high-hamming weight irreducible polynomial is used 41 Proposed Theorem and Algorithm For a polynomial f x ˆf m x m f m 1 x m 1 f 1 x f 0, we define its complementary polynomial as p x ˆ 1 f m x m 1 f m 1 x m 1 1 f 1 x 1 f 0 :
6 ZHANG AND PARHI: SYSTEMATIC DESIGN OF ORIGINAL AND MODIFIED MASTROVITO MULTIPLIERS FOR GENERAL IRREDUCIBLE 9 According to [15], we have the following theorem to construct matrix U in (5) using the complementary irreducible polynomial: Theorem 41 Given an irreducible polynomial f x ˆx m x ks x k1 1; its complementary polynomial can be written as p x ˆx t m s 1 x t 1 ; where m>t m s 1 > >t 1 > 0 Let p represent the coefficient vector of p x, then we can obtain the matrix U in (5) by adding two matrices V and Q, where V is constructed recursively as V :;i ˆ 8 p; i ˆ 1 >< p # 1Š V m; 1 1 p e 1 ; i ˆ V :;i 1 # 1Š V m; i e 1 >: V m; i V m; i 1 p; i m 1; where e 1 is the first canonical unit vector and Q ˆ w e T 1 V m; :! 1Š ; where w ˆ 1; ; 1 Š T : {z } m In the following, we propose a theorem and an algorithm to construct matrix V in Theorem 41 by adding a series of Toeplitz matrices together Theorem 4 For the computation of matrix V in Theorem 41, we can always find two sets Lf0; 1; ;m g and Jf1; ;m g, and V ˆ P! lš E 1! jš; ll jj where E 1 is defined in () and h i P ˆ p; p # 1Š; ; p # m Š : The sets L and J in Theorem 4 are constructed by the following algorithm: Algorithm 41 Input: The parameters of complementary polynomial: m, t 1 ; ;t m s 1 ; Output: set Lf0; 1; ;m g and Jf1; ;m g Procedure: 1 Generate a weighted tree D according to the following properties: The root d 1 always has one child node d connected by an edge with weight w ˆ 1 Besides d, root d 1 has at most m s 1 other child nodes, where the weight of edge w f m t i ; m t i 1 ; 1 i m s 1 g; 8 d j ˆ d 1, it has at most m s 1 child nodes, where the weight of edge w f m t i ; m t i 1 ; 1 i m s 1 g; Let h d 1 ;d j denote the weight of path from d 1 to d j, where h d 1 ;d 1 ˆ0, we have 8 d j, if 9 r f m t i ; m t i 1 ; 1 i m s 1 g and h d 1 ;d j r <m 1, then d j always has a child node d l and the weight of edge between d j and d l is r; 8d j D, h d 1 ;d j <m 1 Construct a subset, denoted as ~D, of all nodes in D such that it contains each node which is connected with its parent node by an edge with the weight z f m t i 1 ; 1 i m s 1 g; Construct multisets Hˆfh d 1 ;d j ; 8d j Dg and Gˆf1;h d 1 ;d i ; 8 d i ~Dg and sets LˆJ ˆ;; 4 For 0 j m, do a create S j ˆT j ˆ;; b 8 h H,ifh ˆ j, then insert h into S j ; 8 g G,if g ˆ j, then insert g into T j ; c if js j j mod ˆ1, then insert j into the set L; if jt j j mod ˆ1, then insert j into the set J A proof of Theorem 41 is given in Appendix B in which Algorithm 41 is developed Example 41 Consider the construction of matrix U when the high Hamming weight irreducible polynomial f x ˆx x x 5 x x x 1 is being used Its complementary polynomial is p x ˆx 4 We have f m t i ; m t i 1 ; 1 i m s 1 g ˆ f; 4g Applying Algorithm 41, we generate the tree D as shown in Fig From this tree, we get Hˆf0; 1; ; 4; 4; 5g and Gˆf1; 4; 5g Thus, we obtain Lˆf0; 1; ; 5g and Jˆf1; 4; 5g So, matrix V is computed as V ˆ P! lš E 1! jš ll jj ˆ ˆ : Since V ; : ˆ Š, we have V ; :! 1Š ˆ Š and
7 40 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 V A t ˆ P! lš E 1! jš A t ll jj m s 1 ˆ ~A t! lš # t i Š ~A t! jš; jj ll Fig Tree structure Q ˆ w e T 1 V ; :! 1Š ˆ w Š: Finally, we have U ˆ V Q ˆ ˆ : Multiplier Architecture In this section, based on the above theorem and algorithm, we develop an efficient computation approach and corresponding multiplier architecture for the modified Mastrovito multiplication scheme According to (5), (9), and Theorem 41, we have M ˆ I mm ; UŠA ˆ A s U A t ˆ A s V A {z } t Q A t ; 18 {z } M 1 M and the modified Mastrovito multiplication is carried out as c ˆ M b ˆ M 1 b M b: In the following, we develop efficient approaches for computing M 1 and M, respectively Let's begin with M 1 Recall that the complementary polynomial is p x ˆx tm s 1 x t1, thus the matrix P in Theorem 4 can be written as m s 1 P ˆ E ti 1; 19 where E i is defined in () Applying (1) and (19), we have where A ~ t ˆ A T t ; ošt and o is an m-dimensional zero column vector Denote P ~ jj A t! jš and P ~ ll A t! lš as ~S 1 and S ~, respectively, we have m s 1 M 1 ˆ A s ~ S 1 ~S # t i Š: 0 Since each A ~ t! nš is an upper-triangular Toeplitz matrix, we know that both S ~ 1 and S ~ are also upper-triangular Toeplitz matrices, and computing ~S 1 :; 1 ˆ A t 1; :! jš jj ~S :; 1 ˆ ll A t 1; :! lš 1 is sufficient to construct S ~ 1 and S ~ Since the least element in L is always 0, similar to (1), we obtain the OR complexities of computing S ~ 1 and S ~ as m j 1 m min J 1 and jj m l 1 m 1 ll with the delay of dlog jj jet and dlog jljet, respectively Similarly to Algorithm, for the computation of M 1 using (0), we have the following algorithm: Algorithm 4 1 Initially, set Q 0 ˆ A s ~ S 1 ; For 1 i m s 1, construct Q i ˆ Q i 1 ~S # t i Š by computing Q i t i 1; : ˆQ i 1 t i 1; : ~ S 1; : ; Finally, set M 1 ˆ Q m s 1 In the above algorithm, the construction of Toeplitz matrix Q 0 doesn't need any gates For 1 i m s 1, each step needs m 1 OR gates, so the OR complexity and delay of the above algorithm is m s 1 m 1 and m s 1 T, respectively Based on the above discussion, we conclude that the matrix M 1 can be computed with the following procedure: Procedure 41 1 Given a high-hamming weight irreducible polynomial f x ˆx m x ks x k1 1, get its complementary polynomial p x ˆx t m s 1 x t 1 and construct two sets L and J using Algorithm 41; Construct S ~ 1 and S ~ using (1), if we combine L and J together to form a new multiset L, then the complexity of this step is
8 ZHANG AND PARHI: SYSTEMATIC DESIGN OF ORIGINAL AND MODIFIED MASTROVITO MULTIPLIERS FOR GENERAL IRREDUCIBLE 41 Fig 4 General Modified Mastrovito Multiplier architecture OR complexity: ll m l 1 m 1 min J ; Delay: dlog DeT, where D ˆ max jlj; jj j ; Then, use Algorithm 4 to compute matrix M 1 For this step, we have OR complexity: m s 1 m 1 ; Delay: m s 1 T Next, let's consider the computation of M Denote the vector e T 1 V m; :! 1Š as qt From Theorem 41, we know that all row vectors in Q are equal to q T Thus, each row vector in M is equal to q T A t Without loss of generality, suppose q T has d nonzero entries whose positions are r d ; ;r 1, respectively, where r d > >r 1, then we have q T A t ˆ d A t r i ; : : Since A t is an upper-triangular Toeplitz matrix, we have A t r i ; : ˆA t 1; :! r i 1 Š, thus () can be rewritten as q T A t ˆ d A t 1; :! r i 1 Š: Therefore, the computation of M only needs P d iˆ m r i OR gates with the delay of dlog det In the above, we have developed the approaches for computing M 1 and M In order to complete the GF m multiplication, we only need to compute c ˆ M b ˆ M 1 b M b: The matrix-vector multiplication M 1 b requires m m 1 OR and m AND gates with the delay of T A dlog met Since all row vectors in M are identical and the first r 1 elements in each row are zeros, the matrix-vector multiplication M b only requires m r 1 1 OR and m r 1 AND gates with the delay of T A dlog m r 1 et The addition of M 1 b and M b needs m OR gates with the delay of T Based on the above computation approach, we develop the modified Mastrovito multiplier architecture as shown in Fig 4 The set L and J are constructed using Algorithm 41 Product Matrix Module computes the two matrices M 1 and M and consists of four blocks: PM1, PM, PM, and PM4 Blocks PM1 and PM generate the vector ~ S 1; : and ~S 1 1; :, respectively Block PM computes the matrix M 1 using Algorithm 4 PM contains (m 1) P j blocks, each one generates one row vector of M If j ft i ; 1 i m s 1g, then P j is identical to block B; otherwise, it is identical to block B1 Block PM4 computes the vector q T A t, the row vector of matrix M, where r 1 ; ;r d represent the position of the nonzero elements in q T The Matrix-Vector Multiplication & Addition Module computes M 1 b M b In this architecture, the operation of M 1 b and M b can be performed in parallel If their delays are denoted by T A m 1 T and T A m T, the total delay of the modified Mastrovito multiplier will be T A 1 max m 1 ;m T Therefore, the total complexity of modified Mastrovito multiplier is OR complexity: m s m 1 ll m l 1 d m r i min J ; AND complexity: m m r 1, Delay: T A 1 max m 1 ;m T, where multiset L is the combination of L and J, d is the Hamming weight of q T, and r i represents the index of nonzero element of q T, and
9 4 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 m 1 ˆ m s 1 dlog max jlj; jj j e dlog me; m ˆdlog de dlog m r 1 e: We note that, for given high Hamming weight irreducible polynomial, further hardware optimization is possible by sharing common items during computation of M 1 and M So, the above OR complexity result is also an upper bound for general cases 5 SPECIAL IRREDUCIBLE POLYNOMIALS In Section, we presented an efficient computation approach of original Mastrovito multiplication for general cases and pointed out that further simplification can be achieved for specific irreducible polynomials by further exploiting subexpression sharing In this section, we show that by applying our proposed explicit algorithm for the construction of set N, we can easily obtain efficient multiplication schemes with further reduced complexity for several special irreducible polynomials 51 m k s In the following, we show that, for irreducible polynomial f x ˆx m x k s x k 1 1, where m k s, we can reduce the OR complexity of the Mastrovito multiplier by computing S 1; : in (15) using linear tree structure instead of binary tree Since m k s, we easily have Nˆf0;m k s ;m k s 1 ; ;m k 1 g and jn j ˆ s 1 Thus, (15) can be simplified as S 1; : ˆA t 1; : s A t 1; :! m k i Š : 4 Using linear tree structure, we compute S 1; :, according to (4), as follows: Algorithm 51 1 Initially, set v T 0 ˆ A t 1; : ; For 1 i s, compute v T i ˆ v T i 1 A t 1; :! m k i Š; Finally, set S 1; : ˆv T s The OR complexity of the above algorithm is m n 1 m 1 ˆs k i 1 ; nn 5 with the delay of st Compared with using binary tree, the OR complexity doesn't change, but delay increases from dlog s 1 et to st Next, we will prove that, as compensation for the increased delay, the OR complexity of computing M using Algorithm can be reduced from s m 1 to P s m k i by sharing the intermediate results obtained in Algorithm 51 Proof In Algorithm, based on Observation 1, we construct each matrix Q i by only computing Q i k i 1; : ˆQ i 1 k i 1; : S 1; : : Moreover, since the last m k i rows of each matrix Q i form a Toeplitz submatrix, we have Q i 1 k i 1; : ˆQ i 1 k i 1 1; :! k i k i 1 Š: Therefore, based on (), (), and the fact that Q 0 ˆ T ˆ S A s, by induction we have Q i k i 1; : ˆi From (10), we have Thus, jˆ1 S 1; :! k i k j Š S 1; :! k i Š A s k i 1; : : A s k i 1; : ˆ a ki ;a ki 1 ; ;a 1 {z } k i ;a 0 ; 0; ; 0Š; A t 1; : ˆ 0;a m 1 ; ;a ki 1 {z } m k i ;a ki ; ;a 1 Š: 8 A s k i 1; : ˆA t 1; : m k i Š e T k i 1 a 0; 9 where e ki 1 is the k i 1 th canonical unit vector Therefore, (8) becomes Q i k i 1; : ˆi jˆ1 S 1; :! k i k j Š S 1; :! k i Š A t 1; : m k i Š e T k i 1 a 0: 0 Since each Q i k i 1; : is an m-dimensional row vector, the first k i entries of Q i k i 1; : are identical to the last k i entries of Q i k i 1; :! m k i Š From (14), we also know that 8 q m 1, S 1; :! qš is a zero row vector Therefore, according to (0), the last k i entries of Q i k i 1; :! m k i Š are identical to the last k i entries of ~q T i, where ~q T i is defined as ~q T i ˆ i jˆ1 S 1; :! m k j Š A t 1; : : Substituting (4) into (1), we have ~q T i ˆ i jˆ1 s jˆ1 A t 1; :! m k j Š i A t 1; :! m k j m k i Š A t 1; : : 1 Since m k s, we have 8 i; j s, m k i m k j m and A t 1; :! m k j m k i Š is actually a zero row vector So, we can rewrite () as ~q T i ˆ i jˆ1 A t 1; :! m k j Š A t 1; : : We note that ~q T i is identical to v T i, which we have obtained in Algorithm 51 So, the first k i entries of Q i k i 1; : are equal to the last k i entries of the
10 ZHANG AND PARHI: SYSTEMATIC DESIGN OF ORIGINAL AND MODIFIED MASTROVITO MULTIPLIERS FOR GENERAL IRREDUCIBLE 4 intermediate result v T i in Algorithm 51 Thus, in Algorithm, for each 1 i s, we only need to compute the last (m k i ) entries of Q i k i 1; :, which requires m k i OR gates, and the total OR complexity of Algorithm is P s m k i with the delay of st tu Therefore, the entire OR complexity of computing product matrix M is s k i 1 s m k i ˆs m 1 with the delay of st Moreover, the matrix-vector multiplication M b requires m m 1 OR and m AND gates with the delay of T A dlog met Thus, for an irreducible polynomial in which m k s, if the linear tree structure is used to compute S 1; :, the entire complexity of Mastrovito multiplier is OR complexity: m s m 1, AND complexity: m, Delay: T A s dlog me T 5 Trinomial If GF m is generated by irreducible trinomial f x ˆx m x n 1, we have Nˆ l m n ; 0 l m : m n Therefore, we get k ˆb m mn c, (15) and (1) can be simplified as S 1; : ˆk iˆ0 Obviously, computing A t 1; :! i m n Š ; 4 M ˆ A s S S # nš: M n 1; : ˆA s n 1; : S n 1; : S 1; : 5 is sufficient to construct M in (5) According to the complexity results presented in Section, we know that the OR complexities of computing (4) and () are P k m i m n 1 and m 1, respectively In the following, we will see that the above complexity values can be further reduced First, we show that m n, instead of m 1, OR gates are sufficient to complete the computation in () From (9), we have A s n 1; : ˆA t 1; : m n Š e T n 1 a 0 and, since S is an upper-triangular Toeplitz matrix, its n 1 th row can be written as S n 1; : ˆS 1; :! nš k ˆ At 1; :! i m n Š! nš iˆ0 ˆ A t 1; :! nš: So, () can be rewritten as M n 1; : ˆA t 1; : m n Š e T n 1 a 0 A t 1; :! nš S 1; : : Let's consider the addition of A t 1; : m n Š and S 1; : in () Because the last m n entries of A t 1; : m n Š are zeros, we only need to compute the first n entries of A t 1; : m n Š S 1; : Obviously, the first n entries of A t 1; : m n Š S 1; : are equal to the last n entries of A t 1; : S 1;:! m n Š and we have A t 1; : S 1;:! m n Š k ˆ A t 1; : A t 1; :! i m n Š! m n Š ˆ k iˆ0 iˆ0 A t 1; :! i m n Š ˆ S 1; : A t 1; :! k 1 m n Š: A t 1; :! k 1 m n Š Since k ˆb m m nc, we have k 1 m n m 1 Thus, the item A t 1; :! k 1 m n Š is actually a zero vector and the first n entries of A t 1; : m n Š S 1; : are identical to the last n entries of S 1; : Therefore, the addition of A t 1; : m n Š and S 1; : does not need any OR gates Furthermore, in (), the sum of the other two items, e T n 1 a 0 A t 1; :! nš ˆ 0; ; 0;a {z } 0 ;a m 1 ; ;a n 1 Š; {z } n m n has m n nonzero entries Thus, we conclude that the computation of () only needs m n OR gates The OR complexity of computing (4) can be reduced by using either the linear tree or hybrid tree method, which will lead to different trade-off between OR complexity and delay 51 Linear Tree If (4) is computed using linear tree structure as P 0 iˆk A t 1; :! i m n Š, we achieve the lowest OR complexity with the delay increasing linearly with k This approach has been thoroughly studied in [8], in which it was proven that n 1 OR gates are sufficient to compute (4) with the delay of kt We have known that the computation of (5) needs m n OR gates with delay of 1T Thus, the total complexity of the Mastrovito multiplier in [8] was obtained as: OR complexity: m 1, AND complexity: m, Delay: T A k 1 dlog me T, k ˆb m m n c Especially, it's pointed out in [8] that if n ˆ m, then the OR complexity can be further reduced from (m 1) to (m m ), with the delay reduced by 1T 5 Hybrid Tree We have known that using linear tree structure to compute (4) will lead to a very low OR complexity with the delay increasing linearly with k However, when k is very large, the delay may be intolerable for applications requiring high speed In the following, we present another approach for computing (4), where the OR complexity also can be
11 44 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 Fig 5 Hybrid tree computation scheme reduced by exploiting subexpression sharing and the delay increases linearly with blog k 1 c Let the Hamming weight of k 1 be h and blog k 1 c ˆ t h, then k 1 can always be written as k 1 ˆ t h t h 1 t 1, where t h >t h 1 > >t 1 If we denote A t 1; :! i m n Š as A t 1; : i, (4) can be rewritten as k iˆ0 where A t 1; : i w h 1 ˆ A t 1; : i w h 1 1 A t 1; : i iˆ0 w 1 1 iˆw iˆw h A t 1; : i ˆ g T h gt h 1 gt 1 ; w j ˆ h iˆj 8 t i ; 1 j h: 9 We compute the vector g T h using a binary tree B with the height of t h as shown in Fig 5, where each node represents a m-dimensional vector At each depth j, there are th j nodes with the following values: j 1 j 1 1 t h 1 A t 1; : i ; A t 1; : i ; ; A t 1; : i : 40 iˆ0 iˆ j iˆ t h j In (40), all other items can be directly obtained by right shifting the first item P j 1 iˆ0 A t 1; : i by l j columns, where 1 l th j 1 So, only computing the first node at each depth is sufficient to construct the whole binary tree The first node at depth j is computed by adding the first two nodes at depth j 1, where m j 1 m n 1 OR gates are required Thus, the OR complexity of computing g T h is th jˆ1 m j 1 m n 1 ˆ m 1 t h t h 1 m n : 41 Moreover, each vector g T i in (8), where 1 i h 1, can be obtained by right shifting the first node at depth t i in the binary tree B Thus, we compute P 1 iˆh gt i using linear tree L and obtain a hybrid tree to compute (8) as shown in Fig 5 From (8), we know that there are only m 1 w i 1 m n nonzero entries in vector g T i Therefore, it requires h 1 m 1 w i 1 m n ˆ h 1 m 1 h iˆ w i m n 4 OR gates to compute the linear tree L Combining (41) and (4) gives the total OR complexity for the computation of (8) as! t h h 1 m 1 h w i t h 1 m n iˆ and the delay is t h 1 T We have known that computing (5) requires m n OR gates with the delay of 1T Therefore, in trinomial cases, if the above hybrid tree structure is employed to compute (4), the total computation complexity of Mastrovito multiplier will be OR complexity: m t h h 1 m 1 h iˆ w i t h m n ; AND complexity: m, Delay: T A t h dlog me T, where t h ˆblog k 1 c, h is the Hamming weight of k 1, and w i is defined in (9) 5 Pentanomial A polynomial f x ˆx m x ks x k1 1 is called pentanomial if s ˆ For general irreducible pentanomials, it's impossible to write the set N in a simple form, eg, as in the trinomial case, and we have to use the general approach presented in Section to design the product matrix module and perform the possible hardware optimization for the dedicated irreducible pentanomial In the following, we present two special irreducible pentanomials for which the set N has a simple form and the complexity of corresponding Mastrovito multipliers can be easily obtained 51 Special Case 1 For irreducible pentanomials where m k, it follows from the analysis in Section 51 that if linear tree structure is used to compute S 1; :, the total complexity of the Mastrovito multiplier is OR complexity: m m 1, AND complexity: m, Delay: T A dlog me T 5 Special Case For such irreducible pentanomials that m k ˆ k k ˆ k k 1 ˆ r;
12 ZHANG AND PARHI: SYSTEMATIC DESIGN OF ORIGINAL AND MODIFIED MASTROVITO MULTIPLIERS FOR GENERAL IRREDUCIBLE 45 applying Algorithm 1, we have Nˆ n ˆ 4l r; n ˆ 4l 1 r; 0 l d ; 4 4 where d ˆb m c So, (15) can be simplified as where r S 1; : ˆbd 4 c lˆ0 g T! 4l rš ; 44 g T ˆ A t 1; : A t 1; :! rš: 45 The vector g T is computed using m r 1 OR gates with the delay of 1T If we use linear tree to compute (44) as P 0 lˆb d 4 c gt! 4l rš, then, applying the results in [8], it only requires m 4r 1 OR gates with the delay of b d 4 ct Therefore, the total OR complexity of computing S 1; : is m 5r with the delay of b d 4 c 1 T Furthermore, using Algorithm, we need m 1 OR gates to compute M with the delay of T Thus, in this case, the total complexity of the Mastrovito multiplier is given by OR complexity: m m 1 m 5r, AND complexity: m, Delay: T A b d 4 c 4 dlog me T 54 ESP A polynomial f x ˆ1 x r x r x tr x m, where t 1 r ˆ m, is called an ESP (equally-spaced-polynomial) An ESP with r ˆ 1 is usually referred as an AOP (all-onepolynomial) For irreducible ESP, applying Algorithm 1, we have Nˆf0;rg, based on which it can be shown that matrix U in (5) always has the following form: U ˆ L E 1! rš; where E 1 is defined in () and I rr j L ˆ 4 j j I rr O mm 1 r 5; where O ij and I ii represent i j zero matrix and i i identity matrix, respectively Therefore, the product matrix M can be computed as M ˆ A s U A t ˆ A s L A t E 1! rša t : Let Q 1 denote L A t, we have Q 1 ˆ L A t ˆ P T ; P T ; ; P T T ; 4 where r m matrix P ˆ A t 1:r; : Let Q denote A s E 1! rša t, we have Q ˆ A s E 1! rša t ˆ A s ~ A t! rš; 4 where ~ A t ˆ A T t ;ošt Obviously, Q 1 and Q are obtained without any computation We perform the GF m multiplication as c ˆ M b ˆ Q 1 b Q b: According to (4), we know that only computing P b is sufficient to obtain the result of Q 1 b Since P ˆ A t 1:r; :, we have P i; : ˆA t i; : ˆA t 1; :! i 1 Š; which shows that the first i entries in P i; : are zeros Therefore we get the complexity of computing Q 1 b as #ofand ˆ r #ofor ˆ r m i ˆmr r r ; m i 1 ˆmr r r ; and the delay is T A dlog m 1 et From the definition of A s and A t in (10), we know that each row vector in Q 1:m r; : contains r zero entries and each row vector Q m r i; : contains r i zeros, where 1 i r Therefore, in the computation of Q b, the numbers of AND and OR gates are #ofand ˆ m r m r r ˆ m mr r r #ofor ˆ m r 1 m r r m r i m r i 1 ˆ m mr r r m; and its delay is T A dlog met Obviously, Q 1 b and Q b can be computed in parallel At last, we also need m OR gates to add Q 1 b and Q b together to get the final result c Therefore, we get the total complexity as follows: OR complexity: m r, AND complexity: m, Delay: T A 1 dlog me T Here, we note that the above OR complexity result is identical to that obtained in [14] CONCLUSIONS In this paper, we have presented a systematic design approach for Mastrovito multipliers The complexity results are m AND gates and at most P nn m n 1 m 1 OR gates We note that, although the complexity results appear the same as those presented in [14], we propose an explicit algorithm to compute the set N, which makes our design really practical We have extended this design approach to the modified Mastrovito multiplication scheme, which is suitable for high-hamming weight irreducible polynomials For both original and modified Mastrovito multipliers with general irreducible polynomials, the developed computation approach effectively exploits subexpression sharing and the complexity analyses are given in detail The corresponding hardware architectures for both cases are highly modular Meanwhile, in this paper, we have studied several special irreducible polynomials For trinomials and ESPs, the complexity results
13 4 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 match the best known results achieved in [8] and [14] We also present another computation approach for trinomials to provide a trade-off between OR complexity and delay Moreover, several other special irreducible polynomials, which also lead to low-complexity implementation, have been discovered and corresponding complexities are given Finally, we note that, with the explicit algorithms and design procedures, all the proposed efficient design schemes can be easily employed by VLSI automation design tools for dedicated bit-parallel GF m multiplier design APPENDI A PROOF of THEOREM 1 In the following, we prove Theorem 1 in two steps, through which Algorithm 1 is developed: 1) First, we will show that the matrix U is equal to the sum of a series of matrices constructed by the following procedure: Procedure A-1 1 Initially, set i ˆ n ˆ 1 and create a matrix multiset WˆfU 1 g We define U 1 ˆ f; o; ; o Š; {z } m 1 where o is the m-dimensional zero column vector; 8 U l W, shift down its ith column by one position to serve as its i 1 th column: U l :;i 1 ˆU l :;i # 1Š: 8 U l W, if the last entry of U l 's ith column vector, U l m; i, is 1, then {Increase n by 1, create U l 's child matrix U n as U n ˆ o; ; o; f; o; ; oš {z } {z } i m i and insert U n into W (U l is called the parent matrix of U n )}; 4 Increase i by 1, if i ˆ m 1, procedure terminates, else return to Step Using the above procedure, we obtain a multiset W containing N matrices U 1 ; ; U N, where N ˆjWj is the order of W Each element matrix is m by m 1 and has the form: h i U i ˆ o; ; o; f; f # 1Š; ; f # m r {z } i Š : {z } r i m 1 r i In the following, we prove by induction that the sum of all U i sinw is identical to the matrix U: U :;l ˆN U i :;l ; 1 l m 1: When i ˆ 1, from Procedure A-1, we have U 1 :; 1 ˆf and U i :; 1 ˆo, 8 i>1 From (4), we also have U :; 1 ˆf So, we get U :; 1 ˆN Assume, for l 1, we have U :;l ˆN U i :; 1 : U i :;l : A:1 Applying the recursive relation between the successive columns of U as shown in (4), we compute U :;l 1 as follows: U :;l 1 ˆU :;l # 1Š U m; l f N N ˆ U i :;l # 1Š ˆ N U i :;l # 1Š N U i m; l f U i m; l f : A: According to Procedure A-1, we know that there always exist two integers 1 r t N and if i r, then matrix U i is created before the lth iteration in the procedure and U i :;l # 1Š ˆU i :;l 1 ; if r<i<t, then U i is created at the lth iteration, U i :;l # 1Š is zero vector, and U i :;l 1 ˆf; if t i, then U i is created after the lth iteration, both U i :;l # 1Š and U i :;l 1 are zero vectors We also know that each U i m; l ˆ1 corresponds to a matrix created at the lth iteration Thus, we have N N U i :;l # 1Š ˆ r U i :;l 1 N U i :;l 1 ; iˆt ˆ t 1 U i m; l f iˆr 1 U i :;l 1 : Substituting the above two equations into (A), we get r U :;l 1 ˆ U i :;l 1 N U i :;l 1 ˆ N t 1 iˆs 1 U i :;l 1 U i :;l 1 : iˆt A: Thus, we have derived (A) from the assumption (A1) Therefore, we can conclude that the sum of all matrices in W is equal to matrix U ) Next, we introduce another method to construct multiset W with the aid of a weighted tree Without loss of generality, let the irreducible polynomial be f x ˆx m x ks x k1 1; where m>k s > >k 1 > 1 Thus, for 0 <j<m, only when j f m k i ; 1 i sg, the last entry of f # j 1 Š is 1 From Procedure A-1, we have
14 ZHANG AND PARHI: SYSTEMATIC DESIGN OF ORIGINAL AND MODIFIED MASTROVITO MULTIPLIERS FOR GENERAL IRREDUCIBLE 4 U 1 ˆ h i f; f # 1Š; ; f # m Š and, except U 1, each other matrix U i in W can be constructed by shifting right its parent matrix by w columns, where w f m k i ; 1 i sg Thus, we can construct a weighted tree D in which each node d i represents the matrix U i in W and the weight of each edge represents the number of columns by which the parent matrix right shifts to generate its child matrix According to Procedure A-1, it can be shown that this tree is uniquely determined by the following property: Property A-1 1 Each node d j in D has at most s child nodes and each edge has the weight w f m k i ; 1 i sg Let d 1 denote the root and h d 1 ;d j denote the weight of path from d 1 to d j, where h d 1 ;d 1 ˆ0, we have 8 d j, if 9 r f m k i ; 1 i sg and h d 1 ;d j r <m 1, then d j always has a child node d l and the weight of edge between d j and d l is r 8d j D, h d 1 ;d j <m 1 Since U 1 is identical to the matrix F in Theorem 1, we can construct W as WˆfF! hš; 8h Hg; where Hˆfh d 1 ;d i ; 8 d i Dg Obviously, the multiset H may contain some repeated elements, which means W may contain some identical matrices Because here the addition is logic OR, the sum of two identical matrices is actually a zero matrix So, we can remove those repeated elements in pairs from multiset H using the following algorithm: Algorithm A-1 1 Initially, set Nˆ; For 0 j m, do create S j ˆ;, 8 h H,ifhˆj, then insert h into S j If js j j mod ˆ1, then insert j into the set N Using the above algorithm, we construct a new set N f0; 1; ;m g and F! nš ˆ F! hš ˆU: nn hh We note that combining Property A-1 and Algorithm A-1 just produces Algorithm 1 in Section 1 tu APPENDI B PROOF OF THEOREM 4 Similarly to the proof of Theorem 1, we prove Theorem 4 in two steps, through which Algorithm 41 is developed: 1) First, we use the following procedure to construct two matrix multiset W and Z, where the sum of all the matrices in these two multisets is equal to matrix V Procedure B-1 1 Initially, set i ˆ, n 1 ˆ, and n ˆ 1 Create two multisets WˆfW 1 ; W g and ZˆfZ 1 g: W 1 ˆ p; p # 1Š; o; ; o Š; {z } m 1 W ˆ o; p; o; ; o Š; {z } m 1 Z 1 ˆ o; e 1 ; o; ; o Š; {z } m 1 where p represents the coefficient vector of complementary irreducible polynomial p x and e 1 is the first canonical unit vector If W 1 m; 1 ˆ1, then {Increase n 1 by 1, create W 1 's child matrix W n1 ˆ W and insert W n1 into W}; 8 W l Wand 8 Z l Z,do W l :;i 1 ˆW l :;i # 1Š; Z l :;i 1 ˆZ l :;i # 1Š: 4 8 W l W,ifW l m; i 1 ˆ1, then {Increase both n 1 and n by 1, create W l 's child matrix W n1 and as Z n W n1 Z n ˆ o; ; o; p; o; ; oš; {z } {z } i m i ˆ o; ; o; e {z } 1 ; o; ; oš {z } i m i and insert W n1 and Z n into W and Z, respectively}; 5 8 W l W,ifW l m; i ˆ1, then {Increase n 1 by 1, create W l 's child matrix W n1 as W n1 ˆ o; ; o; p; o; ; oš {z } {z } i m i and insert W n1 into W}; Increase i by 1, if i ˆ m 1, procedure terminates, else return to Step Using the above procedure, we get two multisets W and Z Similarly to the proof of Theorem 1, based on the recursive relation of successive column vectors in matrix V as shown in Theorem 41, it can be proven by induction that matrix V is identical to the sum of all matrices in W and Z: V ˆ jwj W i jzj Z j : jˆ1 B:1 ) Next, we introduce another method to construct multiset W and Z with the aid of a weighted tree Since the complementary polynomial is p x ˆx tm s 1 x t1, the last entry of p # j 1 Š or p # j Š is 1 only when j f m t i ; m t i 1 ; 1 i m s 1 g According to Procedure B-1, each child matrix in multiset W can be obtained by right shifting its parent matrix by w f m t i ; m t i 1 ; 1 i m s 1 g columns and the whole construction process begins from two matrices: W 1 and W ˆ W 1! 1Š Therefore, we can construct a weighted tree D in which every node d i represents the matrix W i in W and the weight of each edge represents the number of columns by which the parent matrix right shifts to generate its child matrix According to Procedure B-1, it can be shown that the tree D is uniquely determined by the following property:
15 48 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 Property B-1 1 The root d 1, representing matrix W 1, always has one child node d (representing W ) connected by an edge with weight w ˆ 1 Besides d, root d 1 has at most m s 1 other child nodes, where the weight of edge w f m t i ; m t i 1 ; 1 i m s 1 g: 8 d j ˆ d 1, it has at most m s 1 child nodes, where the weight of edge w f m t i ; m t i 1 ; 1 i m s 1 g: Let h d 1 ;d j denote the weight of path from d 1 to d j, where h d 1 ;d 1 ˆ0, we have 8 d j, if 9 r f m t i ; m t i 1 ; 1 i m s 1 g and h d 1 ;d j r <m 1, then d j always has a child node d l and the weight of edge between d j and d l is r 4 8d j D, h d 1 ;d j <m 1 Since W 1 is identical to the matrix P in Theorem 4, we can construct the multiset W as follows: WˆfP! hš; 8h Hg; where Hˆfh d 1 ;d i ; 8 d i Dg Moreover, the above weighted tree D also can be used to represent all the element matrices in multiset Z Let root d 1 represent matrix Z 0 ˆ E 1, each other node d i represents a matrix generated through shifting Z 0 right by h d 1 ;d i columns According to Procedure B-1, if we introduce such a subset of all nodes in D, denoted as ~D, that it contains each node which is connected with its parent node by an edge with the weight z f m t i 1 ; 1 i m s 1 g, we have REFERENCES [1] SA Vanstone and PC Oorschot, An Introduction to Error Correcting Codes with Applications Kluwer, 1989 [] AJ Menezes, Handbook of Applied Cryptography CRC, 199 [] AJ Menezes, Applications of Finite Fields Kluwer Academic, 1994 [4] R Lidl and H Niederreiter, Introduction to Finite Fields and Their Applications Cambridge Univ Press, 1994 [5] D Jungnickel, Finite Fields: Structure and Arithmetics Wissenschaftsverlag, 199 [] ED Mastrovito, ªVLSI Designs for Multiplication over Finite Fields GF m,º Proc Sixth Int'l Conf Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes (AAECC-), pp 9-09, July 1988 [] KK Parhi, VLSI Digital Signal Processing Systems: Design and Implementation John Wiley & Sons, 1999 [8] B Sunar and CË K KocË, ªMastrovito Multiplier for All Trinomials,º IEEE Trans Computers, vol 48, no 5, pp 5-5, May 1999 [9] T Itoh and S Tsujii, ªStructure of Parallel Multipliers for a Class of Finite Fields GF m,º Information and Computations, vol 8, pp 1-40, 1989 [10] MA Hasan, MZ Wang, and VK Bhargava, ªModular Construction of Low Complexity Parallel Multipliers for a Class of Finite Fields GF m,º IEEE Trans Computers, vol 41, no 8, pp 9-91 Aug 199 [11] CË K KocË and B Sunar, ªLow-Complexity Bit-Parallel Canonical and Normal Basis Multipliers for a Class of Finite Fields,º IEEE Trans Computers, vol 4, no, pp 5-5, Mar 1998 [1] H Wu and MA Hasan, ªLow-Complexity Bit-Parallel Multipliers for a Class of Finite Fields,º IEEE Trans Computers, vol 4, no 8, pp 88-88, Aug 1998 [1] A Halbutogullari and CË K KocË, ªMastrovito Multiplier for General Irreducible Polynomials,º Applied Algebra, Algebraic Algorithms and Error-Correcting Codes, pp , 1999 [14] A Halbutogullari and CË K KocË, ªMastrovito Multiplier for General Irreducible Polynomials,º IEEE Trans Computers, vol 49, no 5, pp , May 000 [15] L Song and KK Parhi, ªLow Complexity Modified Mastrovito Multipliers over Finite Fields GF m,º Proc IEEE Int'l Symp Circuits and Systems, vol 1, pp , May 1999 [1] T Zhang and KK Parhi, ªSystematic Design Approach of Mastrovito Multipliers over GF m,º Proc IEEE Workshop Signal Processing Systems (SiPS), pp 50-51, Oct 000 [1] GH Golub and CF Van Loan, Matrix Computations, third ed Johns Hopkins Univ Press, 199 ZˆfE 1! gš; 8g Gg; where Gˆf1;h d 1 ;d i ; 8 d i ~Dg Therefore, (B1) can be rewritten as V ˆ W i Z j : ih jg Similarly to the discussion in Appendix A, using Algorithm A-1, we can remove those repeated elements in pairs from multiset H and G and obtain Lf0; 1; ;m g and Jf1; ;m g, respectively, and V ˆ P! lš E 1! jš: ll jj Summarizing the above approach of constructing L and J, we get Algorithm 41 in Section 41 tu Tong Zhang (S'98) received the BS and MS degrees in electrical engineering from the ian Jiaotong University, ian, China, in 1995 and 1998, respectively He is pursuing the PhD degree in electrical engineering at the University of Minnesota His current research interests include design of VLSI architectures and circuits for digital signal processing and communication systems, with the emphasis on error-correcting coding and finite field arithmetic He is a student member of the IEEE
16 ZHANG AND PARHI: SYSTEMATIC DESIGN OF ORIGINAL AND MODIFIED MASTROVITO MULTIPLIERS FOR GENERAL IRREDUCIBLE 49 Keshab K Parhi (S'85-M'88-SM'91-F'9) received the BTech, MSEE, and PhD degrees from the Indian Institute of Technology, Kharagpur (India) (198), the University of Pennsylvania, Philadelphia (1984), and the University of California at Berkeley (1988), respectively He is a distinguished McKnight University Professor of Electrical and Computer Engineering at the University of Minnesota, Minneapolis, where he also holds the Edgar F Johnson Professorship His research interests include all aspects of VLSI implementations of broadband access networks He is currently working on VLSI adaptive digital filters, equalizers and beamformers, error control coders and cryptography architectures, low-power digital systems, and computer arithmetic He has published more than 0 papers in these areas He has authored the text book VLSI Digital Signal Processing Systems (Wiley, 1999) and coedited the reference book Digital Signal Processing for Multimedia Systems (Dekker, 1999) He received the 001 WRG Baker prize paper award from the IEEE, a Golden Jubilee medal from the IEEE Circuits and Systems Society in 1999, a 199 Design Automation Conference best paper award, the 1994 Darlington and the 199 Guillemin-Cauer best paper awards from the IEEE Circuits and Systems Society, the 1991 paper award from the IEEE Signal Processing Society, the 1991 Browder Thompson prize paper award from the IEEE, and the 199 Young Investigator Award of the US National Science Foundation Dr Parhi has served on editorial boards of the IEEE Transactions on Circuits and Systems, the IEEE Transactions on Signal Processing, the IEEE Transactions on Circuits and Sytems-Part II: Analog and Digital Signal Processing, the IEEE Transactions on VLSI Systems, and the IEEE Signal Processing Letters He is an editor of the Journal of VLSI Signal Processing He is the guest editor of a special issue of the IEEE Transactions on Signal Processing, two special issues of the Journal of VLSI Signal Processing and a special issue of the Journal of Analog Integrated Circuits and Signal Processing He served as technical program cochair of the 1995 IEEE Workshop on Signal Processing and the 199 ASAP conference, and is the general chair of the 00 IEEE Workshop on Signal Processing Systems He serves on numerous technical program committees and was a distinguished lecturer of the IEEE Circuits and Systems Society ( ) For further information on this or any computing topic, please visit our Digital Library at
Mastrovito Form of Non-recursive Karatsuba Multiplier for All Trinomials
1 Mastrovito Form of Non-recursive Karatsuba Multiplier for All Trinomials Yin Li, Xingpo Ma, Yu Zhang and Chuanda Qi Abstract We present a new type of bit-parallel non-recursive Karatsuba multiplier over
More informationA New Bit-Serial Architecture for Field Multiplication Using Polynomial Bases
A New Bit-Serial Architecture for Field Multiplication Using Polynomial Bases Arash Reyhani-Masoleh Department of Electrical and Computer Engineering The University of Western Ontario London, Ontario,
More informationSubquadratic Computational Complexity Schemes for Extended Binary Field Multiplication Using Optimal Normal Bases
1 Subquadratic Computational Complexity Schemes for Extended Binary Field Multiplication Using Optimal Normal Bases H. Fan and M. A. Hasan March 31, 2007 Abstract Based on a recently proposed Toeplitz
More informationBSIDES multiplication, squaring is also an important
1 Bit-Parallel GF ( n ) Squarer Using Shifted Polynomial Basis Xi Xiong and Haining Fan Abstract We present explicit formulae and complexities of bit-parallel shifted polynomial basis (SPB) squarers in
More informationSubquadratic space complexity multiplier for a class of binary fields using Toeplitz matrix approach
Subquadratic space complexity multiplier for a class of binary fields using Toeplitz matrix approach M A Hasan 1 and C Negre 2 1 ECE Department and CACR, University of Waterloo, Ontario, Canada 2 Team
More informationA new class of irreducible pentanomials for polynomial based multipliers in binary fields
Noname manuscript No. (will be inserted by the editor) A new class of irreducible pentanomials for polynomial based multipliers in binary fields Gustavo Banegas Ricardo Custódio Daniel Panario the date
More informationFPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials
FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials C. Shu, S. Kwon and K. Gaj Abstract: The efficient design of digit-serial multipliers
More informationSubquadratic Space Complexity Multiplication over Binary Fields with Dickson Polynomial Representation
Subquadratic Space Complexity Multiplication over Binary Fields with Dickson Polynomial Representation M A Hasan and C Negre Abstract We study Dickson bases for binary field representation Such representation
More informationPARALLEL MULTIPLICATION IN F 2
PARALLEL MULTIPLICATION IN F 2 n USING CONDENSED MATRIX REPRESENTATION Christophe Negre Équipe DALI, LP2A, Université de Perpignan avenue P Alduy, 66 000 Perpignan, France christophenegre@univ-perpfr Keywords:
More informationA COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte
A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER Jesus Garcia and Michael J. Schulte Lehigh University Department of Computer Science and Engineering Bethlehem, PA 15 ABSTRACT Galois field arithmetic
More informationGF(2 m ) arithmetic: summary
GF(2 m ) arithmetic: summary EE 387, Notes 18, Handout #32 Addition/subtraction: bitwise XOR (m gates/ops) Multiplication: bit serial (shift and add) bit parallel (combinational) subfield representation
More informationHigh-Speed Polynomial Basis Multipliers Over for Special Pentanomials. José L. Imaña IEEE
TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL 63, NO 1, JANUARY 2016 1 High-Speed Polynomial Basis Multipliers Over for Special Pentanomials José L Imaña Abstract Efficient hardware implementations
More informationRevisiting Finite Field Multiplication Using Dickson Bases
Revisiting Finite Field Multiplication Using Dickson Bases Bijan Ansari and M. Anwar Hasan Department of Electrical and Computer Engineering University of Waterloo, Waterloo, Ontario, Canada {bansari,
More informationA New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields
1 A New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields Haining Fan and M. Anwar Hasan Senior Member, IEEE July 19, 2006 Abstract Based on Toeplitz matrix-vector
More informationIEEE P1363 / D13 (Draft Version 13). Standard Specifications for Public Key Cryptography
IEEE P1363 / D13 (Draft Version 13). Standard Specifications for Public Key Cryptography Annex A (Informative). Number-Theoretic Background. Copyright 1999 by the Institute of Electrical and Electronics
More informationIEEE P1363 / D9 (Draft Version 9). Standard Specifications for Public Key Cryptography
IEEE P1363 / D9 (Draft Version 9) Standard Specifications for Public Key Cryptography Annex A (informative) Number-Theoretic Background Copyright 1997,1998,1999 by the Institute of Electrical and Electronics
More informationA New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields
1 A New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields Haining Fan and M. Anwar Hasan Senior Member, IEEE January 5, 2006 Abstract Based on Toeplitz matrix-vector
More informationRelation of Pure Minimum Cost Flow Model to Linear Programming
Appendix A Page 1 Relation of Pure Minimum Cost Flow Model to Linear Programming The Network Model The network pure minimum cost flow model has m nodes. The external flows given by the vector b with m
More informationReducing the Complexity of Normal Basis Multiplication
Reducing the Complexity of Normal Basis Multiplication Ömer Eǧecioǧlu and Çetin Kaya Koç Department of Computer Science University of California Santa Barbara {omer,koc}@cs.ucsb.edu Abstract In this paper
More informationMontgomery Multiplier and Squarer in GF(2 m )
Montgomery Multiplier and Squarer in GF( m ) Huapeng Wu The Centre for Applied Cryptographic Research Department of Combinatorics and Optimization University of Waterloo, Waterloo, Canada h3wu@cacrmathuwaterlooca
More informationLow complexity bit-parallel GF (2 m ) multiplier for all-one polynomials
Low complexity bit-parallel GF (2 m ) multiplier for all-one polynomials Yin Li 1, Gong-liang Chen 2, and Xiao-ning Xie 1 Xinyang local taxation bureau, Henan, China. Email:yunfeiyangli@gmail.com, 2 School
More informationLetting be a field, e.g., of the real numbers, the complex numbers, the rational numbers, the rational functions W(s) of a complex variable s, etc.
1 Polynomial Matrices 1.1 Polynomials Letting be a field, e.g., of the real numbers, the complex numbers, the rational numbers, the rational functions W(s) of a complex variable s, etc., n ws ( ) as a
More informationChapter 7 Network Flow Problems, I
Chapter 7 Network Flow Problems, I Network flow problems are the most frequently solved linear programming problems. They include as special cases, the assignment, transportation, maximum flow, and shortest
More informationANALYTICAL MATHEMATICS FOR APPLICATIONS 2018 LECTURE NOTES 3
ANALYTICAL MATHEMATICS FOR APPLICATIONS 2018 LECTURE NOTES 3 ISSUED 24 FEBRUARY 2018 1 Gaussian elimination Let A be an (m n)-matrix Consider the following row operations on A (1) Swap the positions any
More informationTRANSPORTATION PROBLEMS
Chapter 6 TRANSPORTATION PROBLEMS 61 Transportation Model Transportation models deal with the determination of a minimum-cost plan for transporting a commodity from a number of sources to a number of destinations
More informationA field F is a set of numbers that includes the two numbers 0 and 1 and satisfies the properties:
Byte multiplication 1 Field arithmetic A field F is a set of numbers that includes the two numbers 0 and 1 and satisfies the properties: F is an abelian group under addition, meaning - F is closed under
More informationOn Systems of Diagonal Forms II
On Systems of Diagonal Forms II Michael P Knapp 1 Introduction In a recent paper [8], we considered the system F of homogeneous additive forms F 1 (x) = a 11 x k 1 1 + + a 1s x k 1 s F R (x) = a R1 x k
More information1 Determinants. 1.1 Determinant
1 Determinants [SB], Chapter 9, p.188-196. [SB], Chapter 26, p.719-739. Bellow w ll study the central question: which additional conditions must satisfy a quadratic matrix A to be invertible, that is to
More informationDESPITE considerable progress in verification of random
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 1 Formal Analysis of Galois Field Arithmetic Circuits - Parallel Verification and Reverse Engineering Cunxi Yu Student Member,
More informationA new class of irreducible pentanomials for polynomial based multipliers in binary fields
Noname manuscript No. (will be inserted by the editor) A new class of irreducible pentanomials for polynomial based multipliers in binary fields Gustavo Banegas Ricardo Custódio Daniel Panario the date
More informationAN IMPROVED LOW LATENCY SYSTOLIC STRUCTURED GALOIS FIELD MULTIPLIER
Indian Journal of Electronics and Electrical Engineering (IJEEE) Vol.2.No.1 2014pp1-6 available at: www.goniv.com Paper Received :05-03-2014 Paper Published:28-03-2014 Paper Reviewed by: 1. John Arhter
More informationA Piggybacking Design Framework for Read-and Download-efficient Distributed Storage Codes
A Piggybacing Design Framewor for Read-and Download-efficient Distributed Storage Codes K V Rashmi, Nihar B Shah, Kannan Ramchandran, Fellow, IEEE Department of Electrical Engineering and Computer Sciences
More informationCellular Automata-Based Recursive Pseudoexhaustive Test Pattern Generator
IEEE TRANSACTIONS ON COMPUTERS, VOL, NO, FEBRUARY Cellular Automata-Based Recursive Pseudoexhaustive Test Pattern Generator Prabir Dasgupta, Santanu Chattopadhyay, P Pal Chaudhuri, and Indranil Sengupta
More informationISSN (PRINT): , (ONLINE): , VOLUME-5, ISSUE-7,
HIGH PERFORMANCE MONTGOMERY MULTIPLICATION USING DADDA TREE ADDITION Thandri Adi Varalakshmi Devi 1, P Subhashini 2 1 PG Scholar, Dept of ECE, Kakinada Institute of Technology, Korangi, AP, India. 2 Assistant
More informationA Digit-Serial Systolic Multiplier for Finite Fields GF(2 m )
A Digit-Serial Systolic Multiplier for Finite Fields GF( m ) Chang Hoon Kim, Sang Duk Han, and Chun Pyo Hong Department of Computer and Information Engineering Taegu University 5 Naeri, Jinryang, Kyungsan,
More informationTOTALLY RAMIFIED PRIMES AND EISENSTEIN POLYNOMIALS. 1. Introduction
TOTALLY RAMIFIED PRIMES AND EISENSTEIN POLYNOMIALS KEITH CONRAD A (monic) polynomial in Z[T ], 1. Introduction f(t ) = T n + c n 1 T n 1 + + c 1 T + c 0, is Eisenstein at a prime p when each coefficient
More informationChapter 1: Systems of linear equations and matrices. Section 1.1: Introduction to systems of linear equations
Chapter 1: Systems of linear equations and matrices Section 1.1: Introduction to systems of linear equations Definition: A linear equation in n variables can be expressed in the form a 1 x 1 + a 2 x 2
More informationNovel Implementation of Finite Field Multipliers over GF(2m) for Emerging Cryptographic Applications
Wright State University CORE Scholar Browse all Theses and Dissertations Theses and Dissertations 2017 Novel Implementation of Finite Field Multipliers over GF(2m) for Emerging Cryptographic Applications
More informationThe 4-periodic spiral determinant
The 4-periodic spiral determinant Darij Grinberg rough draft, October 3, 2018 Contents 001 Acknowledgments 1 1 The determinant 1 2 The proof 4 *** The purpose of this note is to generalize the determinant
More informationEfficient random number generation on FPGA-s
Proceedings of the 9 th International Conference on Applied Informatics Eger, Hungary, January 29 February 1, 2014. Vol. 1. pp. 313 320 doi: 10.14794/ICAI.9.2014.1.313 Efficient random number generation
More informationTOTALLY RAMIFIED PRIMES AND EISENSTEIN POLYNOMIALS. 1. Introduction
TOTALLY RAMIFIED PRIMES AND EISENSTEIN POLYNOMIALS KEITH CONRAD A (monic) polynomial in Z[T ], 1. Introduction f(t ) = T n + c n 1 T n 1 + + c 1 T + c 0, is Eisenstein at a prime p when each coefficient
More informationII. Determinant Functions
Supplemental Materials for EE203001 Students II Determinant Functions Chung-Chin Lu Department of Electrical Engineering National Tsing Hua University May 22, 2003 1 Three Axioms for a Determinant Function
More informationDirect Methods for Solving Linear Systems. Matrix Factorization
Direct Methods for Solving Linear Systems Matrix Factorization Numerical Analysis (9th Edition) R L Burden & J D Faires Beamer Presentation Slides prepared by John Carroll Dublin City University c 2011
More informationTransition Faults Detection in Bit Parallel Multipliers over GF(2 m )
Transition Faults Detection in Bit Parallel Multipliers over GF( m ) Hafizur Rahaman Bengal Engineering & Science University, Shibpur Howrah-73, India rahaman_h@it.becs.ac.in Jimson Mathew Computer Science
More informationMatrix Arithmetic. a 11 a. A + B = + a m1 a mn. + b. a 11 + b 11 a 1n + b 1n = a m1. b m1 b mn. and scalar multiplication for matrices via.
Matrix Arithmetic There is an arithmetic for matrices that can be viewed as extending the arithmetic we have developed for vectors to the more general setting of rectangular arrays: if A and B are m n
More informationXOR-counts and lightweight multiplication with fixed elements in binary finite fields
XOR-counts and lightweight multiplication with fixed elements in binary finite fields Lukas Kölsch University of Rostock, Germany lukaskoelsch@uni-rostockde Abstract XOR-metrics measure the efficiency
More informationRECAP How to find a maximum matching?
RECAP How to find a maximum matching? First characterize maximum matchings A maximal matching cannot be enlarged by adding another edge. A maximum matching of G is one of maximum size. Example. Maximum
More informationALGEBRAIC GEOMETRY I - FINAL PROJECT
ALGEBRAIC GEOMETRY I - FINAL PROJECT ADAM KAYE Abstract This paper begins with a description of the Schubert varieties of a Grassmannian variety Gr(k, n) over C Following the technique of Ryan [3] for
More informationMath 324 Summer 2012 Elementary Number Theory Notes on Mathematical Induction
Math 4 Summer 01 Elementary Number Theory Notes on Mathematical Induction Principle of Mathematical Induction Recall the following axiom for the set of integers. Well-Ordering Axiom for the Integers If
More informationBinary Convolutional Codes of High Rate Øyvind Ytrehus
Binary Convolutional Codes of High Rate Øyvind Ytrehus Abstract The function N(r; ; d free ), defined as the maximum n such that there exists a binary convolutional code of block length n, dimension n
More informationPolynomial Methods for Component Matching and Verification
Polynomial Methods for Component Matching and Verification James Smith Stanford University Computer Systems Laboratory Stanford, CA 94305 1. Abstract Component reuse requires designers to determine whether
More informationLinear Codes, Target Function Classes, and Network Computing Capacity
Linear Codes, Target Function Classes, and Network Computing Capacity Rathinakumar Appuswamy, Massimo Franceschetti, Nikhil Karamchandani, and Kenneth Zeger IEEE Transactions on Information Theory Submitted:
More informationREPRESENTATION THEORY OF S n
REPRESENTATION THEORY OF S n EVAN JENKINS Abstract. These are notes from three lectures given in MATH 26700, Introduction to Representation Theory of Finite Groups, at the University of Chicago in November
More informationHigh Performance GHASH Function for Long Messages
High Performance GHASH Function for Long Messages Nicolas Méloni 1, Christophe Négre 2 and M. Anwar Hasan 1 1 Department of Electrical and Computer Engineering University of Waterloo, Canada 2 Team DALI/ELIAUS
More informationWilliam Stallings Copyright 2010
A PPENDIX E B ASIC C ONCEPTS FROM L INEAR A LGEBRA William Stallings Copyright 2010 E.1 OPERATIONS ON VECTORS AND MATRICES...2 Arithmetic...2 Determinants...4 Inverse of a Matrix...5 E.2 LINEAR ALGEBRA
More informationPREFACE SEYMOUR LIPSCHUTZ MARC LARS LIPSON. Lipschutz Lipson:Schaum s Outline of Theory and Problems of Linear Algebra, 3/e
Algebra, /e Front Matter Preface Companies, 004 PREFACE Linear algebra has in recent years become an essential part of the mathemtical background required by mathematicians and mathematics teachers, engineers,
More informationDetailed Proof of The PerronFrobenius Theorem
Detailed Proof of The PerronFrobenius Theorem Arseny M Shur Ural Federal University October 30, 2016 1 Introduction This famous theorem has numerous applications, but to apply it you should understand
More informationNew Bit-Level Serial GF (2 m ) Multiplication Using Polynomial Basis
2015 IEEE 22nd Symposium on Computer Arithmetic New Bit-Level Serial GF 2 m ) Multiplication Using Polynomial Basis Hayssam El-Razouk and Arash Reyhani-Masoleh Department of Electrical and Computer Engineering
More informationAn Algorithm for Inversion in GF(2 m ) Suitable for Implementation Using a Polynomial Multiply Instruction on GF(2)
An Algorithm for Inversion in GF2 m Suitable for Implementation Using a Polynomial Multiply Instruction on GF2 Katsuki Kobayashi, Naofumi Takagi, and Kazuyoshi Takagi Department of Information Engineering,
More informationGaussian Elimination and Back Substitution
Jim Lambers MAT 610 Summer Session 2009-10 Lecture 4 Notes These notes correspond to Sections 31 and 32 in the text Gaussian Elimination and Back Substitution The basic idea behind methods for solving
More informationFundamentals of Engineering Analysis (650163)
Philadelphia University Faculty of Engineering Communications and Electronics Engineering Fundamentals of Engineering Analysis (6563) Part Dr. Omar R Daoud Matrices: Introduction DEFINITION A matrix is
More informationA VLSI Algorithm for Modular Multiplication/Division
A VLSI Algorithm for Modular Multiplication/Division Marcelo E. Kaihara and Naofumi Takagi Department of Information Engineering Nagoya University Nagoya, 464-8603, Japan mkaihara@takagi.nuie.nagoya-u.ac.jp
More informationCHAPTER 3 Further properties of splines and B-splines
CHAPTER 3 Further properties of splines and B-splines In Chapter 2 we established some of the most elementary properties of B-splines. In this chapter our focus is on the question What kind of functions
More informationCS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding
CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding Tim Roughgarden October 29, 2014 1 Preamble This lecture covers our final subtopic within the exact and approximate recovery part of the course.
More informationImproving Common Subexpression Elimination Algorithm with A New Gate-Level Delay Computing Method
, 23-25 October, 2013, San Francisco, USA Improving Common Subexpression Elimination Algorithm with A New Gate-Level Dela Computing Method Ning Wu, Xiaoqiang Zhang, Yunfei Ye, and Lidong Lan Abstract In
More informationchapter 12 MORE MATRIX ALGEBRA 12.1 Systems of Linear Equations GOALS
chapter MORE MATRIX ALGEBRA GOALS In Chapter we studied matrix operations and the algebra of sets and logic. We also made note of the strong resemblance of matrix algebra to elementary algebra. The reader
More informationConstruction of Equidistributed Generators Based on Linear Recurrences Modulo 2
Construction of Equidistributed Generators Based on Linear Recurrences Modulo 2 Pierre L Ecuyer and François Panneton Département d informatique et de recherche opérationnelle Université de Montréal C.P.
More informationA Suggestion for a Fast Residue Multiplier for a Family of Moduli of the Form (2 n (2 p ± 1))
The Computer Journal, 47(1), The British Computer Society; all rights reserved A Suggestion for a Fast Residue Multiplier for a Family of Moduli of the Form ( n ( p ± 1)) Ahmad A. Hiasat Electronics Engineering
More informationDivisibility of Trinomials by Irreducible Polynomials over F 2
Divisibility of Trinomials by Irreducible Polynomials over F 2 Ryul Kim Faculty of Mathematics and Mechanics Kim Il Sung University, Pyongyang, D.P.R.Korea Wolfram Koepf Department of Mathematics University
More informationCodes over Subfields. Chapter Basics
Chapter 7 Codes over Subfields In Chapter 6 we looked at various general methods for constructing new codes from old codes. Here we concentrate on two more specialized techniques that result from writing
More informationEfficient Subquadratic Space Complexity Binary Polynomial Multipliers Based On Block Recombination
Efficient Subquadratic Space Complexity Binary Polynomial Multipliers Based On Block Recombination Murat Cenk, Anwar Hasan, Christophe Negre To cite this version: Murat Cenk, Anwar Hasan, Christophe Negre.
More informationMULTIPLIERS OF THE TERMS IN THE LOWER CENTRAL SERIES OF THE LIE ALGEBRA OF STRICTLY UPPER TRIANGULAR MATRICES. Louis A. Levy
International Electronic Journal of Algebra Volume 1 (01 75-88 MULTIPLIERS OF THE TERMS IN THE LOWER CENTRAL SERIES OF THE LIE ALGEBRA OF STRICTLY UPPER TRIANGULAR MATRICES Louis A. Levy Received: 1 November
More informationDifferential Equations and Linear Algebra C. Henry Edwards David E. Penney Third Edition
Differential Equations and Linear Algebra C. Henry Edwards David E. Penney Third Edition Pearson Education Limited Edinburgh Gate Harlow Essex CM20 2JE England and Associated Companies throughout the world
More informationAn Õ m 2 n Randomized Algorithm to compute a Minimum Cycle Basis of a Directed Graph
An Õ m 2 n Randomized Algorithm to compute a Minimum Cycle Basis of a Directed Graph T Kavitha Indian Institute of Science Bangalore, India kavitha@csaiiscernetin Abstract We consider the problem of computing
More informationCPSC 467b: Cryptography and Computer Security
CPSC 467b: Cryptography and Computer Security Michael J. Fischer Lecture 8 February 1, 2012 CPSC 467b, Lecture 8 1/42 Number Theory Needed for RSA Z n : The integers mod n Modular arithmetic GCD Relatively
More informationThe Adjoint of a Linear Operator
The Adjoint of a Linear Operator Michael Freeze MAT 531: Linear Algebra UNC Wilmington Spring 2015 1 / 18 Adjoint Operator Let L : V V be a linear operator on an inner product space V. Definition The adjoint
More informationTime-Varying Systems and Computations Lecture 3
Time-Varying Systems and Computations Lecture 3 Klaus Diepold November 202 Linear Time-Varying Systems State-Space System Model We aim to derive the matrix containing the time-varying impulse responses
More informationModular Multiplication in GF (p k ) using Lagrange Representation
Modular Multiplication in GF (p k ) using Lagrange Representation Jean-Claude Bajard, Laurent Imbert, and Christophe Nègre Laboratoire d Informatique, de Robotique et de Microélectronique de Montpellier
More informationA Numerical Algorithm for Block-Diagonal Decomposition of Matrix -Algebras, Part II: General Algorithm
A Numerical Algorithm for Block-Diagonal Decomposition of Matrix -Algebras, Part II: General Algorithm Takanori Maehara and Kazuo Murota May 2008 / May 2009 Abstract An algorithm is proposed for finding
More informationSMT 2013 Power Round Solutions February 2, 2013
Introduction This Power Round is an exploration of numerical semigroups, mathematical structures which appear very naturally out of answers to simple questions. For example, suppose McDonald s sells Chicken
More informationVolume 3, No. 1, January 2012 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at
Volume 3, No 1, January 2012 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at wwwjgrcsinfo A NOVEL HIGH DYNAMIC RANGE 5-MODULUS SET WHIT EFFICIENT REVERSE CONVERTER AND
More informationOptimal XOR based (2,n)-Visual Cryptography Schemes
Optimal XOR based (2,n)-Visual Cryptography Schemes Feng Liu and ChuanKun Wu State Key Laboratory Of Information Security, Institute of Software Chinese Academy of Sciences, Beijing 0090, China Email:
More informationImplementation Options for Finite Field Arithmetic for Elliptic Curve Cryptosystems Christof Paar Electrical & Computer Engineering Dept. and Computer Science Dept. Worcester Polytechnic Institute Worcester,
More informationCourse MA2C02, Hilary Term 2013 Section 9: Introduction to Number Theory and Cryptography
Course MA2C02, Hilary Term 2013 Section 9: Introduction to Number Theory and Cryptography David R. Wilkins Copyright c David R. Wilkins 2000 2013 Contents 9 Introduction to Number Theory 63 9.1 Subgroups
More informationEfficient Digit-Serial Normal Basis Multipliers over Binary Extension Fields
Efficient Digit-Serial Normal Basis Multipliers over Binary Extension Fields ARASH REYHANI-MASOLEH and M. ANWAR HASAN University of Waterloo In this article, two digit-serial architectures for normal basis
More information10. Linear Systems of ODEs, Matrix multiplication, superposition principle (parts of sections )
c Dr. Igor Zelenko, Fall 2017 1 10. Linear Systems of ODEs, Matrix multiplication, superposition principle (parts of sections 7.2-7.4) 1. When each of the functions F 1, F 2,..., F n in right-hand side
More informationConstructing a Ternary FCSR with a Given Connection Integer
Constructing a Ternary FCSR with a Given Connection Integer Lin Zhiqiang 1,2 and Pei Dingyi 1,2 1 School of Mathematics and Information Sciences, Guangzhou University, China 2 State Key Laboratory of Information
More informationCANONICAL FORMS FOR LINEAR TRANSFORMATIONS AND MATRICES. D. Katz
CANONICAL FORMS FOR LINEAR TRANSFORMATIONS AND MATRICES D. Katz The purpose of this note is to present the rational canonical form and Jordan canonical form theorems for my M790 class. Throughout, we fix
More informationConstruction of latin squares of prime order
Construction of latin squares of prime order Theorem. If p is prime, then there exist p 1 MOLS of order p. Construction: The elements in the latin square will be the elements of Z p, the integers modulo
More informationSvoboda-Tung Division With No Compensation
Svoboda-Tung Division With No Compensation Luis MONTALVO (IEEE Student Member), Alain GUYOT Integrated Systems Design Group, TIMA/INPG 46, Av. Félix Viallet, 38031 Grenoble Cedex, France. E-mail: montalvo@archi.imag.fr
More informationUsing q-calculus to study LDL T factorization of a certain Vandermonde matrix
Using q-calculus to study LDL T factorization of a certain Vandermonde matrix Alexey Kuznetsov May 2, 207 Abstract We use tools from q-calculus to study LDL T decomposition of the Vandermonde matrix V
More informationarxiv: v1 [math.co] 3 Nov 2014
SPARSE MATRICES DESCRIBING ITERATIONS OF INTEGER-VALUED FUNCTIONS BERND C. KELLNER arxiv:1411.0590v1 [math.co] 3 Nov 014 Abstract. We consider iterations of integer-valued functions φ, which have no fixed
More informationRobust Network Codes for Unicast Connections: A Case Study
Robust Network Codes for Unicast Connections: A Case Study Salim Y. El Rouayheb, Alex Sprintson, and Costas Georghiades Department of Electrical and Computer Engineering Texas A&M University College Station,
More informationSome constructions of integral graphs
Some constructions of integral graphs A. Mohammadian B. Tayfeh-Rezaie School of Mathematics, Institute for Research in Fundamental Sciences (IPM), P.O. Box 19395-5746, Tehran, Iran ali m@ipm.ir tayfeh-r@ipm.ir
More informationIntroduction to Binary Convolutional Codes [1]
Introduction to Binary Convolutional Codes [1] Yunghsiang S. Han Graduate Institute of Communication Engineering, National Taipei University Taiwan E-mail: yshan@mail.ntpu.edu.tw Y. S. Han Introduction
More informationcompare to comparison and pointer based sorting, binary trees
Admin Hashing Dictionaries Model Operations. makeset, insert, delete, find keys are integers in M = {1,..., m} (so assume machine word size, or unit time, is log m) can store in array of size M using power:
More informationMathematical Foundations of Cryptography
Mathematical Foundations of Cryptography Cryptography is based on mathematics In this chapter we study finite fields, the basis of the Advanced Encryption Standard (AES) and elliptical curve cryptography
More informationx by so in other words ( )
Math 154. Norm and trace An interesting application of Galois theory is to help us understand properties of two special constructions associated to field extensions, the norm and trace. If L/k is a finite
More informationAPPENDIX: MATHEMATICAL INDUCTION AND OTHER FORMS OF PROOF
ELEMENTARY LINEAR ALGEBRA WORKBOOK/FOR USE WITH RON LARSON S TEXTBOOK ELEMENTARY LINEAR ALGEBRA CREATED BY SHANNON MARTIN MYERS APPENDIX: MATHEMATICAL INDUCTION AND OTHER FORMS OF PROOF When you are done
More informationEIGENVALUES AND EIGENVECTORS 3
EIGENVALUES AND EIGENVECTORS 3 1. Motivation 1.1. Diagonal matrices. Perhaps the simplest type of linear transformations are those whose matrix is diagonal (in some basis). Consider for example the matrices
More information