Fast Matrix Multiplication Over GF3

1 Fast Matrix Multiplication Over GF3. Matthew Lambert, advised by Dr. B. David Saunders & Dr. Stephen Siegel. February 13, 2015

2 Matrix Multiplication

6 Outline Fast Matrix Multiplication. Strassen's algorithm: a well-studied O(n^(log2 7)) ≈ O(n^2.81) divide-and-conquer algorithm. Method of the Four Russians (Arlazarov et al., 1970): an O(n^3/log(n)) algorithm that is effective over small finite fields (GF2, GF3, GF5); very well studied for GF2 and anecdotally good for GF3. Aim: find thresholds for each algorithm over GF3.

9 Arithmetic over GF3 Simply arithmetic mod 3, e.g. 2 + 2 = 1 and 2 · 2 = 1.

14 Representation of GF3 With only three possible values, two bits per element would ideally suffice. Packed storage: elements stored consecutively. To account for a possible carry when adding, we actually need 3 bits per element; 5 bit ops to add two 21-element words. Bitsliced storage: low bits and high bits stored separately (Boothby & Bradshaw, 2009); store 2 as 11_2 instead of 10_2. 6 bit ops to add or subtract two 64-element SlicedUnits; 1 bit op to negate.
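The bitsliced layout described above can be modeled in a few lines. This is an illustrative Python sketch (the actual code presumably operates on 64-bit machine words), using the slide's encoding with 2 stored as 11_2:

```python
# Illustrative model of bitsliced GF(3) storage: one "low" word and one
# "high" word, with element k occupying bit k of each. Since 2 is stored
# as 11 (both bits set), an element's value equals its low bit + high bit.

def to_bitsliced(vals):
    """Pack a list of GF(3) values into a (low, high) pair of bitmasks."""
    lo = hi = 0
    for k, v in enumerate(vals):
        if v:        # 1 and 2 both set the low bit
            lo |= 1 << k
        if v == 2:   # only 2 sets the high bit
            hi |= 1 << k
    return lo, hi

def from_bitsliced(lo, hi, n):
    """Unpack n elements; each value is its low bit plus its high bit."""
    return [((lo >> k) & 1) + ((hi >> k) & 1) for k in range(n)]
```

Python ints here stand in for machine words; nothing in the sketch depends on word width.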

19 Matrix Multiplication over GF2 Row i of A determines row i of C: C_i = sum_{j=0}^{n-1} A_{i,j} B_j, where C_i and B_j denote rows. Over GF2, addition is an xor operation. We perform up to n^2 row additions, yielding O(n^3) running time. We get a speedup if we can avoid doing O(n^2) row additions.
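The row-oriented classical multiply just described can be sketched as follows; this is an illustration with each row held as a Python int bitmask (bit j = column j), not the presentation's implementation:

```python
def classical_gf2(A, B):
    """Classical GF(2) multiply: C_i = sum over j of A[i][j] * B_j.
    A, B, C are lists of n row bitmasks; the sum is a running xor."""
    n = len(A)
    C = [0] * n
    for i in range(n):
        for j in range(n):
            if (A[i] >> j) & 1:  # A[i][j] == 1
                C[i] ^= B[j]     # row addition over GF(2) is xor
    return C
```

Up to n^2 row xors are performed, matching the O(n^3) bound above (each row xor touches n bits).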

22 Four Russians Multiplication over GF2 Instead of indexing one bit at a time, we index with t (e.g., t = 2) bits at a time, yielding n^2/t row additions and thus O(n^3/t) running time. With multiple bits as index, we are adding multiple rows at once. We need to quickly compute the linear combinations of the t rows we are adding.

26 Four Russians Multiplication over GF2 For t = 2, the table of row combinations:

index  description
00     0
01     r1
10     r2
11     r1 + r2

(The worked example contents shown on the slides did not survive transcription.)

32 Four Russians Multiplication over GF2: Fast table creation We perform n^2/t row additions using n/t tables. The additions are performed in O(n^3/t) time. The tables can be constructed in O(2^t n^2/t) time: i.e., one vector addition for each of the 2^t rows in each of the n/t tables. For t = 3:

index  description
000    0
001    r1
010    r2
011    r1 + r2
100    r3
101    r1 + r3
110    r2 + r3
111    r1 + r2 + r3

35 Four Russians Multiplication over GF2: Algorithm
Data: A, B, C: n × n matrices
Result: C ← A · B
for i ← 0 to n/t do
    T ← table of the 2^t combinations of rows [i·t, (i+1)·t) of B
    for j ← 0 to n do
        a ← bits [i·t, (i+1)·t) of row j of A
        C_j ← C_j + T[a]
    end
end
Running time: n^2/t vector adds, totaling O(n^3/t) time; n/t tables created, totaling O(2^t n^2/t) time. Let t = log_2(n); then the additions take O(n^3/log_2(n)) time and the tables take O(n^3/log_2(n)) time to create.
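The algorithm above can be sketched concretely. This is a minimal Python model (rows as int bitmasks, t dividing n assumed), not the presentation's word-packed implementation:

```python
# Sketch of Four Russians (M4RM) multiplication over GF(2).
# Rows are Python ints used as bitmasks (bit j = column j).
# Assumes t divides n.

def m4rm_gf2(A, B, n, t=2):
    """Multiply two n x n GF(2) matrices given as lists of row bitmasks."""
    C = [0] * n
    for i in range(0, n, t):
        # Table of all 2^t GF(2) combinations of rows i..i+t-1 of B,
        # built with exactly one xor per nonzero entry: T[a] extends the
        # already-computed combination with a's lowest set bit cleared.
        T = [0] * (1 << t)
        for a in range(1, 1 << t):
            k = (a & -a).bit_length() - 1       # position of lowest set bit
            T[a] = T[a & (a - 1)] ^ B[i + k]
        for j in range(n):
            a = (A[j] >> i) & ((1 << t) - 1)    # t index bits from row j of A
            C[j] ^= T[a]
    return C
```

Each row of A contributes n/t table lookups instead of up to n single-row xors, which is exactly the O(n^3/t) saving described on the slide.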

41 Four Russians Multiplication over GF3: Considerations In GF2 we had 2^t row combinations to compute; in GF3 we have to compute 3^t combinations of rows. In GF2, extracting bits for indices was easy; with the bitsliced representation, this is problematic for GF3. How do we use a bit pattern as an index when the pattern corresponding to r2 − r3 is nominally 21? If t = 3, there are only 27 combinations of rows, but the low bits and the high bits each range between 000 and 111. It is too expensive to map an index into the range 0 to 3^t directly, so we concatenate the high and low bits to give an index between 0 and 4^t. How do our tables look with these indices?

45 Four Russians Multiplication over GF3: Tables Three approaches were considered to address the creation of row combinations and the indexing problem: use one table of 2^t rows and add with it twice; use one table of 4^t rows and add with it once; or use one table of 3^t rows plus an index table of size 4^t and add with it once.

47 Four Russians Multiplication over GF3: 2^t approach First approach: create a table of the 2^t 0/1 combinations of rows, as in the GF2 case, and index into it twice: once with the low t bits and once with the high t bits. The 2 = 11_2 representation used in the bitsliced storage is advantageous here, since an element's value is the sum of its low and high bits.

index  contents
00     0
01     r1
10     r2
11     r1 + r2
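The "add with it twice" trick follows from the encoding: with 2 stored as 11, every coefficient equals its low bit plus its high bit, so a GF(3) combination splits into two GF(2)-style lookups into the same table. A small illustrative sketch (rows as short lists of ints mod 3, not bitsliced words):

```python
# Sketch of the 2^t approach for GF(3): one table of 0/1 combinations,
# indexed once by the low bits and once by the high bits.

def combo_table(rows):
    """T[mask] = sum over set bits k of rows[k], entrywise mod 3,
    built with one vector addition per nonzero entry."""
    t = len(rows)
    T = [[0] * len(rows[0]) for _ in range(1 << t)]
    for mask in range(1, 1 << t):
        k = (mask & -mask).bit_length() - 1     # lowest set bit
        prev = T[mask & (mask - 1)]
        T[mask] = [(p + r) % 3 for p, r in zip(prev, rows[k])]
    return T

def gf3_combination(coeffs, rows):
    """sum of coeffs[k] * rows[k] mod 3 via two lookups into one table.
    Low-bit mask: coefficients 1 and 2; high-bit mask: coefficient 2 only."""
    T = combo_table(rows)
    lo = sum(1 << k for k, c in enumerate(coeffs) if c >= 1)
    hi = sum(1 << k for k, c in enumerate(coeffs) if c == 2)
    return [(a + b) % 3 for a, b in zip(T[lo], T[hi])]
```

For example, the coefficient 2 contributes its row through both lookups, giving 1 + 1 = 2 copies mod 3.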

49 Four Russians Multiplication over GF3: 4^t approach Second approach: create a table of 4^t rows containing all 3^t combinations of rows plus some unused rows, and index into it directly with the concatenated high and low bits. For t = 2 (unused indices omitted):

index  contents          index  contents
0000   0                 1010   -r2
0001   r1                1011   r1 - r2
0010   r2                1111   -r1 - r2
0011   r1 + r2
0101   -r1
0111   -r1 + r2

51 Four Russians Multiplication over GF3: 3^t approach Third approach: create a table of the 3^t row combinations and a table of 4^t indices (or pointers) that maps each concatenated-bits index to its combination. For t = 2 (unused indices omitted):

index  destination       index  contents
0000   0                 0      0
0001   1                 1      r1
0010   2                 2      r2
0011   3                 3      r1 + r2
0101   4                 4      -r1
0111   5                 5      -r1 + r2
1010   6                 6      -r2
1011   7                 7      r1 - r2
1111   8                 8      -r1 - r2

54 Four Russians Multiplication over GF3: Table Summary

method  memory cost              element access  adds per index
2^t     2^t rows                 direct          2
3^t     3^t rows + 4^t indices   indirect        1
4^t     4^t rows                 direct          1

Because the 3^t and 4^t approaches contain ± each combination of rows, we only need the six-operation addition to construct half of the elements; the one-operation negation then constructs the other half. Only one supplemental index table for the 3^t approach needs to be created for each value of t used.
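The half-additions/half-negations table build can be sketched as follows. This is an illustrative Python model (rows as lists of ints mod 3, indices in base 3); the entry ordering is an assumption, not necessarily the thesis layout:

```python
# Sketch: building the 3^t table of GF(3) row combinations so that entries
# whose most significant nonzero base-3 digit is 1 use one full addition,
# and entries whose most significant nonzero digit is 2 are negations of an
# already-built entry (1 bit op in the bitsliced representation).

def table_3t(rows):
    """T[idx] = sum of c_k * rows[k] mod 3, where idx = sum of c_k * 3^k."""
    t, w = len(rows), len(rows[0])
    T = [[0] * w] + [None] * (3 ** t - 1)
    for idx in range(1, 3 ** t):
        digits = [(idx // 3 ** k) % 3 for k in range(t)]
        K = max(k for k in range(t) if digits[k])   # top nonzero digit
        if digits[K] == 1:
            # full addition: extend the combination without row K
            T[idx] = [(a + b) % 3
                      for a, b in zip(T[idx - 3 ** K], rows[K])]
        else:
            # negate the combination with every digit c -> (3 - c) % 3;
            # its top digit is 1, so its index is smaller and already built
            neg = sum(((3 - d) % 3) * 3 ** k for k, d in enumerate(digits))
            T[idx] = [(3 - a) % 3 for a in T[neg]]
    return T
```

Exactly half of the nonzero entries take the cheap negation path, matching the slide's claim.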

55 Four Russians Multiplication over GF3: Results Initial development showed the 4^t method to be considerably slower, so it was abandoned in favor of optimizing the other two approaches.

58 Four Russians Multiplication over GF3: Results [Table of running times by dimension n: 3^t time, 2^t time, classical time; the numeric data did not survive transcription.]

64 Four Russians Multiplication over GF3: Conclusions Reasonable speedup over classical: 2.5x (and improving). The 3^t approach is not as successful as it theoretically should be. Do its larger tables not fit into L1 cache as well as the 2^t-sized tables? Does the more complicated access increase cost? There is some precedent for trading extra additions for smaller tables: M4RI (Albrecht, 2010) increases the number of additions if it means smaller tables, so that multiple tables can fit into L1 cache. Preliminary testing with multiple tables sees the 3^t approach outperform the 2^t approach's 1.01 seconds.

66 Other Four Russians Thoughts We assume we can operate on full 64-bit words; performance decreases if we must leave unused bits unmodified. If t = log_2(n), then the table of row combinations is the same size as the matrix, so there are definite practical limitations.

67 Strassen's Algorithm vs Classical Divide-and-Conquer Classical divide-and-conquer multiplication was compared with Strassen multiplication, both on top of the 3^t approach of the Method of the Four Russians. Strassen was faster at all tested dimensions.

68 Strassen's Algorithm: 2^t vs 3^t as base case Three levels of recursion were performed. Even with base cases where the 3^t approach should outperform the 2^t approach, the 2^t base case yielded faster results in all cases (though the multiple-tables method outperforms 2^t by about 14%).

72 Strassen's Algorithm: Conclusions and Thresholds Strassen's algorithm is faster than classical multiplication and, due to memory requirements, is at some point faster than the Method of the Four Russians. When using a base case of the 2^t approach, the threshold to switch from Strassen's algorithm to Four Russians on the machine used is somewhere between 960 and 1728. Normally Strassen's algorithm works best on matrices whose dimensions are a power of 2, or a power of 2 times the base-case dimension. With bitsliced or packed storage it is important to maintain dimensions that are multiples of 64 at all levels of recursion, as working with less than one machine word is more expensive. Further improvements are ongoing.
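The recursion being layered on top of the base-case multiply can be sketched as follows. This shows only the standard seven-product Strassen structure over GF(3) on plain lists of lists with a classical base case; the actual implementation works on bitsliced words with a Four Russians base case:

```python
# Hedged sketch of Strassen's recursion over GF(3). Requires n = base * 2^k.

def strassen_gf3(A, B, base=2):
    n = len(A)
    if n <= base:  # classical multiply at the base case
        return [[sum(A[i][k] * B[k][j] for k in range(n)) % 3
                 for j in range(n)] for i in range(n)]
    h = n // 2
    add = lambda X, Y: [[(a + b) % 3 for a, b in zip(rx, ry)]
                        for rx, ry in zip(X, Y)]
    sub = lambda X, Y: [[(a - b) % 3 for a, b in zip(rx, ry)]
                        for rx, ry in zip(X, Y)]
    A11 = [r[:h] for r in A[:h]]; A12 = [r[h:] for r in A[:h]]
    A21 = [r[:h] for r in A[h:]]; A22 = [r[h:] for r in A[h:]]
    B11 = [r[:h] for r in B[:h]]; B12 = [r[h:] for r in B[:h]]
    B21 = [r[:h] for r in B[h:]]; B22 = [r[h:] for r in B[h:]]
    # the seven recursive products
    M1 = strassen_gf3(add(A11, A22), add(B11, B22), base)
    M2 = strassen_gf3(add(A21, A22), B11, base)
    M3 = strassen_gf3(A11, sub(B12, B22), base)
    M4 = strassen_gf3(A22, sub(B21, B11), base)
    M5 = strassen_gf3(add(A11, A12), B22, base)
    M6 = strassen_gf3(sub(A21, A11), add(B11, B12), base)
    M7 = strassen_gf3(sub(A12, A22), add(B21, B22), base)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot
```

Note that subtraction appears in the pre-additions, which is where the cheap bitsliced GF(3) subtraction and negation pay off.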

75 Summary We successfully developed and implemented the Method of the Four Russians over GF3, yielding a noticeable performance improvement over classical multiplication. We implemented Strassen's algorithm on top of the Method of the Four Russians. Different implementations of the Method of the Four Russians may be preferred depending on the specific problem and machine.

76 Best value of t For GF2 we can assume all operations take essentially equal time: the only distinct cost is extracting the index bits; all other costs are 64-bit xors. For an (m × n) × (n × k) = (m × k) product, we spend mnk/t time adding and nk·2^t/t time creating tables. Minimize f(t) = (mnk + nk·2^t)/t. The partial derivative with respect to t is kn(2^t(t·ln 2 − 1) − m)/t^2. Setting it to zero and solving gives t = (W(m/e) + 1)/ln 2, where W is the Lambert W function. [A table comparing the optimal t with log_2(dim) for several dimensions did not survive transcription.]

77 Selecting t in GF3 For the 3^t approach we spend mnk/t time adding and nk·3^t/t time creating tables, giving t = (W(m/e) + 1)/ln 3. For the 2^t approach we spend 2mnk/t time adding and nk·2^t/t time creating tables, giving t = (W(2m/e) + 1)/ln 2. [Tables comparing the cost-minimizing t, the experimentally best t, and log_3(dim) or log_2(dim) did not survive transcription.]
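The closed forms above can be evaluated numerically. This sketch covers both the GF2 and GF3 cases via a table base and an addition factor; `lambert_w` is a simple Newton iteration written here for self-containment, not a library routine:

```python
# Hedged sketch: t = (W(f*m/e) + 1) / ln(base), minimizing
# f*m*n*k/t + n*k*base**t/t (n and k cancel in the optimization).
import math

def lambert_w(x, iters=60):
    """Principal branch of w * e^w = x for x >= 0, by Newton's method."""
    w = math.log1p(x)  # starting point above the root for x >= 0
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - x) / (ew * (w + 1))
    return w

def best_t(m, base=2, add_factor=1):
    """Cost-minimizing (real-valued) t; round to an integer in practice.
    GF2: best_t(m); GF3 3^t approach: best_t(m, base=3);
    GF3 2^t approach (two adds per index): best_t(m, base=2, add_factor=2)."""
    return (lambert_w(add_factor * m / math.e) + 1) / math.log(base)
```

For m = 1000 this gives t ≈ 7.8 over GF2, noticeably below log_2(1000) ≈ 10, reflecting the table-creation cost.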

78 GF3 bitsliced operations

s ← x_0 ⊕ y_1
t ← x_1 ⊕ y_0
r_0 ← (x_0 ⊕ y_0) ∨ (s ⊕ x_1)
r_1 ← s ∧ t
Figure: GF3 bitsliced addition in six operations: r ← x + y

t ← x_0 ⊕ y_0
r_0 ← t ∨ (x_1 ⊕ y_1)
r_1 ← (t ⊕ y_1) ∧ (y_0 ⊕ x_1)
Figure: GF3 bitsliced subtraction in six operations: r ← x − y

r_0 ← x_0
r_1 ← x_0 ⊕ x_1
Figure: GF3 bitsliced negation in one operation: r ← −x
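These figures translate directly into word-wide operations; here a SlicedUnit is modeled as a pair of Python ints (the low and high words, with 2 encoded as both bits set). The slide's exact gate sequence for addition was garbled in transcription, so the r_0 formula below is a reconstruction verified against the GF(3) addition table rather than the original's literal operations:

```python
# Word-wide bitsliced GF(3) operations on (lo, hi) pairs of ints,
# one element per bit position; 2 is encoded as lo = 1, hi = 1.

def gf3_add(x, y):  # six bit ops
    xlo, xhi = x
    ylo, yhi = y
    s = xlo ^ yhi
    t = xhi ^ ylo
    rlo = (xlo ^ ylo) | (s ^ xhi)
    rhi = s & t
    return rlo, rhi

def gf3_sub(x, y):  # six bit ops
    xlo, xhi = x
    ylo, yhi = y
    t = xlo ^ ylo
    rlo = t | (xhi ^ yhi)
    rhi = (t ^ yhi) & (ylo ^ xhi)
    return rlo, rhi

def gf3_neg(x):  # one bit op
    xlo, xhi = x
    return xlo, xlo ^ xhi
```

Because every operation is a plain bitwise op, each call processes as many GF(3) elements as there are bits in the word.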


A Linear Time Algorithm for Ordered Partition A Linear Time Algorithm for Ordered Partition Yijie Han School of Computing and Engineering University of Missouri at Kansas City Kansas City, Missouri 64 hanyij@umkc.edu Abstract. We present a deterministic

More information

Efficient random number generation on FPGA-s

Efficient random number generation on FPGA-s Proceedings of the 9 th International Conference on Applied Informatics Eger, Hungary, January 29 February 1, 2014. Vol. 1. pp. 313 320 doi: 10.14794/ICAI.9.2014.1.313 Efficient random number generation

More information

! Break up problem into several parts. ! Solve each part recursively. ! Combine solutions to sub-problems into overall solution.

! Break up problem into several parts. ! Solve each part recursively. ! Combine solutions to sub-problems into overall solution. Divide-and-Conquer Chapter 5 Divide and Conquer Divide-and-conquer.! Break up problem into several parts.! Solve each part recursively.! Combine solutions to sub-problems into overall solution. Most common

More information

Copyright 2000, Kevin Wayne 1

Copyright 2000, Kevin Wayne 1 Divide-and-Conquer Chapter 5 Divide and Conquer Divide-and-conquer. Break up problem into several parts. Solve each part recursively. Combine solutions to sub-problems into overall solution. Most common

More information

Speedy Maths. David McQuillan

Speedy Maths. David McQuillan Speedy Maths David McQuillan Basic Arithmetic What one needs to be able to do Addition and Subtraction Multiplication and Division Comparison For a number of order 2 n n ~ 100 is general multi precision

More information

The Master Theorem for solving recurrences. Algorithms and Data Structures Strassen s Algorithm. Tutorials. The Master Theorem (cont d)

The Master Theorem for solving recurrences. Algorithms and Data Structures Strassen s Algorithm. Tutorials. The Master Theorem (cont d) The Master Theorem for solving recurrences lgorithms and Data Structures Strassen s lgorithm 23rd September, 2014 Theorem Let n 0 N, k N 0 and a, b R with a > 0 and b > 1, and let T : N R satisfy the following

More information

Three Ways to Test Irreducibility

Three Ways to Test Irreducibility Three Ways to Test Irreducibility Richard P. Brent Australian National University joint work with Paul Zimmermann INRIA, Nancy France 12 Feb 2009 Outline Polynomials over finite fields Irreducibility criteria

More information

Random Number Generation. Stephen Booth David Henty

Random Number Generation. Stephen Booth David Henty Random Number Generation Stephen Booth David Henty Introduction Random numbers are frequently used in many types of computer simulation Frequently as part of a sampling process: Generate a representative

More information

Three Ways to Test Irreducibility

Three Ways to Test Irreducibility Outline Three Ways to Test Irreducibility Richard P. Brent Australian National University joint work with Paul Zimmermann INRIA, Nancy France 8 Dec 2008 Polynomials over finite fields Irreducibility criteria

More information

6. Iterative Methods for Linear Systems. The stepwise approach to the solution...

6. Iterative Methods for Linear Systems. The stepwise approach to the solution... 6 Iterative Methods for Linear Systems The stepwise approach to the solution Miriam Mehl: 6 Iterative Methods for Linear Systems The stepwise approach to the solution, January 18, 2013 1 61 Large Sparse

More information

Square Always Exponentiation

Square Always Exponentiation Square Always Exponentiation Christophe Clavier 1 Benoit Feix 1,2 Georges Gagnerot 1,2 Mylène Roussellet 2 Vincent Verneuil 2,3 1 XLIM-Université de Limoges, France 2 INSIDE Secure, Aix-en-Provence, France

More information

Outline. policies for the first part. with some potential answers... MCS 260 Lecture 10.0 Introduction to Computer Science Jan Verschelde, 9 July 2014

Outline. policies for the first part. with some potential answers... MCS 260 Lecture 10.0 Introduction to Computer Science Jan Verschelde, 9 July 2014 Outline 1 midterm exam on Friday 11 July 2014 policies for the first part 2 questions with some potential answers... MCS 260 Lecture 10.0 Introduction to Computer Science Jan Verschelde, 9 July 2014 Intro

More information

1 Short adders. t total_ripple8 = t first + 6*t middle + t last = 4t p + 6*2t p + 2t p = 18t p

1 Short adders. t total_ripple8 = t first + 6*t middle + t last = 4t p + 6*2t p + 2t p = 18t p UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences Study Homework: Arithmetic NTU IC54CA (Fall 2004) SOLUTIONS Short adders A The delay of the ripple

More information

Parallelism and Machine Models

Parallelism and Machine Models Parallelism and Machine Models Andrew D Smith University of New Brunswick, Fredericton Faculty of Computer Science Overview Part 1: The Parallel Computation Thesis Part 2: Parallelism of Arithmetic RAMs

More information

Lecture 19: The Determinant

Lecture 19: The Determinant Math 108a Professor: Padraic Bartlett Lecture 19: The Determinant Week 10 UCSB 2013 In our last class, we talked about how to calculate volume in n-dimensions Specifically, we defined a parallelotope:

More information

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for 1 Logistics Notes for 2016-08-26 1. Our enrollment is at 50, and there are still a few students who want to get in. We only have 50 seats in the room, and I cannot increase the cap further. So if you are

More information

Hardware Design I Chap. 4 Representative combinational logic

Hardware Design I Chap. 4 Representative combinational logic Hardware Design I Chap. 4 Representative combinational logic E-mail: shimada@is.naist.jp Already optimized circuits There are many optimized circuits which are well used You can reduce your design workload

More information

Reductions, Recursion and Divide and Conquer

Reductions, Recursion and Divide and Conquer Chapter 5 Reductions, Recursion and Divide and Conquer CS 473: Fundamental Algorithms, Fall 2011 September 13, 2011 5.1 Reductions and Recursion 5.1.0.1 Reduction Reducing problem A to problem B: (A) Algorithm

More information

CS 542G: Conditioning, BLAS, LU Factorization

CS 542G: Conditioning, BLAS, LU Factorization CS 542G: Conditioning, BLAS, LU Factorization Robert Bridson September 22, 2008 1 Why some RBF Kernel Functions Fail We derived some sensible RBF kernel functions, like φ(r) = r 2 log r, from basic principles

More information

CS227-Scientific Computing. Lecture 4: A Crash Course in Linear Algebra

CS227-Scientific Computing. Lecture 4: A Crash Course in Linear Algebra CS227-Scientific Computing Lecture 4: A Crash Course in Linear Algebra Linear Transformation of Variables A common phenomenon: Two sets of quantities linearly related: y = 3x + x 2 4x 3 y 2 = 2.7x 2 x

More information

SOLVING LINEAR SYSTEMS

SOLVING LINEAR SYSTEMS SOLVING LINEAR SYSTEMS We want to solve the linear system a, x + + a,n x n = b a n, x + + a n,n x n = b n This will be done by the method used in beginning algebra, by successively eliminating unknowns

More information

Multiplicative Complexity Reductions in Cryptography and Cryptanalysis

Multiplicative Complexity Reductions in Cryptography and Cryptanalysis Multiplicative Complexity Reductions in Cryptography and Cryptanalysis THEODOSIS MOUROUZIS SECURITY OF SYMMETRIC CIPHERS IN NETWORK PROTOCOLS - ICMS - EDINBURGH 25-29 MAY/2015 1 Presentation Overview Linearity

More information

Sparse Polynomial Multiplication and Division in Maple 14

Sparse Polynomial Multiplication and Division in Maple 14 Sparse Polynomial Multiplication and Division in Maple 4 Michael Monagan and Roman Pearce Department of Mathematics, Simon Fraser University Burnaby B.C. V5A S6, Canada October 5, 9 Abstract We report

More information

Cache-Oblivious Algorithms

Cache-Oblivious Algorithms Cache-Oblivious Algorithms 1 Cache-Oblivious Model 2 The Unknown Machine Algorithm C program gcc Object code linux Execution Can be executed on machines with a specific class of CPUs Algorithm Java program

More information

PARALLEL MULTIPLICATION IN F 2

PARALLEL MULTIPLICATION IN F 2 PARALLEL MULTIPLICATION IN F 2 n USING CONDENSED MATRIX REPRESENTATION Christophe Negre Équipe DALI, LP2A, Université de Perpignan avenue P Alduy, 66 000 Perpignan, France christophenegre@univ-perpfr Keywords:

More information

Analysis of Algorithm Efficiency. Dr. Yingwu Zhu

Analysis of Algorithm Efficiency. Dr. Yingwu Zhu Analysis of Algorithm Efficiency Dr. Yingwu Zhu Measure Algorithm Efficiency Time efficiency How fast the algorithm runs; amount of time required to accomplish the task Our focus! Space efficiency Amount

More information

William Stallings Copyright 2010

William Stallings Copyright 2010 A PPENDIX E B ASIC C ONCEPTS FROM L INEAR A LGEBRA William Stallings Copyright 2010 E.1 OPERATIONS ON VECTORS AND MATRICES...2 Arithmetic...2 Determinants...4 Inverse of a Matrix...5 E.2 LINEAR ALGEBRA

More information

Mathematics for Engineers. Numerical mathematics

Mathematics for Engineers. Numerical mathematics Mathematics for Engineers Numerical mathematics Integers Determine the largest representable integer with the intmax command. intmax ans = int32 2147483647 2147483647+1 ans = 2.1475e+09 Remark The set

More information

What s the best data structure for multivariate polynomials in a world of 64 bit multicore computers?

What s the best data structure for multivariate polynomials in a world of 64 bit multicore computers? What s the best data structure for multivariate polynomials in a world of 64 bit multicore computers? Michael Monagan Center for Experimental and Constructive Mathematics Simon Fraser University British

More information

Cache-Oblivious Algorithms

Cache-Oblivious Algorithms Cache-Oblivious Algorithms 1 Cache-Oblivious Model 2 The Unknown Machine Algorithm C program gcc Object code linux Execution Can be executed on machines with a specific class of CPUs Algorithm Java program

More information

Toward High Performance Matrix Multiplication for Exact Computation

Toward High Performance Matrix Multiplication for Exact Computation Toward High Performance Matrix Multiplication for Exact Computation Pascal Giorgi Joint work with Romain Lebreton (U. Waterloo) Funded by the French ANR project HPAC Séminaire CASYS - LJK, April 2014 Motivations

More information

A Review of Matrix Analysis

A Review of Matrix Analysis Matrix Notation Part Matrix Operations Matrices are simply rectangular arrays of quantities Each quantity in the array is called an element of the matrix and an element can be either a numerical value

More information

Fields in Cryptography. Çetin Kaya Koç Winter / 30

Fields in Cryptography.   Çetin Kaya Koç Winter / 30 Fields in Cryptography http://koclab.org Çetin Kaya Koç Winter 2017 1 / 30 Field Axioms Fields in Cryptography A field F consists of a set S and two operations which we will call addition and multiplication,

More information

Image Compression. 1. Introduction. Greg Ames Dec 07, 2002

Image Compression. 1. Introduction. Greg Ames Dec 07, 2002 Image Compression Greg Ames Dec 07, 2002 Abstract Digital images require large amounts of memory to store and, when retrieved from the internet, can take a considerable amount of time to download. The

More information

A Divide-and-Conquer Algorithm for Functions of Triangular Matrices

A Divide-and-Conquer Algorithm for Functions of Triangular Matrices A Divide-and-Conquer Algorithm for Functions of Triangular Matrices Ç. K. Koç Electrical & Computer Engineering Oregon State University Corvallis, Oregon 97331 Technical Report, June 1996 Abstract We propose

More information

Mod 2 linear algebra and tabulation of rational eigenforms

Mod 2 linear algebra and tabulation of rational eigenforms Mod 2 linear algebra and tabulation of rational eigenforms Kiran S. Kedlaya Department of Mathematics, University of California, San Diego kedlaya@ucsd.edu http://kskedlaya.org/slides/ (see also this SageMathCloud

More information

SEQUENCES AND SERIES

SEQUENCES AND SERIES A sequence is an ordered list of numbers. SEQUENCES AND SERIES Note, in this context, ordered does not mean that the numbers in the list are increasing or decreasing. Instead it means that there is a first

More information

A field F is a set of numbers that includes the two numbers 0 and 1 and satisfies the properties:

A field F is a set of numbers that includes the two numbers 0 and 1 and satisfies the properties: Byte multiplication 1 Field arithmetic A field F is a set of numbers that includes the two numbers 0 and 1 and satisfies the properties: F is an abelian group under addition, meaning - F is closed under

More information

PUTTING FÜRER ALGORITHM INTO PRACTICE WITH THE BPAS LIBRARY. (Thesis format: Monograph) Linxiao Wang. Graduate Program in Computer Science

PUTTING FÜRER ALGORITHM INTO PRACTICE WITH THE BPAS LIBRARY. (Thesis format: Monograph) Linxiao Wang. Graduate Program in Computer Science PUTTING FÜRER ALGORITHM INTO PRACTICE WITH THE BPAS LIBRARY. (Thesis format: Monograph) by Linxiao Wang Graduate Program in Computer Science A thesis submitted in partial fulfillment of the requirements

More information

compare to comparison and pointer based sorting, binary trees

compare to comparison and pointer based sorting, binary trees Admin Hashing Dictionaries Model Operations. makeset, insert, delete, find keys are integers in M = {1,..., m} (so assume machine word size, or unit time, is log m) can store in array of size M using power:

More information

AMS 209, Fall 2015 Final Project Type A Numerical Linear Algebra: Gaussian Elimination with Pivoting for Solving Linear Systems

AMS 209, Fall 2015 Final Project Type A Numerical Linear Algebra: Gaussian Elimination with Pivoting for Solving Linear Systems AMS 209, Fall 205 Final Project Type A Numerical Linear Algebra: Gaussian Elimination with Pivoting for Solving Linear Systems. Overview We are interested in solving a well-defined linear system given

More information

McBits: Fast code-based cryptography

McBits: Fast code-based cryptography McBits: Fast code-based cryptography Peter Schwabe Radboud University Nijmegen, The Netherlands Joint work with Daniel Bernstein, Tung Chou December 17, 2013 IMA International Conference on Cryptography

More information

1 The Algebraic Normal Form

1 The Algebraic Normal Form 1 The Algebraic Normal Form Boolean maps can be expressed by polynomials this is the algebraic normal form (ANF). The degree as a polynomial is a first obvious measure of nonlinearity linear (or affine)

More information

Introduction to Algorithms 6.046J/18.401J/SMA5503

Introduction to Algorithms 6.046J/18.401J/SMA5503 Introduction to Algorithms 6.046J/8.40J/SMA5503 Lecture 3 Prof. Piotr Indyk The divide-and-conquer design paradigm. Divide the problem (instance) into subproblems. 2. Conquer the subproblems by solving

More information

On queueing in coded networks queue size follows degrees of freedom

On queueing in coded networks queue size follows degrees of freedom On queueing in coded networks queue size follows degrees of freedom Jay Kumar Sundararajan, Devavrat Shah, Muriel Médard Laboratory for Information and Decision Systems, Massachusetts Institute of Technology,

More information

CMP 334: Seventh Class

CMP 334: Seventh Class CMP 334: Seventh Class Performance HW 5 solution Averages and weighted averages (review) Amdahl's law Ripple-carry adder circuits Binary addition Half-adder circuits Full-adder circuits Subtraction, negative

More information

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a

More information

Linear Methods (Math 211) - Lecture 2

Linear Methods (Math 211) - Lecture 2 Linear Methods (Math 211) - Lecture 2 David Roe September 11, 2013 Recall Last time: Linear Systems Matrices Geometric Perspective Parametric Form Today 1 Row Echelon Form 2 Rank 3 Gaussian Elimination

More information

New attacks on Keccak-224 and Keccak-256

New attacks on Keccak-224 and Keccak-256 New attacks on Keccak-224 and Keccak-256 Itai Dinur 1, Orr Dunkelman 1,2 and Adi Shamir 1 1 Computer Science department, The Weizmann Institute, Rehovot, Israel 2 Computer Science Department, University

More information

CMPS 2200 Fall Divide-and-Conquer. Carola Wenk. Slides courtesy of Charles Leiserson with changes and additions by Carola Wenk

CMPS 2200 Fall Divide-and-Conquer. Carola Wenk. Slides courtesy of Charles Leiserson with changes and additions by Carola Wenk CMPS 2200 Fall 2017 Divide-and-Conquer Carola Wenk Slides courtesy of Charles Leiserson with changes and additions by Carola Wenk 1 The divide-and-conquer design paradigm 1. Divide the problem (instance)

More information

Data Structures and Algorithms

Data Structures and Algorithms Data Structures and Algorithms Spring 2017-2018 Outline 1 Sorting Algorithms (contd.) Outline Sorting Algorithms (contd.) 1 Sorting Algorithms (contd.) Analysis of Quicksort Time to sort array of length

More information

The Design, Implementation, and Evaluation of a Symmetric Banded Linear Solver for Distributed-Memory Parallel Computers

The Design, Implementation, and Evaluation of a Symmetric Banded Linear Solver for Distributed-Memory Parallel Computers The Design, Implementation, and Evaluation of a Symmetric Banded Linear Solver for Distributed-Memory Parallel Computers ANSHUL GUPTA and FRED G. GUSTAVSON IBM T. J. Watson Research Center MAHESH JOSHI

More information

Algorithms (II) Yu Yu. Shanghai Jiaotong University

Algorithms (II) Yu Yu. Shanghai Jiaotong University Algorithms (II) Yu Yu Shanghai Jiaotong University Chapter 1. Algorithms with Numbers Two seemingly similar problems Factoring: Given a number N, express it as a product of its prime factors. Primality:

More information

B. Cyclic Codes. Primitive polynomials are the generator polynomials of cyclic codes.

B. Cyclic Codes. Primitive polynomials are the generator polynomials of cyclic codes. B. Cyclic Codes A cyclic code is a linear block code with the further property that a shift of a codeword results in another codeword. These are based on polynomials whose elements are coefficients from

More information

B-Spline Interpolation on Lattices

B-Spline Interpolation on Lattices B-Spline Interpolation on Lattices David Eberly, Geometric Tools, Redmond WA 98052 https://www.geometrictools.com/ This work is licensed under the Creative Commons Attribution 4.0 International License.

More information

CPSC 518 Introduction to Computer Algebra Asymptotically Fast Integer Multiplication

CPSC 518 Introduction to Computer Algebra Asymptotically Fast Integer Multiplication CPSC 518 Introduction to Computer Algebra Asymptotically Fast Integer Multiplication 1 Introduction We have now seen that the Fast Fourier Transform can be applied to perform polynomial multiplication

More information

CPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication

CPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication CPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication March, 2006 1 Introduction We have now seen that the Fast Fourier Transform can be applied to perform

More information

CSCI Honor seminar in algorithms Homework 2 Solution

CSCI Honor seminar in algorithms Homework 2 Solution CSCI 493.55 Honor seminar in algorithms Homework 2 Solution Saad Mneimneh Visiting Professor Hunter College of CUNY Problem 1: Rabin-Karp string matching Consider a binary string s of length n and another

More information

CSE 421 Algorithms: Divide and Conquer

CSE 421 Algorithms: Divide and Conquer CSE 42 Algorithms: Divide and Conquer Larry Ruzzo Thanks to Richard Anderson, Paul Beame, Kevin Wayne for some slides Outline: General Idea algorithm design paradigms: divide and conquer Review of Merge

More information

CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis. Catie Baker Spring 2015

CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis. Catie Baker Spring 2015 CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis Catie Baker Spring 2015 Today Registration should be done. Homework 1 due 11:59pm next Wednesday, April 8 th. Review math

More information

Dense LU factorization and its error analysis

Dense LU factorization and its error analysis Dense LU factorization and its error analysis Laura Grigori INRIA and LJLL, UPMC February 2016 Plan Basis of floating point arithmetic and stability analysis Notation, results, proofs taken from [N.J.Higham,

More information