Parallelism in Computer Arithmetic: A Historical Perspective
|
|
- Henry Hensley
- 5 years ago
- Views:
Transcription
1 Parallelism in Computer Arithmetic: A Historical Perspective 21s 2s 199s 198s 197s 196s 195s Behrooz Parhami Aug. 218 Parallelism in Computer Arithmetic Slide 1 University of California, Santa Barbara
2 About This Presentation This slide show was first developed for an invited talk at a special session on computer arithmetic in honor of Drs. Graham Jullien and William Miller, held on Monday 8/6 at the 61st Midwest Symposium on Circuits and Systems, Windsor, Ontario, Canada, August 5-8, 218. All rights reserved for the author. 218 Behrooz Parhami Edition Released Revised Revised Revised First Aug. 218 File: Aug. 218 Parallelism in Computer Arithmetic Slide 2
3 Parallelism in Computer Arithmetic: A Historical Perspective Many early parallel processing breakthroughs emerged from the quest for faster and higher-throughput arithmetic operations. Additionally, the influence of arithmetic techniques on parallel computer performance can be seen in diverse areas such the bit-serial arithmetic units of early massively parallel SIMD computers, pipelining and pipeline chaining in vector machines, design of floating-point standards to ensure the accuracy and portability of numerically-intensive programs, and prominence of GPUs in today s top-of-the-line supercomputers. This paper contains a few representative samples of the many interactions and cross-fertilizations between computer-arithmetic and parallelcomputation communities by presenting historical perspectives, case studies of state of art and practice, and directions for further collaboration. Aug. 218 Parallelism in Computer Arithmetic Slide 3
4 My Personal Journey and Career years at UCSB We are here years since graduation My children, Aug. 218 Parallelism in Computer Arithmetic Slide 4
5 I. Introduction: What Is Parallelism? The two extreme views: - Any circuit that manipulates multiple bits at once is parallel - Must have concurrency at the level of large functional blocks My view: Parallel processing is possible at the three levels of circuits, function units, and compute nodes I will provide an example at each of the three levels: - Circuit level: Parallel-prefix adders - Function level: Recursive/divide-and-conquer multiplication - System level: Discrete Fourier transform, DFT/FFT The three levels of parallelism are not mutually exclusive and can be readily combined Aug. 218 Parallelism in Computer Arithmetic Slide 5
6 II. Circuit-Level Parallelism Adders and multipliers are our two main workhorses - In this section, I cover parallel-prefix adders - Recursive multiplication is covered in Section III although it has circuit-level embodiments as well Parallel-prefix computation - Given the inputs x, x 1, x 2, x 3,, x k 1 - And an associative binary operator - Compute all the prefixes of the expression x x 1 x 2 x 3 x k 1 Example: Indexing via prefix sums Aug. 218 Parallelism in Computer Arithmetic Slide 6
7 Share-Nothing vs. Share-Everything Carry Networks Challenge: Find circuit sharing schemes that come close to A in speed and to B in cost... x 3 y 3 x 2 y 2 x 1 y 1 x y A c in c k g i p i g k 1 c k 1 Carry is: annihilated or killed propagated generated (impossible) c k 2 B g k 2 p k 2 g i+1 p i+1 p k 1 x i g i Carry network c i+1 c i s i y i p i c 2 c 1 Aug. 218 Parallelism in Computer Arithmetic Slide 7 s 3 g 1 p 1 g p g 1 p 1 g p c 1 c c c s 2 s 1 s A: Full lookahead. Each carry, and thus sum bit is computed independently and in parallel B: Ripple-carry. Each carry circuit shares the entire circuit of the previous carry
8 The Carry Operator and Block-Propagate/Generate Block B' Block B" j i j 1 i 1 g p g p (g", p") (g', p') g" p" g' p' g p Block B (g, p) g = g" + g'p" p = p'p" Parallel-prefix carries Denote (g i, p i ) by x i g p (g [,2], p [,2] ) x x 1 x 2 x 3 x k 1 c 3 Aug. 218 Parallelism in Computer Arithmetic Slide 8
9 The Brent-Kung Carry Network [7, 7] [6, 6] [5, 5] [4, 4] [3, 3] [2, 2] [1, 1] [, ] g [1,1] p [1,1] g [,] p [,] [6, 7] [2, 3] [4, 5] [, 1] [4, 7] [, 3] g [,1] p [,1] [, 7] [, 6] [, 5] [, 4] [, 3] [, 2] [, 1] [, ] Aug. 218 Parallelism in Computer Arithmetic Slide 9
10 Design Alternatives and Tradeoffs x 9 x 1 x 11 x 12 x 13 x 14 x 15 x 8 x 7 x x 6 5 x 4 x 3 x x 2 1 x x 9 x 1 x 11 x 12 x 13 x 14 x 15 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1 x Level 1 Brent-Kung: 6 levels 26 cells Kogge-Stone: 4 levels 49 cells 5 6 s 9 s 1 s 11 s 12 s 13 s 14 s 15 s 8 s 7 s 6 s 5 s 4 s 3 s 2 s 1 s s 9 s 1 s 11 s 12 s 13 s 14 s 15 s 8 s 7 s 6 s 5 s 4 s 3 s 2 s 1 s x 15 x 14 x 13 x 12 x 11 x 1 x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1 x Nearly an infinite number of hybrid designs are possible Brent- Kung Kogge- Stone Brent- Kung Hybrid: 5 levels 32 cells s 15 s 14 s 13 s 12 s 11 s 1 s 9 s 8 s 7 s 6 s 5 s 4 s 3 s 2 s 1 s Aug. 218 Parallelism in Computer Arithmetic Slide 1
11 A Taxonomy of Parallel-Prefix Adders Fanout = 2 f + 1 Logic levels = log 2 k + l From: Harris, David, 23 Wire tracks = 2 t Aug. 218 Parallelism in Computer Arithmetic Slide 11
12 III. Function-Level Parallelism Multiplication is now just as essential as addition - In this section, I cover divide-and-conquer multiplication - Many other multiplication schemes exist several of them have parallel-processing connections Recursive multiplication xy = (2 k/2 x H + x L )(2 k/2 y H + y L ) = 2 k x H y H + 2 k/2 (x H y L + x L y H ) + x L y L = 2 k p k/2 (p 3 + p 2 ) + p 1 Complexity analysis: T(k) = 4T(k/2) + (log k) = (k 2 ) A(k) = A(k/2) + (k) = (k) p 4 p 2 p 3 x y p 1 p Aug. 218 Parallelism in Computer Arithmetic Slide 12
13 Analysis of Recursive Multiplication Recursive multiplication xy = (2 k/2 x H + x L )(2 k/2 y H + y L ) = 2 k x H y H + 2 k/2 (x H y L + x L y H ) + x L y L = 2 k p k/2 (p 3 + p 2 ) + p 1 Complexity analysis (serial): T(k) = 4T(k/2) + (log k) = (k 2 ) A(k) = A(k/2) + (k) = (k) p 4 p 2 p 3 x y p 1 p Complexity analysis (parallel): T(k) = T(k/2) + (log k) = (log k) A(k) = 4A(k/2) + (k) = (k 2 ) Theoretical lower bounds: AT = W(k 3/2 ) AT 2 = W(k 2 ) Aug. 218 Parallelism in Computer Arithmetic Slide 13
14 The Trick Proposed by Karatsuba and Ofman Recursive multiplication xy = 2 k p k/2 (p 3 + p 2 ) + p 1 Compute the auxiliary term p 5 = (x H x L )(y H y L ) = p 4 + p 1 p 3 p 2 p 3 + p 2 = p 4 + p 1 p 5 Complexity analysis: A(k) = 3A(k/2) + (k) = (k ) p 4 p 2 p = log 2 3 x y p 1 p The benefit is significant for extremely wide operands (4/3) 5 = 4.2 (4/3) 1 = 17.8 (4/3) 2 = (4/3) 5 = 1,765,781 Aug. 218 Parallelism in Computer Arithmetic Slide 14
15 Improvements to Karatsuba-Ofman Algorithm Original / Naive (k 2 ) Karatsuba-Ofman (k ) Toom / Cook (k ), (k 1.44 ) (k 1+e ) Schonhage-Strassen (k logk loglogk) Furer Still faster Is (k log k) feasible? Aug. 218 Parallelism in Computer Arithmetic Slide 15
16 Similar Trick Used for Matrix Multiplication Strassen s trick: Eight matrix multiplications reduced to 7 Original / Naive (n 3 ) Strassen (k 2.87 ) 2.87 = log 2 7 Aug. 218 Parallelism in Computer Arithmetic Slide 16
17 Strassen Matrix Multiplication in Practice Practical implementations in C++ (your results may vary) Time (s) Advantages of Strassen s algorithm show up for n ~ 3 Naive O(n 3 ) Strassen s method does not show as much improvement as Karatsuba-Ofman s because: - Its branching reduction factor is 7/8 instead of 3/4 - Matrix addition is relatively more complex than integer add Matrix size (n) Aug. 218 Parallelism in Computer Arithmetic Slide 17
18 IV. System-Level Parallelism Multiple independent or interacting arithmetic streams: - Early examples included using one or more co-processors - Modern embodiments entail the use of GPUs and the like Streamlined arithmetic blocks: No extra features Discrete Fourier Transform (DFT / FFT) Inputs x, x 1,..., x n 1 Outputs y, y 1,..., y n 1 y i = j=:n 1 n ij x j n is a primitive nth root of unity Naive method (n 2 ) x x 1 x 2... DFT y y 1 y 2... Inv. DFT x x 1 x 2... FFT (n log n) x n 1 y n 1 x n 1 Aug. 218 Parallelism in Computer Arithmetic Slide 18
19 FFT Can Be Performed in Many Different Ways Quote from The Principles of Computer Hardware: At least one good reason for studying multiplication and division is that there is an infinite number of ways of performing these operations and hence there is an infinite number of PhDs (or expenses-paid visits to conferences in the USA) to be won from inventing new forms of multiplier. ~ Alan Clemens, 1985 The statement above is even more true for DFT / FFT! Google search for FFT yields 28M+ hits The 1965 paper by Cooley and Tukey has 14K+ citations Many books on FFT have 1s to 1s of citations New ways of performing FFT are still being discovered Aug. 218 Parallelism in Computer Arithmetic Slide 19
20 Computation Scheme for 16-Point FFT Bit-reversal permutation Butterfly operation a b j a + b j a b j n log n butterfly processors, each performing one operation Pipelining improves hardware utilization Aug. 218 Parallelism in Computer Arithmetic Slide 2
21 Butterfly and Shuffle-Exchange Networks x u y x u y x u y x 4 u 1 y 1 x 1 u 2 y 4 x 1 u 2 y 4 x 2 u 2 y 2 x 2 u 1 y 2 x 2 u 1 y 2 x 6 u 3 y 3 x 3 u 3 y 6 x 3 u 3 y 6 x 1 v y 4 x 4 v y 1 x 4 v y 1 x 5 v 1 y 5 x 5 v 2 y 5 x 5 v 2 y 5 x 3 v 2 y 6 x 6 v 1 y 3 x 6 v 1 y 3 x 7 v 3 y 7 x 7 v 3 y 7 x 7 v 3 y 7 Rearrangement of nodes makes inter-column connections identical Shuffle and shuffle-exchange link pairs replaced by separate shuffle and exchange links Aug. 218 Parallelism in Computer Arithmetic Slide 21
22 Projections to Reduce Hardware Complexity n log n cost log n time n cost log n time Horizontal projection: Reduces hardware complexity by a factor log n, without increasing the asymptotic time complexity log n cost n time Vertical projection: Reduces hardware complexity by a factor n, while increasing the asymptotic time complexity by n / log n Aug. 218 Parallelism in Computer Arithmetic Slide 22
23 Timing of Computations in Low-Cost FFT Circuit Butterfly processor The feedback connections Scheduling of computations to perform an n-point FFT in O(n) time with O(log n) processors Aug. 218 Parallelism in Computer Arithmetic Slide 23
24 V. Conclusion: Where Are We, Where Next? I reviewed only 3 examples, but there are more - Parallel-prefix adders - Recursive/divide-and-conquer multipliers - Discrete Fourier Transform, DFT / FFT - Key role of GPUs in building exascale computers - High-precision, error-free, and wide-range arithmetic Path to further connections / interactions - Study cross-citation patterns between the two fields - Redundancy for data preservation and fault tolerance - New/emerging technologies: QCA, SET, Nanomagnets, - Program portability via standardization - Speculative execution of multiple program paths Aug. 218 Parallelism in Computer Arithmetic Slide 24
25 Questions or Comments?
ECE 645: Lecture 3. Conditional-Sum Adders and Parallel Prefix Network Adders. FPGA Optimized Adders
ECE 645: Lecture 3 Conditional-Sum Adders and Parallel Prefix Network Adders FPGA Optimized Adders Required Reading Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design Chapter 7.4, Conditional-Sum
More informationChapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>
Chapter 5 Digital Design and Computer Architecture, 2 nd Edition David Money Harris and Sarah L. Harris Chapter 5 Chapter 5 :: Topics Introduction Arithmetic Circuits umber Systems Sequential Building
More informationPart II Addition / Subtraction
Part II Addition / Subtraction Parts Chapters I. Number Representation 1. 2. 3. 4. Numbers and Arithmetic Representing Signed Numbers Redundant Number Systems Residue Number Systems Elementary Operations
More informationLecture 4. Adders. Computer Systems Laboratory Stanford University
Lecture 4 Adders Computer Systems Laboratory Stanford University horowitz@stanford.edu Copyright 2006 Mark Horowitz Some figures from High-Performance Microprocessor Design IEEE 1 Overview Readings Today
More informationLecture 11: Adders. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed.
Lecture : dders Slides courtesy of Deming hen Slides based on the initial set from David Harris MOS VLSI Design Outline Single-bit ddition arry-ripple dder arry-skip dder arry-lookahead dder arry-select
More informationPart II Addition / Subtraction
Part II Addition / Subtraction Parts Chapters I. Number Representation 1. 2. 3. 4. Numbers and Arithmetic Representing Signed Numbers Redundant Number Systems Residue Number Systems Elementary Operations
More informationISSN (PRINT): , (ONLINE): , VOLUME-4, ISSUE-10,
A NOVEL DOMINO LOGIC DESIGN FOR EMBEDDED APPLICATION Dr.K.Sujatha Associate Professor, Department of Computer science and Engineering, Sri Krishna College of Engineering and Technology, Coimbatore, Tamilnadu,
More informationSpeedy Maths. David McQuillan
Speedy Maths David McQuillan Basic Arithmetic What one needs to be able to do Addition and Subtraction Multiplication and Division Comparison For a number of order 2 n n ~ 100 is general multi precision
More informationCMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design
CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 19: Adder Design [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp11 CMPEN 411 L19
More informationDigital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.
Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1 A Generic Digital Processor MEM ORY INPUT-OUTPUT CONTROL DATAPATH
More informationFast and Small: Multiplying Polynomials without Extra Space
Fast and Small: Multiplying Polynomials without Extra Space Daniel S. Roche Symbolic Computation Group School of Computer Science University of Waterloo CECM Day SFU, Vancouver, 24 July 2009 Preliminaries
More informationElliptic Curves Spring 2013 Lecture #3 02/12/2013
18.783 Elliptic Curves Spring 2013 Lecture #3 02/12/2013 3.1 Arithmetic in finite fields To make explicit computations with elliptic curves over finite fields, we need to know how to perform arithmetic
More informationAlgorithm Design and Analysis
Algorithm Design and Analysis LECTURE 14 Divide and Conquer Fast Fourier Transform Sofya Raskhodnikova 10/7/2016 S. Raskhodnikova; based on slides by K. Wayne. 5.6 Convolution and FFT Fast Fourier Transform:
More informationHomework 4 due today Quiz #4 today In class (80min) final exam on April 29 Project reports due on May 4. Project presentations May 5, 1-4pm
EE241 - Spring 2010 Advanced Digital Integrated Circuits Lecture 25: Digital Arithmetic Adders Announcements Homework 4 due today Quiz #4 today In class (80min) final exam on April 29 Project reports due
More informationDigital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.
Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1 A Generic Digital Processor MEMORY INPUT-OUTPUT CONTROL DATAPATH
More informationParallel Integer Polynomial Multiplication Changbo Chen, Svyatoslav Parallel Integer Covanov, Polynomial FarnamMultiplication
Parallel Integer Polynomial Multiplication Parallel Integer Polynomial Multiplication Changbo Chen 1 Svyatoslav Covanov 2,3 Farnam Mansouri 2 Marc Moreno Maza 2 Ning Xie 2 Yuzhen Xie 2 1 Chinese Academy
More informationECE 645: Lecture 2. Carry-Lookahead, Carry-Select, & Hybrid Adders
ECE 645: Lecture 2 Carry-Lookahead, Carry-Select, & Hybrid Adders Required Reading Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design Chapter 6, Carry-Lookahead Adders Sections 6.1-6.2.
More informationCS 140 Lecture 14 Standard Combinational Modules
CS 14 Lecture 14 Standard Combinational Modules Professor CK Cheng CSE Dept. UC San Diego Some slides from Harris and Harris 1 Part III. Standard Modules A. Interconnect B. Operators. Adders Multiplier
More informationDense Arithmetic over Finite Fields with CUMODP
Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,
More informationOptimum Prefix Adders in a Comprehensive Area, Timing and Power Design Space
Optimum Prefix Adders in a Comprehensive Area, Timing and Power Design Space Jianhua Liu, Yi Zhu, Haikun Zhu, John Lillis 2, Chung-Kuan Cheng Department of Computer Science and Engineering University of
More informationPUTTING FÜRER ALGORITHM INTO PRACTICE WITH THE BPAS LIBRARY. (Thesis format: Monograph) Linxiao Wang. Graduate Program in Computer Science
PUTTING FÜRER ALGORITHM INTO PRACTICE WITH THE BPAS LIBRARY. (Thesis format: Monograph) by Linxiao Wang Graduate Program in Computer Science A thesis submitted in partial fulfillment of the requirements
More informationDSP Configurations. responded with: thus the system function for this filter would be
DSP Configurations In this lecture we discuss the different physical (or software) configurations that can be used to actually realize or implement DSP functions. Recall that the general form of a DSP
More informationThe tangent FFT. D. J. Bernstein University of Illinois at Chicago
The tangent FFT D. J. Bernstein University of Illinois at Chicago Advertisement SPEED: Software Performance Enhancement for Encryption and Decryption A workshop on software speeds for secret-key cryptography
More informationInteger multiplication with generalized Fermat primes
Integer multiplication with generalized Fermat primes CARAMEL Team, LORIA, University of Lorraine Supervised by: Emmanuel Thomé and Jérémie Detrey Journées nationales du Calcul Formel 2015 (Cluny) November
More informationCSE477 VLSI Digital Circuits Fall Lecture 20: Adder Design
CSE477 VLSI Digital Circuits Fall 22 Lecture 2: Adder Design Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg477 [Adapted from Rabaey s Digital Integrated Circuits, 22, J. Rabaey et al.] CSE477
More informationArea-Time Optimal Adder with Relative Placement Generator
Area-Time Optimal Adder with Relative Placement Generator Abstract: This paper presents the design of a generator, for the production of area-time-optimal adders. A unique feature of this generator is
More informationHow to Multiply. 5.5 Integer Multiplication. Complex Multiplication. Integer Arithmetic. Complex multiplication. (a + bi) (c + di) = x + yi.
How to ultiply Slides by Kevin Wayne. Copyright 5 Pearson-Addison Wesley. All rights reserved. integers, matrices, and polynomials Complex ultiplication Complex multiplication. a + bi) c + di) = x + yi.
More informationVLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1
VLSI Design Adder Design [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1 Major Components of a Computer Processor Devices Control Memory Input Datapath
More informationLiterature Review on Multiplier Accumulation Unit by Using Hybrid Adder
Literature Review on Multiplier Accumulation Unit by Using Hybrid Adder Amiya Prakash M.E. Scholar, Department of (ECE) NITTTR Chandigarh, Punjab Dr. Kanika Sharma Assistant Prof. Department of (ECE) NITTTR
More informationEFFICIENT MULTIOUTPUT CARRY LOOK-AHEAD ADDERS
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 EFFICIENT MULTIOUTPUT CARRY LOOK-AHEAD ADDERS B. Venkata Sreecharan 1, C. Venkata Sudhakar 2 1 M.TECH (VLSI DESIGN)
More informationA High-Speed Realization of Chinese Remainder Theorem
Proceedings of the 2007 WSEAS Int. Conference on Circuits, Systems, Signal and Telecommunications, Gold Coast, Australia, January 17-19, 2007 97 A High-Speed Realization of Chinese Remainder Theorem Shuangching
More informationFaster arithmetic for number-theoretic transforms
University of New South Wales 7th October 2011, Macquarie University Plan for talk 1. Review number-theoretic transform (NTT) 2. Discuss typical butterfly algorithm 3. Improvements to butterfly algorithm
More informationImplementation of the DKSS Algorithm for Multiplication of Large Numbers
Implementation of the DKSS Algorithm for Multiplication of Large Numbers Christoph Lüders Universität Bonn The International Symposium on Symbolic and Algebraic Computation, July 6 9, 2015, Bath, United
More informationIntroduction to Algorithms
Lecture 1 Introduction to Algorithms 1.1 Overview The purpose of this lecture is to give a brief overview of the topic of Algorithms and the kind of thinking it involves: why we focus on the subjects that
More informationDesign and Analysis of Algorithms
Design and Analysis of Algorithms CSE 5311 Lecture 5 Divide and Conquer: Fast Fourier Transform Junzhou Huang, Ph.D. Department of Computer Science and Engineering CSE5311 Design and Analysis of Algorithms
More informationFaster integer multiplication using short lattice vectors
Faster integer multiplication using short lattice vectors David Harvey and Joris van der Hoeven ANTS XIII, University of Wisconsin, Madison, July 2018 University of New South Wales / CNRS, École Polytechnique
More information8. Design Tradeoffs x Computation Structures Part 1 Digital Circuits. Copyright 2015 MIT EECS
8. Design Tradeoffs 6.004x Computation Structures Part 1 Digital Circuits Copyright 2015 MIT EECS 6.004 Computation Structures L08: Design Tradeoffs, Slide #1 There are a large number of implementations
More information8. Design Tradeoffs x Computation Structures Part 1 Digital Circuits. Copyright 2015 MIT EECS
8. Design Tradeoffs 6.004x Computation Structures Part 1 Digital Circuits Copyright 2015 MIT EECS 6.004 Computation Structures L08: Design Tradeoffs, Slide #1 There are a large number of implementations
More informationThe equivalence of twos-complement addition and the conversion of redundant-binary to twos-complement numbers
The equivalence of twos-complement addition and the conversion of redundant-binary to twos-complement numbers Gerard MBlair The Department of Electrical Engineering The University of Edinburgh The King
More informationOverview. Arithmetic circuits. Binary half adder. Binary full adder. Last lecture PLDs ROMs Tristates Design examples
Overview rithmetic circuits Last lecture PLDs ROMs Tristates Design examples Today dders Ripple-carry Carry-lookahead Carry-select The conclusion of combinational logic!!! General-purpose building blocks
More informationTight Bounds on the Ratio of Network Diameter to Average Internode Distance. Behrooz Parhami University of California, Santa Barbara
Tight Bounds on the Ratio of Network Diameter to Average Internode Distance Behrooz Parhami University of California, Santa Barbara About This Presentation This slide show was first developed in fall of
More informationDIVIDE AND CONQUER II
DIVIDE AND CONQUER II master theorem integer multiplication matrix multiplication convolution and FFT Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison Wesley http://www.cs.princeton.edu/~wayne/kleinberg-tardos
More informationCPSC 518 Introduction to Computer Algebra Asymptotically Fast Integer Multiplication
CPSC 518 Introduction to Computer Algebra Asymptotically Fast Integer Multiplication 1 Introduction We have now seen that the Fast Fourier Transform can be applied to perform polynomial multiplication
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #19 3/28/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 From last class PRAM
More informationSpace- and Time-Efficient Polynomial Multiplication
Space- and Time-Efficient Polynomial Multiplication Daniel S. Roche Symbolic Computation Group School of Computer Science University of Waterloo ISSAC 2009 Seoul, Korea 30 July 2009 Univariate Polynomial
More informationBinary addition example worked out
Binary addition example worked out Some terms are given here Exercise: what are these numbers equivalent to in decimal? The initial carry in is implicitly 0 1 1 1 0 (Carries) 1 0 1 1 (Augend) + 1 1 1 0
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 13 Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Michael T. Heath Parallel Numerical Algorithms
More information5.6 Convolution and FFT
5.6 Convolution and FFT Fast Fourier Transform: Applications Applications. Optics, acoustics, quantum physics, telecommunications, control systems, signal processing, speech recognition, data compression,
More informationOptimization Techniques for Parallel Code 1. Parallel programming models
Optimization Techniques for Parallel Code 1. Parallel programming models Sylvain Collange Inria Rennes Bretagne Atlantique http://www.irisa.fr/alf/collange/ sylvain.collange@inria.fr OPT - 2017 Goals of
More informationLarge Integer Multiplication on Hypercubes. Barry S. Fagin Thayer School of Engineering Dartmouth College Hanover, NH
Large Integer Multiplication on Hypercubes Barry S. Fagin Thayer School of Engineering Dartmouth College Hanover, NH 03755 barry.fagin@dartmouth.edu Large Integer Multiplication 1 B. Fagin ABSTRACT Previous
More informationMultiplying huge integers using Fourier transforms
Fourier transforms October 25, 2007 820348901038490238478324 1739423249728934932894??? integers occurs in many fields of Computational Science: Cryptography Number theory... Traditional approaches to
More informationWhere are we? Data Path Design
Where are we? Subsystem Design Registers and Register Files dders and LUs Simple ripple carry addition Transistor schematics Faster addition Logic generation How it fits into the datapath Data Path Design
More informationVLSI Signal Processing
VLSI Signal Processing Lecture 1 Pipelining & Retiming ADSP Lecture1 - Pipelining & Retiming (cwliu@twins.ee.nctu.edu.tw) 1-1 Introduction DSP System Real time requirement Data driven synchronized by data
More informationChapter 5. Divide and Conquer CLRS 4.3. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.
Chapter 5 Divide and Conquer CLRS 4.3 Slides by Kevin Wayne. Copyright 25 Pearson-Addison Wesley. All rights reserved. Divide-and-Conquer Divide-and-conquer. Break up problem into several parts. Solve
More informationDesign and Implementation of Carry Tree Adders using Low Power FPGAs
1 Design and Implementation of Carry Tree Adders using Low Power FPGAs Sivannarayana G 1, Raveendra babu Maddasani 2 and Padmasri Ch 3. Department of Electronics & Communication Engineering 1,2&3, Al-Ameer
More informationDesign of Arithmetic Logic Unit (ALU) using Modified QCA Adder
Design of Arithmetic Logic Unit (ALU) using Modified QCA Adder M.S.Navya Deepthi M.Tech (VLSI), Department of ECE, BVC College of Engineering, Rajahmundry. Abstract: Quantum cellular automata (QCA) is
More informationCS/COE 1501 cs.pitt.edu/~bill/1501/ Integer Multiplication
CS/COE 1501 cs.pitt.edu/~bill/1501/ Integer Multiplication Integer multiplication Say we have 5 baskets with 8 apples in each How do we determine how many apples we have? Count them all? That would take
More informationDesign and Analysis of Algorithms
CSE 101, Winter 2018 Design and Analysis of Algorithms Lecture 4: Divide and Conquer (I) Class URL: http://vlsicad.ucsd.edu/courses/cse101-w18/ Divide and Conquer ( DQ ) First paradigm or framework DQ(S)
More information! Break up problem into several parts. ! Solve each part recursively. ! Combine solutions to sub-problems into overall solution.
Divide-and-Conquer Chapter 5 Divide and Conquer Divide-and-conquer.! Break up problem into several parts.! Solve each part recursively.! Combine solutions to sub-problems into overall solution. Most common
More informationHw 6 due Thursday, Nov 3, 5pm No lab this week
EE141 Fall 2005 Lecture 18 dders nnouncements Hw 6 due Thursday, Nov 3, 5pm No lab this week Midterm 2 Review: Tue Nov 8, North Gate Hall, Room 105, 6:30-8:30pm Exam: Thu Nov 10, Morgan, Room 101, 6:30-8:00pm
More informationCopyright 2000, Kevin Wayne 1
Divide-and-Conquer Chapter 5 Divide and Conquer Divide-and-conquer. Break up problem into several parts. Solve each part recursively. Combine solutions to sub-problems into overall solution. Most common
More informationVariable Latency Speculative Addition: A New Paradigm for Arithmetic Circuit Design
Variable Latency Speculative Addition: A New Paradigm for Arithmetic Circuit Design Ajay K. Verma Philip Brisk Paolo Ienne AjayKumar.Verma@epfl.ch Philip.Brisk@epfl.ch Paolo.Ienne@epfl.ch Ecole Polytechnique
More informationFast Polynomials Multiplication Using FFT
Li Chen lichen.xd at gmail.com Xidian University January 17, 2014 Outline 1 Discrete Fourier Transform (DFT) 2 Discrete Convolution 3 Fast Fourier Transform (FFT) 4 umber Theoretic Transform (TT) 5 More
More informationDESIGN OF PARITY PRESERVING LOGIC BASED FAULT TOLERANT REVERSIBLE ARITHMETIC LOGIC UNIT
International Journal of VLSI design & Communication Systems (VLSICS) Vol.4, No.3, June 2013 DESIGN OF PARITY PRESERVING LOGIC BASED FAULT TOLERANT REVERSIBLE ARITHMETIC LOGIC UNIT Rakshith Saligram 1
More informationI. INTRODUCTION. CMOS Technology: An Introduction to QCA Technology As an. T. Srinivasa Padmaja, C. M. Sri Priya
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 5 ISSN : 2456-3307 Design and Implementation of Carry Look Ahead Adder
More informationComplexity of computation in Finite Fields
Complexity of computation in Finite Fields Sergey B. Gashkov, Igor S. Sergeev Аннотация Review of some works about the complexity of implementation of arithmetic operations in finite fields by boolean
More informationAdders, subtractors comparators, multipliers and other ALU elements
CSE4: Components and Design Techniques for Digital Systems Adders, subtractors comparators, multipliers and other ALU elements Instructor: Mohsen Imani UC San Diego Slides from: Prof.Tajana Simunic Rosing
More informationLatches. October 13, 2003 Latches 1
Latches The second part of CS231 focuses on sequential circuits, where we add memory to the hardware that we ve already seen. Our schedule will be very similar to before: We first show how primitive memory
More informationHow fast can we calculate?
November 30, 2013 A touch of History The Colossus Computers developed at Bletchley Park in England during WW2 were probably the first programmable computers. Information about these machines has only been
More informationCPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication
CPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication March, 2006 1 Introduction We have now seen that the Fast Fourier Transform can be applied to perform
More informationA Digit-Serial Systolic Multiplier for Finite Fields GF(2 m )
A Digit-Serial Systolic Multiplier for Finite Fields GF( m ) Chang Hoon Kim, Sang Duk Han, and Chun Pyo Hong Department of Computer and Information Engineering Taegu University 5 Naeri, Jinryang, Kyungsan,
More informationCSE 421 Algorithms: Divide and Conquer
CSE 42 Algorithms: Divide and Conquer Larry Ruzzo Thanks to Richard Anderson, Paul Beame, Kevin Wayne for some slides Outline: General Idea algorithm design paradigms: divide and conquer Review of Merge
More informationImplementation of Carry Look-Ahead in Domino Logic
Implementation of Carry Look-Ahead in Domino Logic G. Vijayakumar 1 M. Poorani Swasthika 2 S. Valarmathi 3 And A. Vidhyasekar 4 1, 2, 3 Master of Engineering (VLSI design) & 4 Asst.Prof/ Dept.of ECE Akshaya
More informationL8/9: Arithmetic Structures
L8/9: Arithmetic Structures Acknowledgements: Materials in this lecture are courtesy of the following sources and are used with permission. Rex Min Kevin Atkinson Prof. Randy Katz (Unified Microelectronics
More informationComplexity Theory. Ahto Buldas. Introduction September 10, Slides based on S.Aurora, B.Barak. Complexity Theory: A Modern Approach.
Introduction September 10, 2009 Complexity Theory Slides based on S.Aurora, B.Barak. Complexity Theory: A Modern Approach. Ahto Buldas e-mail: Ahto.Buldas@ut.ee home: http://home.cyber.ee/ahtbu phone:
More informationEECS 427 Lecture 8: Adders Readings: EECS 427 F09 Lecture 8 1. Reminders. HW3 project initial proposal: due Wednesday 10/7
EECS 427 Lecture 8: dders Readings: 11.1-11.3.3 3 EECS 427 F09 Lecture 8 1 Reminders HW3 project initial proposal: due Wednesday 10/7 You can schedule a half-hour hour appointment with me to discuss your
More informationOn Equivalences and Fair Comparisons Among Residue Number Systems with Special Moduli
On Equivalences and Fair Comparisons Among Residue Number Systems with Special Moduli Behrooz Parhami Department of Electrical and Computer Engineering University of California Santa Barbara, CA 93106-9560,
More informationThree Ways to Test Irreducibility
Three Ways to Test Irreducibility Richard P. Brent Australian National University joint work with Paul Zimmermann INRIA, Nancy France 12 Feb 2009 Outline Polynomials over finite fields Irreducibility criteria
More informationDatapath Component Tradeoffs
Datapath Component Tradeoffs Faster Adders Previously we studied the ripple-carry adder. This design isn t feasible for larger adders due to the ripple delay. ʽ There are several methods that we could
More informationRSA Implementation. Oregon State University
RSA Implementation Çetin Kaya Koç Oregon State University 1 Contents: Exponentiation heuristics Multiplication algorithms Computation of GCD and Inverse Chinese remainder algorithm Primality testing 2
More informationMidterm Exam Two is scheduled on April 8 in class. On March 27 I will help you prepare Midterm Exam Two.
Announcements Midterm Exam Two is scheduled on April 8 in class. On March 27 I will help you prepare Midterm Exam Two. Chapter 5 1 Chapter 3: Part 3 Arithmetic Functions Iterative combinational circuits
More informationInteger multiplication and the truncated product problem
Integer multiplication and the truncated product problem David Harvey Arithmetic Geometry, Number Theory, and Computation MIT, August 2018 University of New South Wales Political update from Australia
More informationThree Ways to Test Irreducibility
Outline Three Ways to Test Irreducibility Richard P. Brent Australian National University joint work with Paul Zimmermann INRIA, Nancy France 8 Dec 2008 Polynomials over finite fields Irreducibility criteria
More informationLecture 8: Sequential Multipliers
Lecture 8: Sequential Multipliers ECE 645 Computer Arithmetic 3/25/08 ECE 645 Computer Arithmetic Lecture Roadmap Sequential Multipliers Unsigned Signed Radix-2 Booth Recoding High-Radix Multiplication
More informationOptimum Prefix Adders in a Comprehensive Area, Timing and Power Design Space
Optimum Prefix Adders in a Comprehensive Area, Timing and Power Design Space Jianhua Liu, Yi Zhu, Haikun Zhu, Chung-Kuan Cheng Department of Computer Science and Engineering University of California, San
More informationPolynomial Multiplication over Finite Fields using Field Extensions and Interpolation
009 19th IEEE International Symposium on Computer Arithmetic Polynomial Multiplication over Finite Fields using Field Extensions and Interpolation Murat Cenk Department of Mathematics and Computer Science
More informationCMPSCI611: Three Divide-and-Conquer Examples Lecture 2
CMPSCI611: Three Divide-and-Conquer Examples Lecture 2 Last lecture we presented and analyzed Mergesort, a simple divide-and-conquer algorithm. We then stated and proved the Master Theorem, which gives
More informationCost/Performance Tradeoffs:
Cost/Performance Tradeoffs: a case study Digital Systems Architecture I. L10 - Multipliers 1 Binary Multiplication x a b n bits n bits EASY PROBLEM: design combinational circuit to multiply tiny (1-, 2-,
More informationLecture 8. Sequential Multipliers
Lecture 8 Sequential Multipliers Required Reading Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design Chapter 9, Basic Multiplication Scheme Chapter 10, High-Radix Multipliers Chapter
More informationNumbers and Arithmetic
Numbers and Arithmetic See: P&H Chapter 2.4 2.6, 3.2, C.5 C.6 Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University Big Picture: Building a Processor memory inst register file alu
More informationarxiv: v1 [cs.na] 8 Feb 2016
Toom-Coo Multiplication: Some Theoretical and Practical Aspects arxiv:1602.02740v1 [cs.na] 8 Feb 2016 M.J. Kronenburg Abstract Toom-Coo multiprecision multiplication is a well-nown multiprecision multiplication
More informationP vs NP & Computational Complexity
P vs NP & Computational Complexity Miles Turpin MATH 89S Professor Hubert Bray P vs NP is one of the seven Clay Millennium Problems. The Clay Millenniums have been identified by the Clay Mathematics Institute
More informationOld and new algorithms for computing Bernoulli numbers
Old and new algorithms for computing Bernoulli numbers University of New South Wales 25th September 2012, University of Ballarat Bernoulli numbers Rational numbers B 0, B 1,... defined by: x e x 1 = n
More informationComputer Architecture 10. Fast Adders
Computer Architecture 10 Fast s Ma d e wi t h Op e n Of f i c e. o r g 1 Carry Problem Addition is primary mechanism in implementing arithmetic operations Slow addition directly affects the total performance
More informationWhere are we? Data Path Design. Bit Slice Design. Bit Slice Design. Bit Slice Plan
Where are we? Data Path Design Subsystem Design Registers and Register Files dders and LUs Simple ripple carry addition Transistor schematics Faster addition Logic generation How it fits into the datapath
More informationFast algorithms for polynomials and matrices Part 2: polynomial multiplication
Fast algorithms for polynomials and matrices Part 2: polynomial multiplication by Grégoire Lecerf Computer Science Laboratory & CNRS École polytechnique 91128 Palaiseau Cedex France 1 Notation In this
More informationCSE 548: Analysis of Algorithms. Lecture 4 ( Divide-and-Conquer Algorithms: Polynomial Multiplication )
CSE 548: Analysis of Algorithms Lecture 4 ( Divide-and-Conquer Algorithms: Polynomial Multiplication ) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2015 Coefficient Representation
More informationHardware Design I Chap. 4 Representative combinational logic
Hardware Design I Chap. 4 Representative combinational logic E-mail: shimada@is.naist.jp Already optimized circuits There are many optimized circuits which are well used You can reduce your design workload
More informationHigh Performance GHASH Function for Long Messages
High Performance GHASH Function for Long Messages Nicolas Méloni 1, Christophe Négre 2 and M. Anwar Hasan 1 1 Department of Electrical and Computer Engineering University of Waterloo, Canada 2 Team DALI/ELIAUS
More informationArithmetic in Integer Rings and Prime Fields
Arithmetic in Integer Rings and Prime Fields A 3 B 3 A 2 B 2 A 1 B 1 A 0 B 0 FA C 3 FA C 2 FA C 1 FA C 0 C 4 S 3 S 2 S 1 S 0 http://koclab.org Çetin Kaya Koç Spring 2018 1 / 71 Contents Arithmetic in Integer
More information