Tunable Floating-Point for Energy Efficient Accelerators

Size: px
Start display at page:

Download "Tunable Floating-Point for Energy Efficient Accelerators"

Transcription

1 Tunable Floating-Point for Energy Efficient Accelerators Alberto Nannarelli DTU Compute, Technical University of Denmark 25 th IEEE Symposium on Computer Arithmetic A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 1 / 22

2 Outline Motivation and Background Tunable Floating-Point Multiplication Hardware Implementation Application Example Improvements Conclusions and Future Work A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 2 / 22

3 Tunable Floating-Point (TFP) TFP is a floating-point format with adjustable significand and exponent fields bit-width Features significand m = [4, 24] bits (including hidden bit) exponent e = [5,8] bits rounding modes RTZ Round toward zero (truncation) RTN Round to the nearest RTNE Round to the nearest (tie to even) IEEE 754 roundtiestoeven mode subnormals flushed to zero TFP includes binary32 (m = 24, e = 8), binary16 (m = 11, e = 5), Google s Brain-FP (m = 8, e = 8). A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 3 / 22

4 Motivation High throughput parallel multiplication required in most computing applications Parallel multiplication is a power demanding operation A flexible unit, handling a flexible format, can increase the power efficiency Power efficiency = throughput watt Multiplication can be approximated by reducing the precision Accuracy of reduced precision can be improved by rounding A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 4 / 22

5 Example: Machine Learning Machine Learning applications, such as Deep Neural Networks (DNN), require huge number of multiplications: N w i x i Reducing the precision, when possible, saves energy. Example DNN A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 5 / 22

6 Example: Machine Learning Machine Learning applications, such as Deep Neural Networks (DNN), require huge number of multiplications: N w i x i Reducing the precision, when possible, saves energy. Example DNN Training Training dataset DNN w i error Ground Truth Backpropagation Operations in binary32 (single-precision) A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 5 / 22

7 Example: Machine Learning Machine Learning applications, such as Deep Neural Networks (DNN), require huge number of multiplications: N w i x i Reducing the precision, when possible, saves energy. Example DNN Inference Testing dataset DNN Output w i error Backpropagation Operations in binary16 (half-precision) A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 5 / 22

8 Baseline FP-Multiplier (Significand) Radix-4 array multiplier Radix-4 Recoding PPs generation Adder tree (carry-save): P S,P C M X 2m 1 m PPgen P s P c 2m 1 M Y radix 4 Recoding T R E E m OVF P2 tie flush to 0 HA m 1 P1 1 2:1 mux 0 m 1 ulp/2 m 1 3:1 mux M Z HA m 1 ulp/2 <<1 1 0 P0 mux 2m 1 2m 1 MSBs m 1 P0 LSBs m+1 sticky comp. tie A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 6 / 22

9 Baseline FP-Multiplier (Significand) Radix-4 array multiplier Radix-4 Recoding PPs generation Adder tree (carry-save): P S,P C Speculative Rounding P2: 2 P < 4 P1: 1 P < 2 (norm.) P0: no rounding bit injection 2m 1 OVF M X P2 m PPgen P s P c tie flush to 0 2m 1 HA m 1 M Y radix 4 Recoding T R E E P1 1 2:1 mux 0 m 1 m ulp/2 m 1 3:1 mux M Z HA m 1 ulp/2 <<1 1 0 P0 mux 2m 1 2m 1 MSBs m 1 P0 LSBs m+1 sticky comp. tie A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 6 / 22

10 Baseline FP-Multiplier (Significand) Radix-4 array multiplier Radix-4 Recoding PPs generation Adder tree (carry-save): P S,P C Speculative Rounding P2: 2 P < 4 P1: 1 P < 2 (norm.) P0: no rounding bit injection Normalization & Selection OVF overflow P 2 tie-to-even underflow (exp=0) overflow (exp= ) 2m 1 OVF M X P2 m PPgen P s P c tie flush to 0 2m 1 HA m 1 M Y radix 4 Recoding T R E E P1 1 2:1 mux 0 m 1 m ulp/2 m 1 3:1 mux M Z HA m 1 ulp/2 <<1 1 0 P0 mux 2m 1 2m 1 MSBs m 1 P0 LSBs m+1 sticky comp. tie A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 6 / 22

11 Speculative Rounding 2m 1 P s P c 2m 1 HA ulp/2 HA ulp/2 <<1 P0 P2 m 1 P1 m P0 mux 2m 1 2m 1 pos. 2m-1 2m-2 2m-3... m+2 m+1 m m-1 m P2 1 b. b... b L R b b... b b ulp/2 1 P b... b b L R b... b b ulp/2 1 Sticky bit T = OR m 1 i=0 b i A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 7 / 22

12 Speculative Rounding 2m 1 P s P c 2m 1 HA ulp/2 HA ulp/2 <<1 P0 P2 m 1 P1 m P0 mux 2m 1 2m 1 pos. 2m-1 2m-2 2m-3... m+2 m+1 m m-1 m P2 1 b. b... b L R b b... b b P S ulp/2 1 P C P b... b b L R b... b b HA s ulp/2 1 HA c P... Sticky bit T = OR m 1 i=0 b i A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 7 / 22

13 Baseline FP-Multiplier (Exponent) At beginning of operation and in parallel Detect if operands are zero set integer bit in M Add exponents E 1 = E X +E Y Bias Increment exponent E 2 = E 1 +1 Select exponent on OVF Check conditions underflow (exp=0) overflow (exp= ) Set flush-to-0 Set exponent E Z E x e E y Exp. add E 1 E z e E mux E max Exp. adjust e e Int. bit det. Bias Exp incr. e ZERO OVF INFTY M x (m) M y (m) flush to 0 A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 8 / 22

14 TFP Rounding Need to round in variable position according to m baseline binary32 P S P C ulp/2 A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 9 / 22

15 TFP Rounding Need to round in variable position according to m TFP m = 24 P S P C RW... A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 9 / 22

16 TFP Rounding Need to round in variable position according to m TFP m = 24 TFP m = 21 P S P C RW... P S P C RW... A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 9 / 22

17 TFP Rounding Need to round in variable position according to m TFP m = 24 TFP m = 21 P S P C RW... P S P C RW... Need to introduce RW rounding word for rounding Need to select blue bits for sticky bit computation A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 9 / 22

18 TFP Multiplier (Significand) Hardware to support TFP Decoder needed to generate RW from m HA changed in FA (3:2 CSA) M X PPgen M Y P s P c radix 4 Recoding T R E E m 5 decoder 24 [47,24] <<1 RW : rounding word 3:2 CSA 3:2 CSA P0 <<1 OVF P2 P1 1 2:1 mux P047 mux MSBs 47 SHAMT shift right tie flush to :1 mux M Z 25 sticky comp. tie A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

19 TFP Multiplier (Significand) Hardware to support TFP Decoder needed to generate RW from m HA changed in FA (3:2 CSA) P S P C RW... FA s FA c P M Z... OVF M X PPgen P s P c 3:2 CSA P2 tie flush to 0 M Y radix 4 Recoding T R E E <<1 3:2 CSA P1 1 2:1 mux 0 3:1 mux M Z m decoder P047 MSBs [47,24] RW : rounding word SHAMT <<1 1 mux 0 47 tie P0 shift right 25 sticky comp. A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

20 TFP Multiplier (Significand) Hardware to support TFP Decoder needed to generate RW from m HA changed in FA (3:2 CSA) P S P C RW... FA s FA c P M Z... Barrel shifter (to right) to select blue bits for sticky bit computation Modified selection multiplexer to zero LSBs of result OVF M X PPgen P s P c 3:2 CSA P2 tie flush to 0 M Y radix 4 Recoding T R E E <<1 3:2 CSA P1 1 2:1 mux 0 3:1 mux M Z m decoder P047 MSBs [47,24] RW : rounding word SHAMT <<1 1 mux 0 47 tie P0 shift right 25 sticky comp. A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

21 TFP Multiplier Rounding Modes RTNE IEEE roundtiestoeven Round to the nearest (tie to even) M X PPgen M Y P s P c radix 4 Recoding T R E E m 5 decoder 24 [47,24] <<1 RW : rounding word 3:2 CSA 3:2 CSA P0 <<1 OVF P2 P1 1 2:1 mux P047 mux MSBs 47 SHAMT shift right tie flush to :1 mux M Z 25 sticky comp. tie A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

22 TFP Multiplier Rounding Modes M X M Y RTN Round to the nearest PPgen P s P c radix 4 Recoding T R E E m 5 decoder 24 [47,24] <<1 RW : rounding word 3:2 CSA 3:2 CSA P0 <<1 OVF P2 P1 1 2:1 mux P047 mux MSBs 47 SHAMT shift right tie flush to :1 mux M Z 25 sticky comp. tie A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

23 TFP Multiplier Rounding Modes M X M Y PPgen radix 4 Recoding m 5 decoder RTZ Round toward zero T R E E P s P c <<1 24 [47,24] RW : rounding word (truncation) 3:2 CSA 3:2 CSA P0 <<1 OVF P2 P1 1 2:1 mux OVF mux MSBs 47 SHAMT shift right tie flush to :1 mux M Z 25 sticky comp. tie A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

24 TFP Error vs. Precision m Matrix multiplication 8 8 for m = {6,8,11,14,16,20,24} 0.1 error 0.01 RTZ RTN RTNE e-05 1e-06 1e Error RTNE Error RTN m A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

25 Hardware Implementation STM 45 nm low-power CMOS standard cells Clock period 1.0 ns 15 FO4 2-stage pipeline Post synthesis comparison Area (1) Power (2) error RTNE 15, RTN 12, RTZ 10, (1) [NAND2 equiv.], (2) [mw] at 1 GHz Trade-offs RTNE RTN RTZ Power (ratio) RTNE RTN RTZ Accuracy (ratio) Best tradeoffs for TFP multiplier RTN mode A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

26 TFP multiplier RTN Physical implementation (layout) E x 8 E y 8 Exp. add e 2 Bias tbl Bias 3 (3 MSBs) M X PPgen M Y radix 4 Recoding T R E E P s P c m 5 decoder 24 RW Clock period 1.0 ns (2-stage) Area 10,930 [NAND2 equiv.] Power dissipation 15.5 mw (1 GHz) (binary32 operands) E 1 8 E max <<1 Exp incr. 3:2 CSA 3:2 CSA E max 8 E z E mux Exp. adjust 8 ZERO OVF INFTY P2 P1 OVF MASK 1 masked 2:1 mux flush to 0 0 2:1 mux 1 M Z A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

27 TFP multiplier RTN Physical implementation (layout) E x 8 E 1 Exp. add E max E y 8 8 Bias 3 (3 MSBs) E z Exp incr. E mux Exp. adjust 8 8 e 2 Bias tbl E max ZERO OVF INFTY OVF M X PPgen :2 CSA P2 flush to 0 M Y radix 4 Recoding T R E E P s P c <<1 3:2 CSA P1 1 masked 2:1 mux 0 0 2:1 mux 1 M Z RW m 5 decoder 24 MASK Clock period 1.0 ns (2-stage) Area 10,930 [NAND2 equiv.] Power dissipation 15.5 mw (1 GHz) (binary32 operands) Power dissipation breakdown percent Mult. array 75 Spec. rounding 13 Norm. & selection 2 Exponent 3 Regs. etc. 6 A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

28 binary32 multiplier RTN Physical implementation (layout) E x 8 E 1 E y 8 Exp. add 8 Bias e 2 Bias tbl Exp incr. 3 E max M X PPgen M Y radix 4 Recoding T R E E P s P c HA m 5 decoder 24 RW <<1 HA Comparison multiplier TFP vs. binary32 (RTN mode) Area (post-layout) b32 9,790 TFP 10,930 (+11%) [NAND2 equiv.] E max 8 E z E mux Exp. adjust 8 ZERO OVF INFTY P2 P1 OVF 1 2:1 mux 0 MASK flush to 0 0 2:1 mux 1 M Z Power dissipation 1 Z = X Y 2 Z = (X Y) W A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

29 Switching Activity in TFP and binary32 Multiplier 1 Z = X Y TFP binary32 s exp X : fraction s exp fraction s exp Y : fraction = s exp fraction = Z : s exp fraction s exp fraction A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

30 Switching Activity in TFP and binary32 Multiplier 1 Z = X Y TFP binary32 s exp X : fraction s exp fraction s exp Y : fraction = s exp fraction = Z : s exp fraction s exp fraction 2 Z = (X Y) W TFP binary32 s exp X : fraction s exp fraction s exp Y : fraction = s exp fraction = s exp fraction s exp fraction s exp W : fraction = s exp fraction = Z : s exp fraction s exp fraction A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

31 Power Dissipation TFP vs. binary32 Multiplier P ave under changing significand bitwidth m, e = 8 1 Z = X Y P ave 16 TFP b A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

32 Power Dissipation TFP vs. binary32 Multiplier P ave under changing significand bitwidth m, e = 8 2 Z = (X Y) W P ave 16 TFP b A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

33 Power Dissipation TFP vs. binary32 Multiplier P ave under changing significand bitwidth m, e = 8 2 Z = (X Y) W P ave 16 TFP b32 b32 10 TFP TFP more power efficient than b32 fom m < 20 A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

34 Example: Neural Network Neural network with two hidden layers (depth=2) input layer hidden layer 1 hidden layer 2 output layer A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

35 Example: Neural Network Neural network with two hidden layers (depth=2) input layer hidden layer 1 hidden layer 2 output layer Training/Learning A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

36 Example: Neural Network Neural network with two hidden layers (depth=2) input layer hidden layer 1 hidden layer 2 output layer Training dataset DNN w i error Ground Truth Backpropagation Training/Learning A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

37 Example: Neural Network Neural network with two hidden layers (depth=2) input layer hidden layer 1 hidden layer 2 output layer Training/Learning binary32 binary16 m e ǫ ave epoch n op P ave E Learn , , , , , , , , [mw] [J] A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

38 Example: Neural Network Neural network with two hidden layers (depth=2) input layer hidden layer 1 hidden layer 2 output layer C simulator ǫ ave, epoch, n op test-vectors Synopsys VCS switching activity Synopsys ICC P ave E Learn = P ave n op T clk Training/Learning binary32 binary16 m e ǫ ave epoch n op P ave E Learn , , , , , , , , [mw] [J] A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

39 Example: Neural Network Inference Approximation Error (binary32) A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

40 Example: Neural Network Inference Approximation Error (binary32) A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

41 Example: Neural Network Inference Approximation Error (binary32) Ideal n = 120 A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

42 Example: Neural Network Inference Approximation Error (binary32) Ideal n = 120 NN approx. n = 129 ( ) 97 Miss-classified (in) 32 (out) A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

43 Example: Neural Network Inference Quantization Error / Reduced Precision binary32 binary16 m=8, e=8 m=6, e=5 m=5, e=5 A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

44 Example: Neural Network Inference Quantization Error / Reduced Precision binary32 binary16 m=8, e=8 m=6, e=5 m=5, e=5 m e Np TOT ( ) points in b32 and not in (m,e) points in (m,e) and not in b32 A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

45 M X M Y M X M Y M X M Y Improvements Use compound adder in speculative rounding Implement programmable rounding modes PPgen radix 4 Recoding m 5 PPgen radix 4 Recoding m 5 PPgen radix 4 Recoding m 5 decoder decoder decoder T R E E P s P c 24 [47,24] T R E E P s P c 24 [47,24] T R E E P s P c 24 [47,24] <<1 RW : rounding word <<1 RW : rounding word <<1 RW : rounding word 3:2 CSA 3:2 CSA P0 3:2 CSA 3:2 CSA P0 3:2 CSA 3:2 CSA P0 <<1 <<1 <<1 P2 P1 1 0 P047 mux MSBs 47 P2 P1 1 0 P047 mux MSBs 47 P2 P1 1 0 OVF mux MSBs 47 OVF 1 2:1 mux 0 SHAMT shift right OVF 1 2:1 mux 0 SHAMT shift right OVF 1 2:1 mux 0 SHAMT shift right tie flush to 0 3:1 mux sticky comp. tie 3:1 mux sticky comp. tie 3:1 mux flush to 0 flush to 0 tie tie M Z M Z M Z RTNE RTN RTZ sticky comp. tie RTNE for binary32 and binary16 only replace barrel shifter with multiplexer Split the unit in two when m < 12 two binary16 in parallel A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

46 Conclusions and Future Work Tunable Floating-Point multiplier for accelerator use correct rounding for the selected precision reduced power dissipation Future Work Currently testing a TFP-adder (2-stage) Integrating multiplication and addition in FMA Division A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

47 Additional Slides A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH-25 / 22

48 TFP-mult vs. b32-mult: Power Dissipation Average power dissipation for TFP-mult and b32-mult (RTN) P ave Z = X Y Z = (X Y) W m TFP-mult b32-mult ratio TFP-mult b32-mult ratio P ave [mw] measured at 1 GHz. A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

49 Average Power Dissipation for TFP-mult Average power dissipation for TFP-mult (RTN) as m scales (e = 8) Matrix multiplication Gauss elimination m P ave [mw] ratio P ave [mw] ratio P ave measured at 1 GHz. A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

50 TFP-mult Trends 16 P ave Matrix mult Gauss Trends of average power dissipation for TFP-mult (RTN) as m scales (e = 8) A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

51 Hardware for binary32 to binary16 Reduction /* range checking (exponent) */ E b16 = E b32 B b32 +B b16 = E b // must be positive /* check lower bound (exponent) */ E b16 E max(5b) < 0 E b = E b < 0 /* check significand for non-zero bits */ zero = 0; for i=0 to 12 do zero = zero or M b32 (i); end for c1 Eb = 112 = bit sign if ((E b16 > 0)and(E b < 0)and(zero = 0)) then reduce to binary16 else keep binary32 end if 5 E b16 8 c2 c1 c2 zero 8 bit sign binary16 M M zero... binary32 1 2:1 mux 0 OR TREE A. Nannarelli (DTU Compute) Tunable Floating-Point for Energy Efficient Accelerators ARITH / 22

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017 9. Datapath Design Jacob Abraham Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017 October 2, 2017 ECE Department, University of Texas at Austin

More information

Chapter 8: Solutions to Exercises

Chapter 8: Solutions to Exercises 1 DIGITAL ARITHMETIC Miloš D. Ercegovac and Tomás Lang Morgan Kaufmann Publishers, an imprint of Elsevier Science, c 2004 with contributions by Fabrizio Lamberti Exercise 8.1 Fixed-point representation

More information

LRADNN: High-Throughput and Energy- Efficient Deep Neural Network Accelerator using Low Rank Approximation

LRADNN: High-Throughput and Energy- Efficient Deep Neural Network Accelerator using Low Rank Approximation LRADNN: High-Throughput and Energy- Efficient Deep Neural Network Accelerator using Low Rank Approximation Jingyang Zhu 1, Zhiliang Qian 2, and Chi-Ying Tsui 1 1 The Hong Kong University of Science and

More information

Vectorized 128-bit Input FP16/FP32/ FP64 Floating-Point Multiplier

Vectorized 128-bit Input FP16/FP32/ FP64 Floating-Point Multiplier Vectorized 128-bit Input FP16/FP32/ FP64 Floating-Point Multiplier Espen Stenersen Master of Science in Electronics Submission date: June 2008 Supervisor: Per Gunnar Kjeldsberg, IET Co-supervisor: Torstein

More information

Binary Floating-Point Numbers

Binary Floating-Point Numbers Binary Floating-Point Numbers S exponent E significand M F=(-1) s M β E Significand M pure fraction [0, 1-ulp] or [1, 2) for β=2 Normalized form significand has no leading zeros maximum # of significant

More information

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor Proceedings of the 11th WSEAS International Conference on COMPUTERS, Agios Nikolaos, Crete Island, Greece, July 6-8, 007 653 Proposal to Improve Data Format Conversions for a Hybrid Number System Processor

More information

Adders, subtractors comparators, multipliers and other ALU elements

Adders, subtractors comparators, multipliers and other ALU elements CSE4: Components and Design Techniques for Digital Systems Adders, subtractors comparators, multipliers and other ALU elements Instructor: Mohsen Imani UC San Diego Slides from: Prof.Tajana Simunic Rosing

More information

What s the Deal? MULTIPLICATION. Time to multiply

What s the Deal? MULTIPLICATION. Time to multiply What s the Deal? MULTIPLICATION Time to multiply Multiplying two numbers requires a multiply Luckily, in binary that s just an AND gate! 0*0=0, 0*1=0, 1*0=0, 1*1=1 Generate a bunch of partial products

More information

EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters

EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters April 15, 2010 John Wawrzynek 1 Multiplication a 3 a 2 a 1 a 0 Multiplicand b 3 b 2 b 1 b 0 Multiplier X a 3 b 0 a 2 b 0 a 1 b

More information

Hardware Design I Chap. 4 Representative combinational logic

Hardware Design I Chap. 4 Representative combinational logic Hardware Design I Chap. 4 Representative combinational logic E-mail: shimada@is.naist.jp Already optimized circuits There are many optimized circuits which are well used You can reduce your design workload

More information

Binary Multipliers. Reading: Study Chapter 3. The key trick of multiplication is memorizing a digit-to-digit table Everything else was just adding

Binary Multipliers. Reading: Study Chapter 3. The key trick of multiplication is memorizing a digit-to-digit table Everything else was just adding Binary Multipliers The key trick of multiplication is memorizing a digit-to-digit table Everything else was just adding 2 3 4 5 6 7 8 9 2 3 4 5 6 7 8 9 2 2 4 6 8 2 4 6 8 3 3 6 9 2 5 8 2 24 27 4 4 8 2 6

More information

CS 140 Lecture 14 Standard Combinational Modules

CS 140 Lecture 14 Standard Combinational Modules CS 14 Lecture 14 Standard Combinational Modules Professor CK Cheng CSE Dept. UC San Diego Some slides from Harris and Harris 1 Part III. Standard Modules A. Interconnect B. Operators. Adders Multiplier

More information

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor Proposal to Improve Data Format Conversions for a Hybrid Number System Processor LUCIAN JURCA, DANIEL-IOAN CURIAC, AUREL GONTEAN, FLORIN ALEXA Department of Applied Electronics, Department of Automation

More information

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1 A Generic Digital Processor MEMORY INPUT-OUTPUT CONTROL DATAPATH

More information

Processor Design & ALU Design

Processor Design & ALU Design 3/8/2 Processor Design A. Sahu CSE, IIT Guwahati Please be updated with http://jatinga.iitg.ernet.in/~asahu/c22/ Outline Components of CPU Register, Multiplexor, Decoder, / Adder, substractor, Varity of

More information

ARITHMETIC COMBINATIONAL MODULES AND NETWORKS

ARITHMETIC COMBINATIONAL MODULES AND NETWORKS ARITHMETIC COMBINATIONAL MODULES AND NETWORKS 1 SPECIFICATION OF ADDER MODULES FOR POSITIVE INTEGERS HALF-ADDER AND FULL-ADDER MODULES CARRY-RIPPLE AND CARRY-LOOKAHEAD ADDER MODULES NETWORKS OF ADDER MODULES

More information

Chapter 5: Solutions to Exercises

Chapter 5: Solutions to Exercises 1 DIGITAL ARITHMETIC Miloš D. Ercegovac and Tomás Lang Morgan Kaufmann Publishers, an imprint of Elsevier Science, c 2004 Updated: September 9, 2003 Chapter 5: Solutions to Selected Exercises With contributions

More information

ALU (3) - Division Algorithms

ALU (3) - Division Algorithms HUMBOLDT-UNIVERSITÄT ZU BERLIN INSTITUT FÜR INFORMATIK Lecture 12 ALU (3) - Division Algorithms Sommersemester 2002 Leitung: Prof. Dr. Miroslaw Malek www.informatik.hu-berlin.de/rok/ca CA - XII - ALU(3)

More information

Lecture 18: Datapath Functional Units

Lecture 18: Datapath Functional Units Lecture 8: Datapath Functional Unit Outline Comparator Shifter Multi-input Adder Multiplier 8: Datapath Functional Unit CMOS VLSI Deign 4th Ed. 2 Comparator 0 detector: A = 00 000 detector: A = Equality

More information

Logic and Computer Design Fundamentals. Chapter 5 Arithmetic Functions and Circuits

Logic and Computer Design Fundamentals. Chapter 5 Arithmetic Functions and Circuits Logic and Computer Design Fundamentals Chapter 5 Arithmetic Functions and Circuits Arithmetic functions Operate on binary vectors Use the same subfunction in each bit position Can design functional block

More information

Design and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives

Design and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives Design and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives Miloš D. Ercegovac Computer Science Department Univ. of California at Los Angeles California Robert McIlhenny

More information

Lecture 12: Datapath Functional Units

Lecture 12: Datapath Functional Units Lecture 2: Datapath Functional Unit Slide courtey of Deming Chen Slide baed on the initial et from David Harri CMOS VLSI Deign Outline Comparator Shifter Multi-input Adder Multiplier Reading:.3-4;.8-9

More information

Lecture 11. Advanced Dividers

Lecture 11. Advanced Dividers Lecture 11 Advanced Dividers Required Reading Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design Chapter 15 Variation in Dividers 15.3, Combinational and Array Dividers Chapter 16, Division

More information

Numbering Systems. Computational Platforms. Scaling and Round-off Noise. Special Purpose. here that is dedicated architecture

Numbering Systems. Computational Platforms. Scaling and Round-off Noise. Special Purpose. here that is dedicated architecture Computational Platforms Numbering Systems Basic Building Blocks Scaling and Round-off Noise Computational Platforms Viktor Öwall viktor.owall@eit.lth.seowall@eit lth Standard Processors or Special Purpose

More information

A 32-bit Decimal Floating-Point Logarithmic Converter

A 32-bit Decimal Floating-Point Logarithmic Converter A 3-bit Decimal Floating-Point Logarithmic Converter Dongdong Chen 1, Yu Zhang 1, Younhee Choi 1, Moon Ho Lee, Seok-Bum Ko 1, Department of Electrical and Computer Engineering, University of Saskatchewan

More information

Lecture 8: Sequential Multipliers

Lecture 8: Sequential Multipliers Lecture 8: Sequential Multipliers ECE 645 Computer Arithmetic 3/25/08 ECE 645 Computer Arithmetic Lecture Roadmap Sequential Multipliers Unsigned Signed Radix-2 Booth Recoding High-Radix Multiplication

More information

Part VI Function Evaluation

Part VI Function Evaluation Part VI Function Evaluation Parts Chapters I. Number Representation 1. 2. 3. 4. Numbers and Arithmetic Representing Signed Numbers Redundant Number Systems Residue Number Systems Elementary Operations

More information

Lecture 12: Datapath Functional Units

Lecture 12: Datapath Functional Units Introduction to CMOS VLSI Deign Lecture 2: Datapath Functional Unit David Harri Harvey Mudd College Spring 2004 Outline Comparator Shifter Multi-input Adder Multiplier 2: Datapath Functional Unit CMOS

More information

Chapter 4 Number Representations

Chapter 4 Number Representations Chapter 4 Number Representations SKEE2263 Digital Systems Mun im/ismahani/izam {munim@utm.my,e-izam@utm.my,ismahani@fke.utm.my} February 9, 2016 Table of Contents 1 Fundamentals 2 Signed Numbers 3 Fixed-Point

More information

DIVISION BY DIGIT RECURRENCE

DIVISION BY DIGIT RECURRENCE DIVISION BY DIGIT RECURRENCE 1 SEVERAL DIVISION METHODS: DIGIT-RECURRENCE METHOD studied in this chapter MULTIPLICATIVE METHOD (Chapter 7) VARIOUS APPROXIMATION METHODS (power series expansion), SPECIAL

More information

CSE 140 Lecture 11 Standard Combinational Modules. CK Cheng and Diba Mirza CSE Dept. UC San Diego

CSE 140 Lecture 11 Standard Combinational Modules. CK Cheng and Diba Mirza CSE Dept. UC San Diego CSE 4 Lecture Standard Combinational Modules CK Cheng and Diba Mirza CSE Dept. UC San Diego Part III - Standard Combinational Modules (Harris: 2.8, 5) Signal Transport Decoder: Decode address Encoder:

More information

VLSI Arithmetic. Lecture 9: Carry-Save and Multi-Operand Addition. Prof. Vojin G. Oklobdzija University of California

VLSI Arithmetic. Lecture 9: Carry-Save and Multi-Operand Addition. Prof. Vojin G. Oklobdzija University of California VLSI Arithmetic Lecture 9: Carry-Save and Multi-Operand Addition Prof. Vojin G. Oklobdzija University of California http://www.ece.ucdavis.edu/acsel Carry-Save Addition* *from Parhami 2 June 18, 2003 Carry-Save

More information

8/13/16. Data analysis and modeling: the tools of the trade. Ø Set of numbers. Ø Binary representation of numbers. Ø Floating points.

8/13/16. Data analysis and modeling: the tools of the trade. Ø Set of numbers. Ø Binary representation of numbers. Ø Floating points. Data analysis and modeling: the tools of the trade Patrice Koehl Department of Biological Sciences National University of Singapore http://www.cs.ucdavis.edu/~koehl/teaching/bl5229 koehl@cs.ucdavis.edu

More information

Low Power Division and Square Root

Low Power Division and Square Root UNIVERSITY OF CALIFORNIA, IRVINE Low Power Division and Square Root DISSERTATION submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in Engineering by Alberto Nannarelli

More information

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER Jesus Garcia and Michael J. Schulte Lehigh University Department of Computer Science and Engineering Bethlehem, PA 15 ABSTRACT Galois field arithmetic

More information

Design of Sequential Circuits

Design of Sequential Circuits Design of Sequential Circuits Seven Steps: Construct a state diagram (showing contents of flip flop and inputs with next state) Assign letter variables to each flip flop and each input and output variable

More information

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1 A Generic Digital Processor MEM ORY INPUT-OUTPUT CONTROL DATAPATH

More information

Arithmetic Circuits-2

Arithmetic Circuits-2 Arithmetic Circuits-2 Multipliers Array multipliers Shifters Barrel shifter Logarithmic shifter ECE 261 Krish Chakrabarty 1 Binary Multiplication M-1 X = X i 2 i i=0 Multiplicand N-1 Y = Y i 2 i i=0 Multiplier

More information

Appendix B. Review of Digital Logic. Baback Izadi Division of Engineering Programs

Appendix B. Review of Digital Logic. Baback Izadi Division of Engineering Programs Appendix B Review of Digital Logic Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Elect. & Comp. Eng. 2 DeMorgan Symbols NAND (A.B) = A +B NOR (A+B) = A.B AND A.B = A.B = (A +B ) OR

More information

Complement Arithmetic

Complement Arithmetic Complement Arithmetic Objectives In this lesson, you will learn: How additions and subtractions are performed using the complement representation, What is the Overflow condition, and How to perform arithmetic

More information

CMPEN 411 VLSI Digital Circuits Spring Lecture 21: Shifters, Decoders, Muxes

CMPEN 411 VLSI Digital Circuits Spring Lecture 21: Shifters, Decoders, Muxes CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 21: Shifters, Decoders, Muxes [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp11 CMPEN

More information

On various ways to split a floating-point number

On various ways to split a floating-point number On various ways to split a floating-point number Claude-Pierre Jeannerod Jean-Michel Muller Paul Zimmermann Inria, CNRS, ENS Lyon, Université de Lyon, Université de Lorraine France ARITH-25 June 2018 -2-

More information

Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs

Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs Article Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs E. George Walters III Department of Electrical and Computer Engineering, Penn State Erie,

More information

A HIGH-SPEED PROCESSOR FOR RECTANGULAR-TO-POLAR CONVERSION WITH APPLICATIONS IN DIGITAL COMMUNICATIONS *

A HIGH-SPEED PROCESSOR FOR RECTANGULAR-TO-POLAR CONVERSION WITH APPLICATIONS IN DIGITAL COMMUNICATIONS * Copyright IEEE 999: Published in the Proceedings of Globecom 999, Rio de Janeiro, Dec 5-9, 999 A HIGH-SPEED PROCESSOR FOR RECTAGULAR-TO-POLAR COVERSIO WITH APPLICATIOS I DIGITAL COMMUICATIOS * Dengwei

More information

Implementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System

Implementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System Implementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System G.Suresh, G.Indira Devi, P.Pavankumar Abstract The use of the improved table look up Residue Number System

More information

UNIT V FINITE WORD LENGTH EFFECTS IN DIGITAL FILTERS PART A 1. Define 1 s complement form? In 1,s complement form the positive number is represented as in the sign magnitude form. To obtain the negative

More information

Research Article Implementation of Special Function Unit for Vertex Shader Processor Using Hybrid Number System

Research Article Implementation of Special Function Unit for Vertex Shader Processor Using Hybrid Number System Computer Networks and Communications, Article ID 890354, 7 pages http://dx.doi.org/10.1155/2014/890354 Research Article Implementation of Special Function Unit for Vertex Shader Processor Using Hybrid

More information

Neural Network Approximation. Low rank, Sparsity, and Quantization Oct. 2017

Neural Network Approximation. Low rank, Sparsity, and Quantization Oct. 2017 Neural Network Approximation Low rank, Sparsity, and Quantization zsc@megvii.com Oct. 2017 Motivation Faster Inference Faster Training Latency critical scenarios VR/AR, UGV/UAV Saves time and energy Higher

More information

Adders, subtractors comparators, multipliers and other ALU elements

Adders, subtractors comparators, multipliers and other ALU elements CSE4: Components and Design Techniques for Digital Systems Adders, subtractors comparators, multipliers and other ALU elements Adders 2 Circuit Delay Transistors have instrinsic resistance and capacitance

More information

Cost/Performance Tradeoff of n-select Square Root Implementations

Cost/Performance Tradeoff of n-select Square Root Implementations Australian Computer Science Communications, Vol.22, No.4, 2, pp.9 6, IEEE Comp. Society Press Cost/Performance Tradeoff of n-select Square Root Implementations Wanming Chu and Yamin Li Computer Architecture

More information

DIGITAL TECHNICS. Dr. Bálint Pődör. Óbuda University, Microelectronics and Technology Institute

DIGITAL TECHNICS. Dr. Bálint Pődör. Óbuda University, Microelectronics and Technology Institute DIGITAL TECHNICS Dr. Bálint Pődör Óbuda University, Microelectronics and Technology Institute 4. LECTURE: COMBINATIONAL LOGIC DESIGN: ARITHMETICS (THROUGH EXAMPLES) 2016/2017 COMBINATIONAL LOGIC DESIGN:

More information

Number Representation and Waveform Quantization

Number Representation and Waveform Quantization 1 Number Representation and Waveform Quantization 1 Introduction This lab presents two important concepts for working with digital signals. The first section discusses how numbers are stored in memory.

More information

Part II Addition / Subtraction

Part II Addition / Subtraction Part II Addition / Subtraction Parts Chapters I. Number Representation 1. 2. 3. 4. Numbers and Arithmetic Representing Signed Numbers Redundant Number Systems Residue Number Systems Elementary Operations

More information

Computer Architecture. ESE 345 Computer Architecture. Design Process. CA: Design process

Computer Architecture. ESE 345 Computer Architecture. Design Process. CA: Design process Computer Architecture ESE 345 Computer Architecture Design Process 1 The Design Process "To Design Is To Represent" Design activity yields description/representation of an object -- Traditional craftsman

More information

Dual-Field Arithmetic Unit for GF(p) and GF(2 m ) *

Dual-Field Arithmetic Unit for GF(p) and GF(2 m ) * Institute for Applied Information Processing and Communications Graz University of Technology Dual-Field Arithmetic Unit for GF(p) and GF(2 m ) * CHES 2002 Workshop on Cryptographic Hardware and Embedded

More information

Integer Multipliers 1

Integer Multipliers 1 Integer Multipliers Multipliers must have circuit in most DS applications variety of multipliers exists that can be chosen based on their performance Serial, Serial/arallel,Shift and dd, rray, ooth, Wallace

More information

A Suggestion for a Fast Residue Multiplier for a Family of Moduli of the Form (2 n (2 p ± 1))

A Suggestion for a Fast Residue Multiplier for a Family of Moduli of the Form (2 n (2 p ± 1)) The Computer Journal, 47(1), The British Computer Society; all rights reserved A Suggestion for a Fast Residue Multiplier for a Family of Moduli of the Form ( n ( p ± 1)) Ahmad A. Hiasat Electronics Engineering

More information

A Low-Error Statistical Fixed-Width Multiplier and Its Applications

A Low-Error Statistical Fixed-Width Multiplier and Its Applications A Low-Error Statistical Fixed-Width Multiplier and Its Applications Yuan-Ho Chen 1, Chih-Wen Lu 1, Hsin-Chen Chiang, Tsin-Yuan Chang, and Chin Hsia 3 1 Department of Engineering and System Science, National

More information

Digital Logic: Boolean Algebra and Gates. Textbook Chapter 3

Digital Logic: Boolean Algebra and Gates. Textbook Chapter 3 Digital Logic: Boolean Algebra and Gates Textbook Chapter 3 Basic Logic Gates XOR CMPE12 Summer 2009 02-2 Truth Table The most basic representation of a logic function Lists the output for all possible

More information

The Classical Relative Error Bounds for Computing a2 + b 2 and c/ a 2 + b 2 in Binary Floating-Point Arithmetic are Asymptotically Optimal

The Classical Relative Error Bounds for Computing a2 + b 2 and c/ a 2 + b 2 in Binary Floating-Point Arithmetic are Asymptotically Optimal -1- The Classical Relative Error Bounds for Computing a2 + b 2 and c/ a 2 + b 2 in Binary Floating-Point Arithmetic are Asymptotically Optimal Claude-Pierre Jeannerod Jean-Michel Muller Antoine Plet Inria,

More information

Optimized Linear, Quadratic and Cubic Interpolators for Elementary Function Hardware Implementations

Optimized Linear, Quadratic and Cubic Interpolators for Elementary Function Hardware Implementations electronics Article Optimized Linear, Quadratic and Cubic Interpolators for Elementary Function Hardware Implementations Masoud Sadeghian 1,, James E. Stine 1, *, and E. George Walters III 2, 1 Oklahoma

More information

ECS 231 Computer Arithmetic 1 / 27

ECS 231 Computer Arithmetic 1 / 27 ECS 231 Computer Arithmetic 1 / 27 Outline 1 Floating-point numbers and representations 2 Floating-point arithmetic 3 Floating-point error analysis 4 Further reading 2 / 27 Outline 1 Floating-point numbers

More information

Lecture 2 Review on Digital Logic (Part 1)

Lecture 2 Review on Digital Logic (Part 1) Lecture 2 Review on Digital Logic (Part 1) Xuan Silvia Zhang Washington University in St. Louis http://classes.engineering.wustl.edu/ese461/ Grading Engagement 5% Review Quiz 10% Homework 10% Labs 40%

More information

Sample Test Paper - I

Sample Test Paper - I Scheme G Sample Test Paper - I Course Name : Computer Engineering Group Marks : 25 Hours: 1 Hrs. Q.1) Attempt any THREE: 09 Marks a) Define i) Propagation delay ii) Fan-in iii) Fan-out b) Convert the following:

More information

Innocuous Double Rounding of Basic Arithmetic Operations

Innocuous Double Rounding of Basic Arithmetic Operations Innocuous Double Rounding of Basic Arithmetic Operations Pierre Roux ISAE, ONERA Double rounding occurs when a floating-point value is first rounded to an intermediate precision before being rounded to

More information

ELEN Electronique numérique

ELEN Electronique numérique ELEN0040 - Electronique numérique Patricia ROUSSEAUX Année académique 2014-2015 CHAPITRE 3 Combinational Logic Circuits ELEN0040 3-4 1 Combinational Functional Blocks 1.1 Rudimentary Functions 1.2 Functions

More information

Elements of Floating-point Arithmetic

Elements of Floating-point Arithmetic Elements of Floating-point Arithmetic Sanzheng Qiao Department of Computing and Software McMaster University July, 2012 Outline 1 Floating-point Numbers Representations IEEE Floating-point Standards Underflow

More information

Efficient Function Approximation Using Truncated Multipliers and Squarers

Efficient Function Approximation Using Truncated Multipliers and Squarers Efficient Function Approximation Using Truncated Multipliers and Squarers E. George Walters III Lehigh University Bethlehem, PA, USA waltersg@ieee.org Michael J. Schulte University of Wisconsin Madison

More information

Chapter 6: Solutions to Exercises

Chapter 6: Solutions to Exercises 1 DIGITAL ARITHMETIC Miloš D. Ercegovac and Tomás Lang Morgan Kaufmann Publishers, an imprint of Elsevier Science, c 00 Updated: September 3, 003 With contributions by Elisardo Antelo and Fabrizio Lamberti

More information

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

CSE140: Components and Design Techniques for Digital Systems. Decoders, adders, comparators, multipliers and other ALU elements. Tajana Simunic Rosing

CSE140: Components and Design Techniques for Digital Systems. Decoders, adders, comparators, multipliers and other ALU elements. Tajana Simunic Rosing CSE4: Components and Design Techniques for Digital Systems Decoders, adders, comparators, multipliers and other ALU elements Tajana Simunic Rosing Mux, Demux Encoder, Decoder 2 Transmission Gate: Mux/Tristate

More information

Project Two RISC Processor Implementation ECE 485

Project Two RISC Processor Implementation ECE 485 Project Two RISC Processor Implementation ECE 485 Chenqi Bao Peter Chinetti November 6, 2013 Instructor: Professor Borkar 1 Statement of Problem This project requires the design and test of a RISC processor

More information

Class Website:

Class Website: ECE 20B, Winter 2003 Introduction to Electrical Engineering, II LECTURE NOTES #5 Instructor: Andrew B. Kahng (lecture) Email: abk@ece.ucsd.edu Telephone: 858-822-4884 office, 858-353-0550 cell Office:

More information

A High-Speed Realization of Chinese Remainder Theorem

A High-Speed Realization of Chinese Remainder Theorem Proceedings of the 2007 WSEAS Int. Conference on Circuits, Systems, Signal and Telecommunications, Gold Coast, Australia, January 17-19, 2007 97 A High-Speed Realization of Chinese Remainder Theorem Shuangching

More information

ECE/CS 552: Introduction To Computer Architecture 1. Instructor:Mikko H Lipasti. Fall 2010 University i of Wisconsin-Madison

ECE/CS 552: Introduction To Computer Architecture 1. Instructor:Mikko H Lipasti. Fall 2010 University i of Wisconsin-Madison ECE/CS 552: Arithmetic I Instructor:Mikko H Lipasti Fall 2010 Univsity i of Wisconsin-Madison i Lecture notes partially based on set created by Mark Hill. Basic Arithmetic and the ALU Numb representations:

More information

Design and Implementation of REA for Single Precision Floating Point Multiplier Using Reversible Logic

Design and Implementation of REA for Single Precision Floating Point Multiplier Using Reversible Logic Design and Implementation of REA for Single Precision Floating Point Multiplier Using Reversible Logic MadivalappaTalakal 1, G.Jyothi 2, K.N.Muralidhara 3, M.Z.Kurian 4 PG Student [VLSI & ES], Dept. of

More information

EC 413 Computer Organization

EC 413 Computer Organization EC 413 Computer Organization rithmetic Logic Unit (LU) and Register File Prof. Michel. Kinsy Computing: Computer Organization The DN of Modern Computing Computer CPU Memory System LU Register File Disks

More information

ECE 545 Digital System Design with VHDL Lecture 1. Digital Logic Refresher Part A Combinational Logic Building Blocks

ECE 545 Digital System Design with VHDL Lecture 1. Digital Logic Refresher Part A Combinational Logic Building Blocks ECE 545 Digital System Design with VHDL Lecture Digital Logic Refresher Part A Combinational Logic Building Blocks Lecture Roadmap Combinational Logic Basic Logic Review Basic Gates De Morgan s Law Combinational

More information

EECS150 - Digital Design Lecture 25 Shifters and Counters. Recap

EECS150 - Digital Design Lecture 25 Shifters and Counters. Recap EECS150 - Digital Design Lecture 25 Shifters and Counters Nov. 21, 2013 Prof. Ronald Fearing Electrical Engineering and Computer Sciences University of California, Berkeley (slides courtesy of Prof. John

More information

NEW SELF-CHECKING BOOTH MULTIPLIERS

NEW SELF-CHECKING BOOTH MULTIPLIERS Int. J. Appl. Math. Comput. Sci., 2008, Vol. 18, No. 3, 319 328 DOI: 10.2478/v10006-008-0029-4 NEW SELF-CHECKING BOOTH MULTIPLIERS MARC HUNGER, DANIEL MARIENFELD Department of Electrical Engineering and

More information

An Approximate Parallel Multiplier with Deterministic Errors for Ultra-High Speed Integrated Optical Circuits

An Approximate Parallel Multiplier with Deterministic Errors for Ultra-High Speed Integrated Optical Circuits An Approximate Parallel Multiplier with Deterministic Errors for Ultra-High Speed Integrated Optical Circuits Jun Shiomi 1, Tohru Ishihara 1, Hidetoshi Onodera 1, Akihiko Shinya 2, Masaya Notomi 2 1 Graduate

More information

ECE 545 Digital System Design with VHDL Lecture 1A. Digital Logic Refresher Part A Combinational Logic Building Blocks

ECE 545 Digital System Design with VHDL Lecture 1A. Digital Logic Refresher Part A Combinational Logic Building Blocks ECE 545 Digital System Design with VHDL Lecture A Digital Logic Refresher Part A Combinational Logic Building Blocks Lecture Roadmap Combinational Logic Basic Logic Review Basic Gates De Morgan s Laws

More information

Hakim Weatherspoon CS 3410 Computer Science Cornell University

Hakim Weatherspoon CS 3410 Computer Science Cornell University Hakim Weatherspoon CS 3410 Computer Science Cornell University The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer. memory inst 32 register

More information

Tree and Array Multipliers Ivor Page 1

Tree and Array Multipliers Ivor Page 1 Tree and Array Multipliers 1 Tree and Array Multipliers Ivor Page 1 11.1 Tree Multipliers In Figure 1 seven input operands are combined by a tree of CSAs. The final level of the tree is a carry-completion

More information

Lecture 8. Sequential Multipliers

Lecture 8. Sequential Multipliers Lecture 8 Sequential Multipliers Required Reading Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design Chapter 9, Basic Multiplication Scheme Chapter 10, High-Radix Multipliers Chapter

More information

Fundamentals of Digital Design

Fundamentals of Digital Design Fundamentals of Digital Design Digital Radiation Measurement and Spectroscopy NE/RHP 537 1 Binary Number System The binary numeral system, or base-2 number system, is a numeral system that represents numeric

More information

The Design Procedure. Output Equation Determination - Derive output equations from the state table

The Design Procedure. Output Equation Determination - Derive output equations from the state table The Design Procedure Specification Formulation - Obtain a state diagram or state table State Assignment - Assign binary codes to the states Flip-Flop Input Equation Determination - Select flipflop types

More information

Written exam with solutions IE Digital Design Friday 21/

Written exam with solutions IE Digital Design Friday 21/ Written exam with solutions IE204-5 Digital Design Friday 2/0 206 09.00-3.00 General Information Examiner: Ingo Sander. Teacher: Kista, William Sandvist tel 08-7904487, Elena Dubrova phone 08-790 4 4 Exam

More information

DSP Design Lecture 7. Unfolding cont. & Folding. Dr. Fredrik Edman.

DSP Design Lecture 7. Unfolding cont. & Folding. Dr. Fredrik Edman. SP esign Lecture 7 Unfolding cont. & Folding r. Fredrik Edman fredrik.edman@eit.lth.se Unfolding Unfolding creates a program with more than one iteration, J=unfolding factor Unfolding is a structured way

More information

How to Prepare Weather and Climate Models for Future HPC Hardware

How to Prepare Weather and Climate Models for Future HPC Hardware How to Prepare Weather and Climate Models for Future HPC Hardware Peter Düben European Weather Centre (ECMWF) Peter Düben Page 2 The European Weather Centre (ECMWF) www.ecmwf.int Independent, intergovernmental

More information

Part II Addition / Subtraction

Part II Addition / Subtraction Part II Addition / Subtraction Parts Chapters I. Number Representation 1. 2. 3. 4. Numbers and Arithmetic Representing Signed Numbers Redundant Number Systems Residue Number Systems Elementary Operations

More information

Computer Architecture. ECE 361 Lecture 5: The Design Process & ALU Design. 361 design.1

Computer Architecture. ECE 361 Lecture 5: The Design Process & ALU Design. 361 design.1 Computer Architecture ECE 361 Lecture 5: The Design Process & Design 361 design.1 Quick Review of Last Lecture 361 design.2 MIPS ISA Design Objectives and Implications Support general OS and C- style language

More information

DIGIT-SERIAL ARITHMETIC

DIGIT-SERIAL ARITHMETIC DIGIT-SERIAL ARITHMETIC 1 Modes of operation:lsdf and MSDF Algorithm and implementation models LSDF arithmetic MSDF: Online arithmetic TIMING PARAMETERS 2 radix-r number system: conventional and redundant

More information

Combinational Logic Design Arithmetic Functions and Circuits

Combinational Logic Design Arithmetic Functions and Circuits Combinational Logic Design Arithmetic Functions and Circuits Overview Binary Addition Half Adder Full Adder Ripple Carry Adder Carry Look-ahead Adder Binary Subtraction Binary Subtractor Binary Adder-Subtractor

More information

Notes for Chapter 1 of. Scientific Computing with Case Studies

Notes for Chapter 1 of. Scientific Computing with Case Studies Notes for Chapter 1 of Scientific Computing with Case Studies Dianne P. O Leary SIAM Press, 2008 Mathematical modeling Computer arithmetic Errors 1999-2008 Dianne P. O'Leary 1 Arithmetic and Error What

More information

LOGIC CIRCUITS. Basic Experiment and Design of Electronics

LOGIC CIRCUITS. Basic Experiment and Design of Electronics Basic Experiment and Design of Electronics LOGIC CIRCUITS Ho Kyung Kim, Ph.D. hokyung@pusan.ac.kr School of Mechanical Engineering Pusan National University Outline Combinational logic circuits Output

More information

Lecture 2: Number Representations (2)

Lecture 2: Number Representations (2) Lecture 2: Number Representations (2) ECE 645 Computer Arithmetic 1/29/08 ECE 645 Computer Arithmetic Lecture Roadmap Number systems (cont'd) Floating point number system representations Residue number

More information

CS61C : Machine Structures

CS61C : Machine Structures CS 61C L15 Blocks (1) inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #15: Combinational Logic Blocks Outline CL Blocks Latches & Flip Flops A Closer Look 2005-07-14 Andy Carle CS

More information

Multiplication of signed-operands

Multiplication of signed-operands Multiplication of signed-operands Recall we discussed multiplication of unsigned numbers: Combinatorial array multiplier. Sequential multiplier. Need an approach that works uniformly with unsigned and

More information

Nyquist-Rate A/D Converters

Nyquist-Rate A/D Converters IsLab Analog Integrated ircuit Design AD-51 Nyquist-ate A/D onverters כ Kyungpook National University IsLab Analog Integrated ircuit Design AD-1 Nyquist-ate MOS A/D onverters Nyquist-rate : oversampling

More information