Pipelining and Parallel Processing

Lan-Da Van (范倫達), Ph.D.
Department of Computer Science, National Chiao Tung University, Taiwan, R.O.C.
Spring 2007
ldvan@cs.nctu.edu.tw
http://www.cs.nctu.edu.tw/~ldvan/

Outline
- Introduction
- Pipelining of FIR Digital Filter
- Parallel Processing
- Pipelining and Parallel Processing for Low Power
- Conclusions

Lan-Da Van VLSI-DSP-3-2

Introduction
- Pipelining
  - Reduces the critical path
  - Increases the clock speed or sample speed
  - Reduces power consumption
- Parallel processing
  - Does not reduce the critical path
  - Does not increase the clock speed, but increases the sample speed
  - Reduces power consumption

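A 3-Tap FIR Filter
Direct-form structure:
$$y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$$
With $T_M$ the multiplication time and $T_A$ the addition time, the critical path contains one multiplier and two adders, so
$$T_{sample} \ge T_M + 2T_A, \qquad f_{sample} \le \frac{1}{T_M + 2T_A}$$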
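
The direct-form equation y(n) = a x(n) + b x(n-1) + c x(n-2) and its sample-period bound can be sketched in a few lines of Python. This is an illustrative model only: the function names, the zero initial state, and the coefficient values in the example are assumptions, and the delays T_M = 10 u.t. and T_A = 2 u.t. are borrowed from the fine-grain pipelining slide.

```python
def fir3_direct(x, a, b, c):
    """Direct-form 3-tap FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-2), zero initial state."""
    y = []
    x1 = x2 = 0  # contents of the two delay elements: x(n-1) and x(n-2)
    for xn in x:
        y.append(a * xn + b * x1 + c * x2)
        x2, x1 = x1, xn  # shift the delay line by one sample
    return y


def min_sample_period(t_mult, t_add):
    """Critical path of the direct form: one multiplier followed by two adders."""
    return t_mult + 2 * t_add


if __name__ == "__main__":
    print(fir3_direct([1, 2, 3, 4], a=1, b=10, c=100))  # [1, 12, 123, 234]
    print(min_sample_period(t_mult=10, t_add=2))         # 14
```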

Pipelining and Parallel Concept
- Pipelining: introduce pipelining latches along the datapath
- Parallel processing: duplicate the hardware

Pipelining FIR Filter
Critical path: $2T_A + T_M \rightarrow T_A + T_M$
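
A cycle-level sketch of the pipelined 3-tap filter follows. The slide's figure is not part of this transcript, so the latch placement is an assumption: one latch after the first adder and one on the x(n-2) path, which leaves at most one multiplier and one adder between registers. All names are hypothetical. The model shows that the outputs are unchanged apart from one extra cycle of latency.

```python
def fir3_reference(x, a, b, c):
    """Unpipelined y(n) = a*x(n) + b*x(n-1) + c*x(n-2) with zero initial state."""
    xp = [0, 0] + list(x)
    return [a * xp[n + 2] + b * xp[n + 1] + c * xp[n] for n in range(len(x))]


def fir3_pipelined(x, a, b, c):
    """3-tap FIR with two pipelining latches on an (assumed) feed-forward cutset."""
    y = []
    x1 = x2 = 0    # delay line: x(n-1), x(n-2)
    s_latch = 0    # latch after the first adder: a*x(n-1) + b*x(n-2)
    x2_latch = 0   # latch on the third-tap path: x(n-3)
    for xn in x:
        # Stage 2 reads only latched values: one multiply and one add.
        y.append(s_latch + c * x2_latch)
        # Stage 1 computes the partial sum: one multiply and one add on its longest path.
        s_next = a * xn + b * x1
        # Clock edge: update the latches, then the delay line.
        s_latch, x2_latch = s_next, x2
        x2, x1 = x1, xn
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    print(fir3_reference(x, 1, 10, 100))  # [1, 12, 123, 234]
    print(fir3_pipelined(x, 1, 10, 100))  # [0, 1, 12, 123]
```

The pipelined output equals the reference output delayed by one sample, which is the latency cost of pipelining.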

Pipelining (1/2)
Drawbacks:
- Increases the number of delay elements (registers/latches) in the critical path
- Increases latency
Clock period limitation: the critical path may be between
- an input and a latch
- a latch and an output
- two latches
- an input and an output
Pipelining latches can only be placed across a feed-forward cutset of the graph.

Pipelining (2/2)
- Cutset: a set of edges of a graph such that if these edges are removed from the graph, the graph becomes disjoint.
- Feed-forward cutset: a cutset in which the data move in the forward direction on all of its edges.
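
The two definitions can be checked mechanically on a small directed graph. In this sketch a cutset is described by a partition of the nodes (the set of nodes on the input side), which is an assumption made for convenience; the node names and the example graph with one feedback edge are hypothetical.

```python
def crossing_edges(edges, input_side):
    """Edges with exactly one endpoint in input_side: the cutset induced by the partition."""
    return [(u, v) for (u, v) in edges if (u in input_side) != (v in input_side)]


def is_feed_forward_cutset(edges, input_side):
    """True if every edge of the cutset runs from the input side to the output side."""
    cut = crossing_edges(edges, input_side)
    return bool(cut) and all(u in input_side for (u, v) in cut)


if __name__ == "__main__":
    # in -> n1 -> n2 -> out, plus a feedback edge n2 -> n1
    edges = [("in", "n1"), ("n1", "n2"), ("n2", "out"), ("n2", "n1")]
    print(is_feed_forward_cutset(edges, {"in"}))              # True
    print(is_feed_forward_cutset(edges, {"in", "n1"}))        # False: cuts the feedback loop
    print(is_feed_forward_cutset(edges, {"in", "n1", "n2"}))  # True
```

Latches may be placed on the first and third cutsets, but not on the second, because one of its edges carries data backward.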

Example 3.2.1
[Figure: signal-flow graph examples annotated with critical paths of 4 u.t. and 2 u.t.; one cutset placement is marked "Error!"]

Transposition Theorem
Reversing the direction of all edges in a given signal-flow graph (SFG) and interchanging the input and output ports preserves the functionality of the system.

Data-Broadcast Structure
The critical path is reduced to $T_M + T_A$.
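
The data-broadcast (transposed) structure can be modeled as below. This is a sketch with hypothetical names and a zero initial state: the input sample is broadcast to all three multipliers, and the registers sit between the adders, so no register-to-register path contains more than one multiplier and one adder.

```python
def fir3_broadcast(x, a, b, c):
    """Transposed-form 3-tap FIR: x(n) is broadcast to all multipliers at once."""
    y = []
    r1 = r2 = 0  # registers between the adders
    for xn in x:
        yn = a * xn + r1       # output adder: a*x(n) + [b*x(n-1) + c*x(n-2)]
        r1_next = b * xn + r2  # middle adder: b*x(n) + c*x(n-1)
        r2_next = c * xn       # last tap has nothing to add
        y.append(yn)
        r1, r2 = r1_next, r2_next
    return y


if __name__ == "__main__":
    # Same output as the direct form y(n) = a*x(n) + b*x(n-1) + c*x(n-2)
    print(fir3_broadcast([1, 2, 3, 4], a=1, b=10, c=100))  # [1, 12, 123, 234]
```

No extra latency is introduced, in contrast to pipelining the direct form.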

Fine-Grain Pipelining
- Let $T_M = 10$ u.t., $T_A = 2$ u.t., and the desired clock period be 6 u.t.
- Break the multiplier into two smaller units with processing times of 6 u.t. and 4 u.t.
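
The arithmetic behind the 6 u.t. clock period can be made explicit: the clock period of a pipeline is set by its slowest stage. The sketch below assumes the data-broadcast structure, whose longest path is one multiplier plus one adder, and assumes the latch is placed between the two multiplier parts; the function and variable names are hypothetical.

```python
def clock_period(stages):
    """Minimum clock period: the largest total delay of any single pipeline stage."""
    return max(sum(stage) for stage in stages)


T_M1, T_M2, T_A = 6, 4, 2  # multiplier split into 6 + 4 u.t., adder takes 2 u.t.

# Before: the whole multiplier and the adder sit in one stage.
before = clock_period([[T_M1 + T_M2, T_A]])
# After: a latch between the two multiplier parts balances the stages.
after = clock_period([[T_M1], [T_M2, T_A]])

if __name__ == "__main__":
    print(before, after)  # 12 6
```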

Parallel Processing
- Parallel processing and pipelining are duals: if a computation can be pipelined, it can also be processed in parallel.
- Convert a single-input single-output (SISO) system to a multiple-input multiple-output (MIMO) system via parallelism.

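Parallel Processing of 3-Tap FIR Filter (1/2)
Serial filter:
$$y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$$
3-parallel form:
$$\begin{aligned}
y(3k) &= a\,x(3k) + b\,x(3k-1) + c\,x(3k-2) \\
y(3k+1) &= a\,x(3k+1) + b\,x(3k) + c\,x(3k-1) \\
y(3k+2) &= a\,x(3k+2) + b\,x(3k+1) + c\,x(3k)
\end{aligned}$$
Iteration period for block size $L = 3$:
$$T_{iter} = T_{sample} = \frac{1}{L}\,T_{clk} \ge \frac{1}{3}\,(T_M + 2T_A)$$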
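
The three block equations for y(3k), y(3k+1) and y(3k+2) can be exercised directly. The sketch below is an illustrative software model, not a hardware description: each loop iteration stands for one clock cycle that consumes three inputs and produces three outputs. The function names and the zero initial state are assumptions.

```python
def fir3_serial(x, a, b, c):
    """Serial reference: y(n) = a*x(n) + b*x(n-1) + c*x(n-2), zero initial state."""
    xp = [0, 0] + list(x)
    return [a * xp[n + 2] + b * xp[n + 1] + c * xp[n] for n in range(len(x))]


def fir3_parallel3(x, a, b, c):
    """3-parallel (block size L = 3) form: three samples in, three samples out per cycle."""
    if len(x) % 3 != 0:
        raise ValueError("input length must be a multiple of the block size 3")
    y = []
    xm1 = xm2 = 0  # x(3k-1) and x(3k-2), carried over from the previous block
    for k in range(0, len(x), 3):
        x0, x1, x2 = x[k], x[k + 1], x[k + 2]
        y.append(a * x0 + b * xm1 + c * xm2)  # y(3k)
        y.append(a * x1 + b * x0 + c * xm1)   # y(3k+1)
        y.append(a * x2 + b * x1 + c * x0)    # y(3k+2)
        xm1, xm2 = x2, x1
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    print(fir3_parallel3(x, 1, 10, 100))  # [1, 12, 123, 234, 345, 456]
```

Each of the three outputs still passes through one multiplier and two adders, which is why the clock period is unchanged while the sample period drops by a factor of three.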

Parallel Processing of 3-Tap FIR Filter (2/2)
[Figure]

Complete Parallel Processing System
The critical path remains unchanged, but the iteration period is reduced.

S/P and P/S Converter
[Figure: serial-to-parallel and parallel-to-serial converters, each annotated "Edge trigger!"]

Why Parallel Processing?
- Parallel processing duplicates the hardware many times, so the cost increases. Why use it?
- The answer lies in the fact that the fundamental limit to pipelining is the I/O bottleneck, referred to as the communication bound, which is composed of the I/O pad delay and the wire delay.
[Figure: parallel transmission]

Combined Fine-Grain Pipelining and Parallel Processing
$$T_{iter} = T_{sample} = \frac{1}{LM}\,T_{clk} = \frac{1}{6}\,(T_M + 2T_A)$$

Underlying Low Power Concept
Propagation delay:
$$T_{pd} = \frac{C_{charge} V_0}{k\,(V_0 - V_t)^2}$$
Power consumption:
$$P = C_{total} V_0^2 f$$
Sequential filter:
$$P_{seq} = C_{total} V_0^2 f, \qquad T_{seq} = \frac{C_{charge} V_0}{k\,(V_0 - V_t)^2}, \qquad f = \frac{1}{T_{seq}}$$

Pipelining for Low-Power (1/2)
- M-level pipelined system: the critical path is reduced to 1/M of its original length, and the capacitance to be charged in a single clock cycle is reduced to $C_{charge}/M$.
- If the clock frequency is maintained, the power supply can be reduced to $\beta V_0$ ($0 < \beta < 1$).

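Pipelining for Low-Power (2/2)
Power consumption:
$$P_{pip} = C_{total}\,\beta^2 V_0^2 f = \beta^2 P_{seq}$$
Propagation delay:
$$T_{seq} = \frac{C_{charge} V_0}{k\,(V_0 - V_t)^2}, \qquad T_{pip} = \frac{(C_{charge}/M)\,\beta V_0}{k\,(\beta V_0 - V_t)^2}$$
Setting $T_{seq} = T_{pip}$ gives
$$M(\beta V_0 - V_t)^2 = \beta\,(V_0 - V_t)^2$$
which is solved for $\beta$.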
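
For reference, the equal-delay condition is a quadratic in β. The expansion below is a worked step added for convenience; it follows directly from the delay expressions for T_seq and T_pip and introduces no new symbols.

```latex
\[
M(\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2
\;\Longrightarrow\;
M V_0^2\,\beta^2 - \left[\,2 M V_0 V_t + (V_0 - V_t)^2\right]\beta + M V_t^2 = 0
\]
\[
\beta = \frac{\left[\,2 M V_0 V_t + (V_0 - V_t)^2\right]
        \pm \sqrt{\left[\,2 M V_0 V_t + (V_0 - V_t)^2\right]^2 - 4 M^2 V_0^2 V_t^2}}
       {2 M V_0^2}
\]
```

Only a root with βV_0 > V_t is feasible, since the reduced supply must stay above the threshold voltage.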

Example 3.4.1 (1/2)
Consider an original 3-tap FIR filter and its fine-grain pipelined version shown in the following figures. Assume $T_M = 10$ u.t., $T_A = 2$ u.t., $V_t = 0.6$ V, $V_0 = 5$ V, and $C_M = 5C_A$. In the fine-grain pipelined filter, the multiplier is broken into two parts, m1 and m2, with computation times of 6 u.t. and 4 u.t., respectively, and with capacitances 3 times and 2 times that of an adder, respectively.
(a) What is the supply voltage of the pipelined filter if the clock period remains unchanged?
(b) What is the power consumption of the pipelined filter as a percentage of the original filter?

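Example 3.4.1 (2/2)
Solution:
- Original: $C_{charge} = C_M + C_A = 6C_A$
- Fine-grain: $C_{charge} = C_{m1} = C_{m2} + C_A = 3C_A$
(a) Equal clock periods give $2(5\beta - 0.6)^2 = \beta\,(5 - 0.6)^2$, so $\beta = 0.6033$ or $0.0239$ (infeasible, since $\beta V_0 < V_t$), and $V_{pip} = \beta V_0 = 3.0165$ V.
(b) Ratio $= \beta^2 = 36.4\%$.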
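
The numbers in this example can be reproduced with a short script. It is a sketch under the delay model T = C_charge V / (k (V - V_t)^2) used in these slides; the helper name `feasible_beta` and its `ratio` parameter (the charging capacitance of the original divided by that of the pipelined filter, here 6C_A / 3C_A = 2) are hypothetical.

```python
import math


def feasible_beta(ratio, v0, vt):
    """Solve ratio*(beta*v0 - vt)**2 == beta*(v0 - vt)**2; keep the root with beta*v0 > vt."""
    qa = ratio * v0 ** 2
    qb = -(2 * ratio * v0 * vt + (v0 - vt) ** 2)
    qc = ratio * vt ** 2
    disc = math.sqrt(qb * qb - 4 * qa * qc)
    roots = [(-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)]
    feasible = [r for r in roots if r * v0 > vt]
    if not feasible:
        raise ValueError("no supply scaling factor keeps the supply above the threshold")
    return max(feasible)


V0, VT = 5.0, 0.6
C_SEQ, C_PIP = 6.0, 3.0  # charging capacitance in units of C_A

beta = feasible_beta(C_SEQ / C_PIP, V0, VT)
v_pip = beta * V0
power_ratio = beta ** 2

if __name__ == "__main__":
    print(f"beta = {beta:.4f}, V_pip = {v_pip:.3f} V, power = {100 * power_ratio:.1f}% of original")
```

The script gives β ≈ 0.6033, a supply of about 3.017 V (the slide's 3.0165 V uses the rounded β), and a power ratio of about 36.4%.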

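Comparison

|                      | Sequential FIR (original) | Pipelined FIR (without reducing $V_0$) | Pipelined FIR (with reducing $V_0$) |
|----------------------|---------------------------|----------------------------------------|-------------------------------------|
| Power                | Ref                       | 2 Ref                                  | 0.364 Ref                           |
| Clock period (u.t.)  | 12                        | 6                                      | 12                                  |
| Sample period (u.t.) | 12                        | 6                                      | 12                                  |

Think again!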

Parallel Processing for Low-Power
- L-parallel system: to maintain the same sample rate, the clock period is increased to $L\,T_{seq}$.
- This means that $C_{charge}$ is charged in $L\,T_{seq}$, so the power supply can be reduced to $\beta V_0$.

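Parallel Processing for Low-Power
Power consumption:
$$P_{par} = (L\,C_{total})(\beta V_0)^2\,\frac{f}{L} = \beta^2 C_{total} V_0^2 f = \beta^2 P_{seq}$$
Propagation delay:
$$T_{seq} = \frac{C_{charge} V_0}{k\,(V_0 - V_t)^2}, \qquad T_{par} = L\,T_{seq} = \frac{C_{charge}\,\beta V_0}{k\,(\beta V_0 - V_t)^2}$$
which gives
$$L(\beta V_0 - V_t)^2 = \beta\,(V_0 - V_t)^2$$
and is solved for $\beta$.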

Example 3.4.2 (1/2)
Consider a 4-tap FIR filter shown in Fig. 3.18(a) and its 2-parallel version in Fig. 3.18(b). The two architectures are operated at a sample period of 9 u.t. Assume $T_M = 8$ u.t., $T_A = 1$ u.t., $V_t = 0.45$ V, $V_0 = 3.3$ V, and $C_M = 8C_A$.
(a) What is the supply voltage of the 2-parallel filter?
(b) What is the power consumption of the 2-parallel filter as a percentage of the original filter?
Solution:
- Original: $C_{charge} = C_M + C_A = 9C_A$
- Parallel: $C_{charge} = C_M + 2C_A = 10C_A$
(a) Setting $T_{par} = 2\,T_{seq}$ gives $9(3.3\beta - 0.45)^2 = 5\beta\,(3.3 - 0.45)^2$, so $\beta = 0.6589$ or $0.0282$ (infeasible), and $V_{par} = \beta V_0 = 2.1743$ V.
(b) Ratio $= \beta^2 = 43.41\%$.
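
The same result can be obtained numerically, straight from the delay formula rather than from the quadratic. The sketch below searches for the supply scaling β at which the 2-parallel critical path (charging capacitance 10C_A) takes twice the sequential clock period (9C_A at the full supply). The function names and the choice of bisection are assumptions made for illustration; the constant k cancels and is set to 1.

```python
def delay(c_charge, v_supply, vt, k=1.0):
    """Propagation delay T = C_charge * V / (k * (V - Vt)^2)."""
    return c_charge * v_supply / (k * (v_supply - vt) ** 2)


def solve_beta(target_delay, c_charge, v0, vt, tol=1e-12):
    """Bisection on beta in (vt/v0, 1]; delay falls monotonically as the supply rises."""
    lo, hi = vt / v0 + 1e-9, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delay(c_charge, mid * v0, vt) > target_delay:
            lo = mid  # still too slow: the supply must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


V0, VT, L = 3.3, 0.45, 2
C_SEQ, C_PAR = 9.0, 10.0  # charging capacitance in units of C_A

beta = solve_beta(L * delay(C_SEQ, V0, VT), C_PAR, V0, VT)
v_par = beta * V0
power_ratio = beta ** 2

if __name__ == "__main__":
    print(f"beta = {beta:.4f}, V_par = {v_par:.4f} V, power = {100 * power_ratio:.2f}% of original")
```

Because the search is restricted to βV_0 > V_t, the infeasible root never appears.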

Example 3.4.2 (2/2)
[Figure]

Example 3.4.3 (1/2)
A more efficient structure than the previous one is depicted in Fig. 3.18(c).
(a) What is the supply voltage of the efficient 2-parallel filter?
(b) What is the power consumption of the efficient 2-parallel filter as a percentage of the original filter?
Solution:
- Original: $C_{charge} = C_M + C_A = 9C_A$
- New 2-parallel: $C_{charge} = C_M + 4C_A = 12C_A$
(a) Setting $T_{par} = 2\,T_{seq}$ gives $9(3.3\beta - 0.45)^2 = 6\beta\,(3.3 - 0.45)^2$, so $\beta = 0.745$ or $0.025$ (infeasible), and $V_{par} = \beta V_0 = 2.45857$ V.
(b) Ratio:
$$\frac{P_{par}}{P_{seq}} = \frac{55\,C_A\,\beta^2 V_0^2\,\frac{f_s}{2}}{35\,C_A\,V_0^2\,f_s} = 43.6\%$$
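
Unlike the previous example, the power ratio here is not simply β², because the efficient structure does not double the total capacitance. The sketch below evaluates P = C_total V² f for both filters, taking β ≈ 0.745 and the total capacitances 55C_A and 35C_A from the solution above. The function names are hypothetical, and the 70C_A figure used for the plain 2-parallel filter is an assumption (straightforward duplication of the 35C_A original).

```python
def power(c_total, v_supply, f_clk):
    """Dynamic power P = C_total * V^2 * f."""
    return c_total * v_supply ** 2 * f_clk


def parallel_power_ratio(c_total_par, c_total_seq, beta, block_size, v0=1.0, f_s=1.0):
    """P_par / P_seq when the parallel filter runs at beta*v0 and at f_s / block_size."""
    p_seq = power(c_total_seq, v0, f_s)
    p_par = power(c_total_par, beta * v0, f_s / block_size)
    return p_par / p_seq


# Capacitances in units of C_A; v0 and f_s cancel in the ratio.
efficient = parallel_power_ratio(55.0, 35.0, beta=0.745, block_size=2)
duplicated = parallel_power_ratio(70.0, 35.0, beta=0.6589, block_size=2)  # assumed 2 x 35

if __name__ == "__main__":
    print(f"efficient 2-parallel: {100 * efficient:.1f}% of original")
    print(f"duplicated 2-parallel: {100 * duplicated:.1f}% of original")
```

Both structures land near 43 to 44 percent: the efficient structure needs a higher supply because its critical path is longer, but it switches less capacitance.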

Example 3.4.3 (2/2)
[Figure]

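Combining Pipelining and Parallel Processing
Parallel-pipelined structure:
$$T_{seq} = \frac{C_{charge} V_0}{k\,(V_0 - V_t)^2}, \qquad T_{pp} = \frac{(C_{charge}/M)\,\beta V_0}{k\,(\beta V_0 - V_t)^2} = L\,T_{seq}$$
$$ML\,(\beta V_0 - V_t)^2 = \beta\,(V_0 - V_t)^2$$
For $M = L = 2$, $V_0 = 5$ V, and $V_t = 0.6$ V: $\beta \approx 0.4$ and $\beta^2 \approx 0.16$.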
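
The quoted values for M = L = 2 can be checked by hand. The following worked arithmetic is added as a verification step and uses only the condition ML(βV_0 - V_t)² = β(V_0 - V_t)² with V_0 = 5 V and V_t = 0.6 V.

```latex
\[
4\,(5\beta - 0.6)^2 = \beta\,(5 - 0.6)^2
\;\Longrightarrow\;
100\beta^2 - 24\beta + 1.44 = 19.36\,\beta
\;\Longrightarrow\;
100\beta^2 - 43.36\,\beta + 1.44 = 0
\]
\[
\beta = \frac{43.36 \pm \sqrt{43.36^2 - 576}}{200} \approx 0.397 \ \text{or}\ 0.036,
\qquad
\beta^2 \approx 0.158 \approx 0.16
\]
```

The smaller root is infeasible because it puts the supply (about 0.18 V) below the threshold voltage.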

Conclusions
- Methodologies of pipelining for the 3-tap FIR filter
- Methodologies of parallel processing for the 3-tap FIR filter
- Methodologies of using pipelining and parallel processing for low power
- Pipelining and parallel processing of recursive digital filters using look-ahead techniques are addressed in Chapter 10.

Self-Test Exercises
- STE1: Problem 8 of Chapter 3 in the textbook.
- STE2: Problem 9 of Chapter 3 in the textbook.
- STE3: Problem 10 of Chapter 3 in the textbook.