Pipelining and Parallel Processing Lan-Da Van ( 倫 ), Ph. D. Department of omputer Science National hiao Tung University Taiwan, R.O.. Spring, 007 ldvan@cs.nctu.edu.tw http://www.cs.nctu.edu.tw/~ldvan/
Outlines Introduction Pipelining of FIR Digital Filter Parallel Processing Pipelining and Parallel Processing for Low Power onclusions Lan-Da Van VLSI-DSP-3-
Pipelining Reduce the critical path Introduction Increase the clock speed or sample speed Reduce power consumption Parallel processing Not reduce the critical path Not increase clock speed, but increase sample speed Reduce power consumption Lan-Da Van VLSI-DSP-3-3
A 3-tap FIR Filter Direct-form structure y( n) ax( n) + bx( n 1) + cx( n ) T T + T sample M A f sample T M 1 + T A Lan-Da Van VLSI-DSP-3-4
Outlines Introduction Pipelining of FIR Digital Filter Parallel Processing Pipelining and Parallel Processing for Low Power onclusions Lan-Da Van VLSI-DSP-3-5
Pipelining Pipelining and Parallel oncept Introduce pipelining latches along the datapath Parallel processing Duplicate the hardware T A Lan-Da Van VLSI-DSP-3-6
Pipelining FIR Filter ritical path T A +T M -->T A +T M Lan-Da Van VLSI-DSP-3-7
Drawbacks Pipelining (1/) Increase number of delay elements (registers/latches) in the critical path Increase latency lock period limitation: critical path may be between An input and a latch A latch and an output Latches An input and an output Pipelining latches can only be placed across any feed-forward cutset of the graph Lan-Da Van VLSI-DSP-3-8
Pipelining (/) utset: A cutset is a set of edges of a graph such that if these edges are removed from the graph, the graph becomes disjoint. Feed-forward cutset: A cutset is called a feed-forward cutset if the data move in the forward direction on all the edges of the cutset. Lan-Da Van VLSI-DSP-3-9
Example 3..1 4 u.t. Error! u.t. Lan-Da Van VLSI-DSP-3-10
Transposition Theorem Reversing the direction of all edges in a given SFG and interchanging the input and output ports preserve the functionality of the system. Lan-Da Van VLSI-DSP-3-11
Data-Broadcast Structure ritical path is reduced to (T M +T A ). Lan-Da Van VLSI-DSP-3-1
Fine-Gain Pipelining Let T M 10 u.t., T A u.t., and the desired clock period6 u.t. Break the MULTIPLIER into smaller units with processing time of 6 and 4 units. Lan-Da Van VLSI-DSP-3-13
Outlines Introduction Pipelining of FIR Digital Filter Parallel Processing Pipelining and Parallel Processing for Low Power onclusions Lan-Da Van VLSI-DSP-3-14
Parallel Processing Parallel processing and pipelining are dual If a computation can be pipelined, it can also be processed in parallel. onvert a single-input single-output (SISO) system to multiple-input multiple-output (MIMO) system via parallelism Lan-Da Van VLSI-DSP-3-15
Parallel Processing of 3-Tap FIR Filter (1/) y( n) ax( n) + bx( n 1) + cx( n y(3k ) y(3k y(3k + 1) + ax(3k ) ) + ax(3k ax(3k bx(3k + 1) + ) + + 1) + bx(3k ) bx(3k cx(3k + + 1) cx(3k + ) ) 1) cx(3k ) T iter 1 T L T clk sample 1 3 ( T M + T A ) Lan-Da Van VLSI-DSP-3-16
Parallel Processing of 3-Tap FIR Filter (/) Lan-Da Van VLSI-DSP-3-17
omplete Parallel Processing System ritical path has remained unchanged. But the iteration period is reduced. Lan-Da Van VLSI-DSP-3-18
S/P and P/S onverter Edge Trigger! Edge Trigger! Lan-Da Van VLSI-DSP-3-19
Why Parallel Processing? Parallel leads to duplicating many copies of hardware, and the cost increases! Why use? Answer lies in the fact that the fundamental limit to pipelining is at I/O bottlenecks, referred to as ommunication Bound, composed of I/O pad delay and the wire delay. Parallel Transmission Lan-Da Van VLSI-DSP-3-0
ombined Fine-Grain Pipelining and Parallel Processing T iter T 1 LM sample T clk 1 6 ( T M + T A ) Lan-Da Van VLSI-DSP-3-1
Outlines Introduction Pipelining of FIR Digital Filter Parallel Processing Pipelining and Parallel Processing for Low Power onclusions Lan-Da Van VLSI-DSP-3-
Underlying Low Power oncept Propagation delay T pd k(v Power consumption P total V charge 0 0 Vt ) V 0 f P Sequential filter seq total V 0 f, T seq k(v V charge 0 0 Vt ), f 1 T seq Lan-Da Van VLSI-DSP-3-3
Pipelining for Low-Power (1/) M-level pipelined system ritical path-->1/m, capacitance to be charged in a single clock cycle-->1/m If the clock frequency is maintained, the power supply can be reduced to βv 0 (0<β<1) Lan-Da Van VLSI-DSP-3-4
Pipelining for Low-Power (/) Power consumption P pip total β V 0 f β P seq Propagation delay T seq k(v V charge 0 0 Vt ), T pip charge M k( βv 0 βv V t ) 0 Let T seq T pip M ( βv V ) β ( V V ) > 0 t 0 t get β Lan-Da Van VLSI-DSP-3-5
onsider an original 3-tap FIR filter and its fine-grain pipeline version shown in the following figures. Assume T M 10 ut, T A ut, V t 0.6V, V o 5V, and M 5 A. In fine-grain pipeline filter, the multiplier is broken into parts, m1 and m with computation time of 6 u.t. and 4 u.t. respectively, with capacitance 3 times and times that of an adder, respectively. (a) What is the supply voltage of the pipelined filter if the clock period remains unchanged? (b) What is the power consumption of the pipelined filter as a percentage of the original filter? Example 3.4.1 (1/) Lan-Da Van VLSI-DSP-3-6
Example 3.4.1 (/) Solution: Original : Fine Grain : (5β 0.6) 3.0165V β (5 36.4% 0.6) m1 6 m + > β 0.6033 or 0.039 (infeasible) V pip Ratio β charge charge M + A A A 3 A Lan-Da Van VLSI-DSP-3-7
omparison System Sequential FIR (Original) Pipelined FIR (Without reducing Vo) Pipelined FIR (With reducing Vo) Power (Ref) Ref Ref 0.364Ref lock Period (u.t.) Sample Period (u.t.) 1 ut 6 ut 1 ut 1 ut 6 ut 1 ut Thinking Again! Lan-Da Van VLSI-DSP-3-8
Parallel Processing for Low-Power L-parallel system Maintain the same sample rate, clock period is increased to LT seq This means that charge is charged in LT seq, and the power supply can be reduced to βv 0 Lan-Da Van VLSI-DSP-3-9
Parallel Processing for Low-Power Power consumption P par Propagation delay T seq LT seq T pap ( f Ltotal )( β V0 ) L β V P seq βv charge 0 charge 0, LTseq k(v0 Vt ) k( βv0 Vt ) L( βv0 Vt ) β ( V0 Vt ) > get β Lan-Da Van VLSI-DSP-3-30
Example 3.4. (1/) onsider a 4-tap FIR filter shown in Fig. 3.18(a) and its - parallel version in 3.18(b). The two architectures are operated at the sample period 9 u.t. Assume T M 8, T A 1, V t 0.45V, V o 3.3V, M 8 A (a) What is the supply voltage of the -parallel filter? (b) What is the power consumption of the -parallel filter as a percentage of the original filter? Solution: Original : Parallel: 9(3.3β 0.45) > β 0.6589 Vpar Ratio.1743V β charge charge 5β (3.3 0.6) or 0.08 43.41% M + M A + A 10 A Lan-Da Van VLSI-DSP-3-31
Example 3.4. (/) Lan-Da Van VLSI-DSP-3-3
Example 3.4.3 (1/) A more efficient structure than the previous one is depicted in Fig. 3.18(c). (a) What is the supply voltage of the efficient -parallel filter? (b) What is the power consumption of the efficient -parallel filter as a percentage of the original filter? Solution: Original: New - Parallel: 9(3.3β 0.45) β V pip Ratio 0.745 or 0.05 (infeasible).45857v P P seq charge par 55 1β (3.3 0.6) A M charge 35 + β V A V A 0 0 9 M + 4 1 f s f A s A 1 43.6% Lan-Da Van VLSI-DSP-3-33 A
Example 3.4.3 (/) Lan-Da Van VLSI-DSP-3-34
ombining Pipelining and Parallel Processing T Parallel-pipelined structure seq k(v V charge 0 0 Vt ), T pp charge M k( βv 0 βv V t ) 0 T pp LT seq ( βv0 Vt ) ( V0 Vt ) > ML β ML, V 0 5V, V t 0.6V-->β0.4, β 0.16 Lan-Da Van VLSI-DSP-3-35
onclusions Methodologies of pipelining of 3-tap FIR filter Methodologies of parallel processing for 3-tap FIR filter Methodologies of using pipelining and parallel processing for low power demonstration. Pipelining and parallel processing of recursive digital filters using look-ahead techniques are addressed in hapter 10. Lan-Da Van VLSI-DSP-3-36
Self-Test Exercises STE1: Problem 8 of hap 3 in text book. STE: Problem 9 of hap 3 in text book. STE: Problem 10 of hap 3 in text book. Lan-Da Van VLSI-DSP-3-37