CSE140: Components and Design Techniques for Digital Systems
Register Transfer Level (RTL) Design
Based on Vahid, chap. 5
Tajana Simunic Rosing
RTL Design Method
RTL Design Example: Laser-Based Distance Measurer
Laser-based distance measurement: pulse the laser, then measure the time T (in seconds) until the sensor detects the reflection from the object of interest.
Laser light travels at the speed of light, 3*10^8 m/sec.
The light covers the distance D twice (out and back), so 2D = T sec * 3*10^8 m/sec.
Distance is thus D = T sec * 3*10^8 m/sec / 2.
Step 4: Deriving the Controller's FSM
The FSM has the same structure as the high-level state machine.
Inputs/outputs are all bits now.
Replace data operations by bit operations using the datapath.

High-level state machine
Inputs: B, S (1 bit each). Outputs: L (bit), D (16 bits). Local registers: Dctr (16 bits).
S0: L = 0, D = 0. Go to S1.
S1: Dctr = 0. Stay while B', go to S2 on B (button pressed).
S2: L = 1. Go to S3.
S3: L = 0, Dctr = Dctr + 1. Stay while S', go to S4 on S (reflection sensed).
S4: D = Dctr / 2 (calculate D). Go to S1.

Controller FSM (300 MHz clock)
Inputs: B (from button), S (from sensor). Outputs: L (to laser), Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt (to datapath).
S0: L = 0, Dreg_clr = 1, Dreg_ld = 0, Dctr_clr = 0, Dctr_cnt = 0 (laser off) (clear D reg)
S1: L = 0, Dreg_clr = 0, Dreg_ld = 0, Dctr_clr = 1, Dctr_cnt = 0 (clear count)
S2: L = 1, Dreg_clr = 0, Dreg_ld = 0, Dctr_clr = 0, Dctr_cnt = 0 (laser on)
S3: L = 0, Dreg_clr = 0, Dreg_ld = 0, Dctr_clr = 0, Dctr_cnt = 1 (laser off) (count up)
S4: L = 0, Dreg_clr = 0, Dreg_ld = 1, Dctr_clr = 0, Dctr_cnt = 0 (load D reg with Dctr/2) (stop counting)
The transitions are the same as in the high-level state machine. The datapath output D (16 bits) goes to the display.
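The five-state behavior can be checked with a small cycle-level model. This is a minimal sketch, not the slide's circuit: it assumes a 300 MHz clock (so light travels 1 m per cycle with the rounded speed of light), assumes the button is already pressed, and uses the hypothetical parameter round_trip_cycles for the number of cycles spent counting in S3.

```python
CLOCK_HZ = 300_000_000        # assumed clock frequency
SPEED_OF_LIGHT = 300_000_000  # m/sec, rounded value used on the slides


def distance_from_time(t_seconds):
    """D = T * c / 2, the formula from the distance-measurer slide."""
    return t_seconds * SPEED_OF_LIGHT / 2


def measure(round_trip_cycles):
    """Step the controller states S0..S4 once and return the value loaded into D.

    round_trip_cycles (hypothetical) is how many clock cycles the controller
    stays in S3 before the sensor input S goes high.
    """
    state, dreg, dctr, waited = "S0", 0, 0, 0
    while True:
        if state == "S0":        # laser off, clear D register
            dreg = 0
            state = "S1"
        elif state == "S1":      # clear count; button B assumed pressed
            dctr = 0
            waited = 0
            state = "S2"
        elif state == "S2":      # laser on for one cycle
            state = "S3"
        elif state == "S3":      # laser off, count up until S is seen
            dctr += 1
            waited += 1
            if waited >= round_trip_cycles:
                state = "S4"
        else:                    # S4: load D register with Dctr / 2
            dreg = dctr // 2
            return dreg
```

With these assumptions, a 20-cycle round trip yields D = 10, which matches the closed-form distance for T = 20 / CLOCK_HZ seconds.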
RTL Design Method Example: Soda Dispenser
c: bit input, 1 when a coin is deposited
a: multi-bit input holding the value of the deposited coin
s: multi-bit input holding the cost of a soda
d: bit output; the processor sets it to 1 when the total value of deposited coins equals or exceeds the cost of a soda
Figure: soda dispenser processor with inputs c, a, s, output d, and an internal running total tot.
Step 1: Capture High-Level State Machine
Inputs: c (bit), a, s (multi-bit). Outputs: d (bit). Local registers: tot.
Declare local register tot.
Init state: set d=0, tot=0. Go to Wait state.
Wait state: wait for a coin.
If a coin is seen (c), go to Add state.
If no coin and the total is still too small (c'*(tot<s)), stay in Wait.
If no coin and tot >= s (c'*(tot<s)'), go to Disp(ense) state.
Add state: update total value: tot = tot + a. Remember, a is the present coin's value. Go back to Wait state.
Disp state: set d=1 (dispense soda). Return to Init state.

Not an FSM because:
Multi-bit (data) inputs a and s
Local register tot
Data operations tot=0, tot<s, tot=tot+a

Useful high-level state machine:
Data types beyond just bits
Local registers
Arithmetic equations/expressions
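The high-level state machine can be exercised directly in software before any datapath exists. The sketch below is illustrative only: run_soda_hlsm and its list-of-coins interface are hypothetical, and the run stops (instead of waiting forever) when no coin is pending and the total is still below the cost.

```python
def run_soda_hlsm(coins, cost):
    """Step through Init/Wait/Add/Disp; return (d, tot) when the run ends.

    coins: coin values deposited one at a time (each drives c=1 with a=value).
    cost:  the value on input s.
    """
    state, tot, d = "Init", 0, 0
    queue = list(coins)
    while True:
        if state == "Init":
            d, tot = 0, 0
            state = "Wait"
        elif state == "Wait":
            if queue:              # c = 1: a coin is present
                state = "Add"
            elif tot < cost:       # c'*(tot<s): hardware would keep waiting
                return d, tot
            else:                  # c'*(tot<s)': enough money
                state = "Disp"
        elif state == "Add":
            tot += queue.pop(0)    # tot = tot + a
            state = "Wait"
        else:                      # Disp
            d = 1
            return d, tot
```

For example, two coins of 25 against a cost of 50 end in Disp with d = 1, while a single coin of 25 leaves the machine waiting with tot = 25.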
Step 2: Create Datapath
Inputs: c (bit), a, s (multi-bit). Outputs: d (bit). Local registers: tot.
Need a tot register.
Need a comparator to compare tot and s (the condition tot < s).
Need an adder to perform tot = tot + a.
Connect everything.
Create control inputs/outputs: tot_ld and tot_clr (load and clear the tot register) and tot_lt_s (comparator output).
Step 3: Connect Datapath to a Controller
Controller's inputs:
External input c (coin detected)
Input from the datapath comparator's output, which we named tot_lt_s
Controller's outputs:
External output d (dispense soda)
Outputs to the datapath to load and clear the tot register (tot_ld, tot_clr)
The data inputs s and a connect directly to the datapath.
Step 4: Derive the Controller's FSM
Same states and arcs as the high-level state machine, but set/read datapath control signals for all datapath operations and conditions.
Inputs: c, tot_lt_s (bit). Outputs: d, tot_ld, tot_clr (bit).
Init: d=0, tot_clr=1. Go to Wait.
Wait: on c go to Add; on c'*tot_lt_s stay in Wait; on c'*tot_lt_s' go to Disp.
Add: tot_ld=1. Go to Wait.
Disp: d=1. Go to Init.
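Completing the Design
Implement the FSM as a state register and logic.
Inputs: c, tot_lt_s (bit). Outputs: d, tot_ld, tot_clr (bit).
The state register holds the current state bits s1 s0. Combinational logic computes the next-state bits n1 n0 and the outputs d, tot_ld, tot_clr from s1, s0, c, and tot_lt_s.
Truth table columns: s1 s0 c tot_lt_s | n1 n0 d tot_ld tot_clr, with one group of rows for each of the states Init, Wait, Add, and Disp.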
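The controller's truth table can be generated mechanically from the FSM. This is a sketch under an assumed state encoding (Init=00, Wait=01, Add=10, Disp=11); the encoding, ENC, controller, and truth_table are choices made here for illustration, not values recovered from the slide.

```python
from itertools import product

# Assumed encoding of (s1, s0); any one-to-one assignment would work.
ENC = {"Init": (0, 0), "Wait": (0, 1), "Add": (1, 0), "Disp": (1, 1)}
DEC = {bits: name for name, bits in ENC.items()}


def controller(s1, s0, c, tot_lt_s):
    """Combinational logic: returns (n1, n0, d, tot_ld, tot_clr)."""
    state = DEC[(s1, s0)]
    d = tot_ld = tot_clr = 0
    if state == "Init":
        tot_clr = 1                     # clear the tot register
        nxt = "Wait"
    elif state == "Wait":
        if c:
            nxt = "Add"
        elif tot_lt_s:
            nxt = "Wait"
        else:
            nxt = "Disp"
    elif state == "Add":
        tot_ld = 1                      # load tot with tot + a
        nxt = "Wait"
    else:                               # Disp
        d = 1
        nxt = "Init"
    n1, n0 = ENC[nxt]
    return n1, n0, d, tot_ld, tot_clr


def truth_table():
    """All 16 rows: (s1, s0, c, tot_lt_s, n1, n0, d, tot_ld, tot_clr)."""
    return [(s1, s0, c, lt) + controller(s1, s0, c, lt)
            for s1, s0, c, lt in product((0, 1), repeat=4)]
```

Each output column of truth_table() can then be minimized into gates in the usual way, for example with a K-map per column.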
RTL Design Example: Bus Interface
Master processor can read a register from any peripheral.
Each register has a unique 4-bit address.
Assume 1 register per peripheral.
To read, the master sets rd=1 and A=address.
The addressed peripheral places its register data on the 32-bit D lines.
The peripheral's own address is provided on its Faddr inputs (maybe from DIP switches, or another register).
Figure: master processor connected through the bus (rd, 4-bit A, 32-bit D) to peripherals Per0 through Per15; inside each peripheral, a bus interface sits between the bus and the main part, which supplies the 32-bit register value Q.
Step 1: Create high-level state machine
Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits). Outputs: D (32 bits). Local register: Q1 (32 bits).
State WaitMyAddress: D = "Z", Q1 = Q.
Output nothing ("Z") on D and store the peripheral's register value Q into local register Q1.
Wait until this peripheral's address is seen (A = Faddr) and rd=1: stay on ((A = Faddr) and rd)', go to SendData on (A = Faddr) and rd.
State SendData: D = Q1.
Output Q1 onto D and wait for rd=0 (meaning the main processor is done reading the D lines): stay on rd, return to WaitMyAddress on rd'.
Step 2: Create a datapath
Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits). Outputs: D (32 bits). Local register: Q1 (32 bits).
(a) Datapath inputs/outputs: A and Faddr (4 bits), Q (32 bits), D (32 bits)
(b) Instantiate declared registers: 32-bit register Q1 with load input Q1_ld
(c) Instantiate datapath components and connections: a 4-bit equality comparator on A and Faddr producing A_eq_Faddr, and a 32-bit three-state driver from Q1 to D enabled by D_en
Step 3: Connect datapath to controller
Step 4: Derive controller's FSM
Controller inputs: rd, A_eq_Faddr (bit). Controller outputs: Q1_ld, D_en (bit).
WaitMyAddress: D_en = 0, Q1_ld = 1 (replaces D = "Z", Q1 = Q). Stay on (A_eq_Faddr and rd)', go to SendData on A_eq_Faddr and rd.
SendData: D_en = 1, Q1_ld = 0 (replaces D = Q1). Stay on rd, return to WaitMyAddress on rd'.
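The two-state bus interface is small enough to model directly. The sketch below is a behavioral approximation, not the slide's circuit: the class BusInterface, its clock method, and the use of None to stand for a high-impedance ("Z") output are all assumptions made for illustration, and the local register is called q1 here.

```python
class BusInterface:
    """Behavioral model of one peripheral's bus interface."""

    def __init__(self, faddr):
        self.faddr = faddr & 0xF          # this peripheral's 4-bit address
        self.state = "WaitMyAddress"
        self.q1 = 0                       # local 32-bit register

    def clock(self, rd, a, q):
        """Advance one clock cycle.

        Returns the value driven on D during the current state,
        or None when the three-state driver is disabled (D_en = 0).
        """
        a_eq_faddr = (a & 0xF) == self.faddr
        if self.state == "WaitMyAddress":
            d = None                      # D_en = 0: drive nothing
            self.q1 = q & 0xFFFFFFFF      # Q1_ld = 1: capture Q
            if a_eq_faddr and rd:
                self.state = "SendData"
        else:                             # SendData
            d = self.q1                   # D_en = 1: drive stored value
            if not rd:
                self.state = "WaitMyAddress"
        return d
```

In this model a read takes effect one cycle after the address match, and the value sent is the one captured just before leaving WaitMyAddress, so later changes on Q do not disturb the data while the master is reading.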
RTL Example: Video Compression
Video is a series of frames (e.g., 30 per second).
Most frames are similar to the previous frame.
Compression idea: just send the difference from the previous frame.
Figure: (a) frame 1 and frame 2 each sent as a full digitized frame of about 1 Mbyte; (b) digitized frame 1 sent in full, followed only by the difference of frame 2 from frame 1. The only difference between the two example frames is a moving ball.
Video Compression: Sum of Absolute Differences
Each frame is made of pixels; assume each pixel is represented as 1 byte (actually, a color picture might have 3 bytes per pixel, for the intensity of the red, green, and blue components of the pixel).
If two frames are similar, just send a difference instead.
Compare corresponding 16x16 blocks of frame 1 and frame 2.
Treat each 16x16 block as a 256-byte array.
Compute the absolute value of the difference of each pair of array items.
Sum the differences: if the sum is above a threshold, send a complete frame for the second frame; else send the difference.
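The block comparison above can be written as a few lines of software, which is a useful reference when later checking the hardware version. This is a sketch: the function names and the threshold value passed by the caller are hypothetical, since the slides do not give a specific threshold.

```python
def sad(block_a, block_b):
    """Sum of absolute differences of two equal-length byte arrays."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def choose_encoding(block_a, block_b, threshold):
    """Return "full" if the blocks differ too much, else "difference".

    Each block is a 16x16 pixel block flattened to 256 one-byte values.
    """
    if len(block_a) != 256 or len(block_b) != 256:
        raise ValueError("expected two 256-byte blocks")
    return "full" if sad(block_a, block_b) > threshold else "difference"
```

For instance, two blocks that differ in a single pixel by 5 have a SAD of 5 and would be sent as a difference under any threshold of 5 or more.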
Video Compression: Sum of Absolute Differences
Want a fast sum-of-absolute-differences (SAD) component.
Inputs: 256-byte array A, 256-byte array B, and go. Output: sad (integer).
When go=1, the component sums the absolute differences of the element pairs in arrays A and B and outputs that sum.
Step 1: High-level state machine
Inputs: A, B (256-byte memory); go (bit). Outputs: sad (32 bits). Local registers: sum, sad_reg (32 bits); i (9 bits).
S0: wait for go. Stay on !go, go to S1 on go.
S1: initialize sum and index: sum = 0, i = 0. Go to S2.
S2: check if done (i >= 256). Go to S3 on i<256, go to S4 on !(i<256).
S3: add difference to sum, increment index: sum = sum + abs(A[i] - B[i]), i = i + 1. Go to S2.
S4: done, write to output: sad_reg = sum. Go to S0.
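Stepping through the five states in software shows both the result and how many clock cycles the state machine needs. This is a sketch assuming go is already 1 when the run starts and that each state takes exactly one cycle; run_sad and its cycle counter are illustrative names, not part of the slides.

```python
def run_sad(a, b):
    """Step states S0..S4 once; return (sad_reg, cycles).

    a, b: the contents of memories A and B (256 one-byte values each).
    """
    state, total, i, sad_reg, cycles = "S0", 0, 0, 0, 0
    go = 1                                  # assumption: go already asserted
    while True:
        cycles += 1                         # one clock cycle per state visit
        if state == "S0":                   # wait for go
            state = "S1" if go else "S0"
        elif state == "S1":                 # sum = 0, i = 0
            total, i = 0, 0
            state = "S2"
        elif state == "S2":                 # check if done
            state = "S3" if i < 256 else "S4"
        elif state == "S3":                 # accumulate and increment
            total += abs(a[i] - b[i])
            i += 1
            state = "S2"
        else:                               # S4: sad_reg = sum
            sad_reg = total
            return sad_reg, cycles
```

Under these assumptions the loop visits S2 and S3 once per element, so the cycle count is dominated by roughly two cycles per pixel pair.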
Step 2: Create datapath
Inputs: A, B (256-byte memory); go (bit). Outputs: sad (32 bits). Local registers: sum, sad_reg (32 bits); i (9 bits).
Register i (9 bits) with controls i_inc and i_clr; its value drives the memory address AB_addr.
Comparator (<256) on i producing i_lt_256, which replaces the condition i<256.
Subtractor and abs on A_data and B_data, feeding a 32-bit adder with sum.
Register sum (32 bits) with controls sum_ld and sum_clr.
Register sad_reg (32 bits) with control sad_reg_ld, driving output sad.
Step 3: Connect to controller
Step 4: Replace high-level state machine by FSM
Controller inputs: go, i_lt_256. Controller outputs: AB_rd, i_inc, i_clr, sum_ld, sum_clr, sad_reg_ld.
S0: wait for go (stay on go', go to S1 on go).
S1: sum = 0 becomes sum_clr = 1; i = 0 becomes i_clr = 1.
S2: the condition i<256 becomes i_lt_256.
S3: sum = sum + abs(A[i] - B[i]) becomes sum_ld = 1, AB_rd = 1; i = i + 1 becomes i_inc = 1.
S4: sad_reg = sum becomes sad_reg_ld = 1.
Data Dominated RTL Design Example: FIR Filter
FIR filter: "Finite Impulse Response".
Simply a configurable weighted sum of past input values:
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
The above is known as "3 tap". Tens of taps are more common.
Very general filter: the user sets the constants (c0, c1, c2) to define a specific filter.
Figure: digital filter block with input X, output Y, and clk.
RTL design, Step 1: Create high-level state machine. There is none; go straight to step 2.
Step 2: Create datapath
Begin by creating a chain of xt registers (xt0, xt1, xt2) to hold past values of X: x(t), x(t-1), x(t-2).
Instantiate registers for c0, c1, c2.
Instantiate multipliers to compute the c*x values.
Instantiate adders to sum the products into yreg, which drives Y.
Add circuitry to allow loading of a particular c register: a 2x4 decoder with enable CL and select inputs Ca1, Ca0 picks the register that loads the value on C.
Steps 3 & 4: Connect to controller, create FSM. No controller is needed.
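The register chain and weighted sum can be mirrored in a short behavioral model. This is a sketch, not the slide's circuit: the class Fir3 is hypothetical, and it folds the xt-register and yreg latencies into a single clock() call, so it reproduces y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2) without modeling the extra pipeline delay of the hardware.

```python
class Fir3:
    """Behavioral 3-tap FIR filter with a shift chain of past inputs."""

    def __init__(self, c0, c1, c2):
        self.c = [c0, c1, c2]     # constant registers c0, c1, c2
        self.xt = [0, 0, 0]       # xt0, xt1, xt2 hold x(t), x(t-1), x(t-2)
        self.yreg = 0

    def clock(self, x):
        """Shift in a new sample and return the filter output for it."""
        self.xt = [x, self.xt[0], self.xt[1]]
        self.yreg = sum(c * v for c, v in zip(self.c, self.xt))
        return self.yreg
```

Feeding a single 1 followed by zeros returns the constants c0, c1, c2 in order and then 0, which is the finite impulse response that gives the filter its name.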
Comparing the FIR circuit to a software implementation
Circuit:
Assume an adder has a 2-gate delay and a multiplier has a 20-gate delay.
The longest path goes through one multiplier and two adders: 20 + 2 + 2 = 24-gate delay.
A 100-tap filter would have about a 34-gate delay: 1 multiplier and 7 adders (an adder tree) on the longest path.
Software:
100-tap filter: 100 multiplications, 100 additions.
Assume 2 instructions per multiplication and 2 per addition, and say a 10-gate delay per instruction.
(100*2 + 100*2)*10 = 4000 gate delays.
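The delay arithmetic can be reproduced with a few lines of code, which makes it easy to try other tap counts. This is a sketch under the slide's assumed gate delays; the function names are hypothetical, and the circuit estimate assumes the adders are arranged as a balanced tree whose depth is the ceiling of log2 of the number of taps.

```python
ADDER_DELAY = 2        # gate delays per adder (slide's assumption)
MULT_DELAY = 20        # gate delays per multiplier (slide's assumption)
INSTR_DELAY = 10       # gate delays per software instruction (slide's assumption)
INSTR_PER_MULT = 2
INSTR_PER_ADD = 2


def circuit_delay(taps):
    """Longest path: one multiplier, then an adder tree of depth ceil(log2(taps))."""
    tree_depth = (taps - 1).bit_length()   # integer ceil(log2(taps)) for taps >= 1
    return MULT_DELAY + tree_depth * ADDER_DELAY


def software_delay(taps):
    """Sequential execution: every multiply and add costs its instructions in turn."""
    instructions = taps * INSTR_PER_MULT + taps * INSTR_PER_ADD
    return instructions * INSTR_DELAY
```

The gap widens quickly because the circuit's delay grows with the logarithm of the tap count while the software's grows linearly.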
Critical path analysis in more complex designs
Figure: the soda dispenser's controller (combinational logic plus state register, clocked by clk) connected to its datapath (tot register with ld and clr, adder, and < comparator), with candidate critical paths marked (a), (b), and (c).
Simple data encryption/decryption device
When B is asserted, set offset O = I[:3].
Otherwise, e selects the mode:
Encrypt mode: output J = I + O.
Decrypt mode: recover I = J - O.
Reaction timer
On reset (rst), the reaction timer waits for a fixed delay before turning on the light (len=1).
It then measures the length of time rtime (in ms) until the user presses button B.
If the reaction is slower than 2 sec, output slow=1 and rtime=2000.
Fast sum of 6 32-bit registers
Hot water detector
Output a warning when the average temperature over the past 4 samples exceeds a user-defined value; clr disables the system.
Inputs (32 bits): CT, current temperature; WT, warning temperature.
Output: W, high if the temperature is hot; stays on until clr is pressed again.
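One possible reading of this specification is sketched below as a behavioral model. Several points are assumptions rather than part of the problem statement: clr=1 is taken to clear the stored samples and the warning, no warning is raised until 4 samples have been collected, and the average is compared as sum > 4*WT to avoid a divider. The class and method names are hypothetical.

```python
from collections import deque


class HotWaterDetector:
    """Latching warning on the average of the last 4 temperature samples."""

    def __init__(self):
        self.samples = deque(maxlen=4)   # four sample registers
        self.w = 0                       # warning output W

    def clock(self, ct, wt, clr):
        """Take one sample per clock; return the current value of W."""
        if clr:                          # assumed: clr clears history and W
            self.samples.clear()
            self.w = 0
            return self.w
        self.samples.append(ct)
        # average > wt  is equivalent to  sum > 4 * wt, with no division
        if len(self.samples) == 4 and sum(self.samples) > 4 * wt:
            self.w = 1                   # latch until clr
        return self.w
```

Comparing the sum against 4*WT maps to a shift by two bits in hardware, which is cheaper than dividing the sum by 4 and gives the same decision.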
Design from C code
Summary
Datapath and Control Design
RTL Design Steps:
1. Define the high-level state machine
2. Create datapath
3. Connect datapath with control
4. Implement the FSM
Timing analysis: critical path in more complex circuits
Watch out for all possible long paths (e.g., datapath to FSM, FSM control logic, datapath logic, etc.)